Date | Name
Oct 9, 2024 | LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Jun 19, 2024 | Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
Jun 19, 2024 | VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Nov 16, 2023 | AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Dec 3, 2022 | Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts
Aug 19, 2024 | ARMADA: Attribute-Based Multimodal Data Augmentation
Sep 5, 2024 | Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Mar 5, 2025 | Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Oct 26, 2024 | Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization
Jun 3, 2024 | Re-ReST: Reflection-Reinforced Self-Training for Language Agents
Feb 24, 2025 | Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
Oct 27, 2024 | Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Nov 16, 2023 | MacGyver: Are Large Language Models Creative Problem Solvers?
Oct 26, 2025 | MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion
Jun 2, 2025 | AI Debate Aids Assessment of Controversial Claims
Aug 22, 2025 | FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Oct 26, 2024 | Vulnerability of LLMs to Vertically Aligned Text Manipulations
Sep 29, 2025 | TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
Feb 25, 2025 | MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Mar 22, 2024 | Argument-Aware Approach To Event Linking