Date | Name
Oct 26, 2025 | MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion
Sep 29, 2025 | TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
Aug 22, 2025 | FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Jun 2, 2025 | AI Debate Aids Assessment of Controversial Claims
Mar 5, 2025 | Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Feb 25, 2025 | MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Feb 24, 2025 | Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
Oct 27, 2024 | Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Oct 26, 2024 | Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization
Oct 26, 2024 | Vulnerability of LLMs to Vertically Aligned Text Manipulations
Oct 9, 2024 | LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Sep 5, 2024 | Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Aug 19, 2024 | ARMADA: Attribute-Based Multimodal Data Augmentation
Jun 19, 2024 | Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
Jun 19, 2024 | VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Jun 3, 2024 | Re-ReST: Reflection-Reinforced Self-Training for Language Agents
Mar 22, 2024 | Argument-Aware Approach To Event Linking
Nov 16, 2023 | MacGyver: Are Large Language Models Creative Problem Solvers?
Nov 16, 2023 | AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Jun 20, 2023 | Open-Domain Text Evaluation via Contrastive Distribution Methods