Date / Name
Jun 19, 2024 / MoreHopQA: More Than Multi-hop Reasoning
Jul 22, 2025 / Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models
Feb 12, 2026 / Which Feedback Works for Whom? Differential Effects of LLM-Generated Feedback Elements Across Learner Profiles
Apr 29, 2026 / A Dual-Task Paradigm to Investigate Sentence Comprehension Strategies in Language Models
Dec 14, 2023 / PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments
Oct 26, 2022 / Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering
Oct 11, 2022 / How Well Do Multi-hop Reading Comprehension Models Understand Date Information?
Jun 4, 2023 / Probing Physical Reasoning with Counter-Commonsense Context
Jun 6, 2021 / Embracing Ambiguity: Shifting the Training Target of NLI Models
Oct 7, 2024 / Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Sep 22, 2025 / Specification-Aware Machine Translation and Evaluation for Purpose Alignment
Jun 4, 2025 / Measuring Human Involvement in AI-Generated Text: A Case Study on Academic Writing
Apr 2, 2026 / Eyes Can't Always Tell: Fusing Eye Tracking and User Priors for User Modeling under AI Advice Conditions
Apr 7, 2026 / Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
Jan 27, 2025 / Automatic Feedback Generation for Short Answer Questions using Answer Diagnostic Graphs
Apr 7, 2020 / Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation
Sep 5, 2022 / A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Jun 6, 2024 / What Makes Language Models Good-enough?
Jul 4, 2024 / LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Aug 21, 2025 / Are Checklists Really Useful for Automatic Evaluation of Generative Tasks?