Showing 1–20 of 41 results
Date | Name
Apr 4, 2020 | Benchmarking Machine Reading Comprehension: A Psychological Perspective
Sep 16, 2022 | Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
Mar 12, 2022 | What Makes Reading Comprehension Questions Difficult?
Aug 28, 2018 | What Makes Reading Comprehension Questions Easier?
Nov 21, 2019 | Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
May 24, 2023 | On Degrees of Freedom in Defining and Testing Natural Language Understanding
Nov 29, 2022 | Penalizing Confident Predictions on Largely Perturbed Inputs Does Not Improve Out-of-Distribution Generalization in Question Answering
Oct 28, 2022 | Debiasing Masks: A New Framework for Shortcut Mitigation in NLU
Oct 8, 2024 | Can Language Models Induce Grammatical Knowledge from Indirect Evidence?
Sep 21, 2025 | TactfulToM: Do LLMs Have the Theory of Mind Ability to Understand White Lies?
Mar 7, 2026 | Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks
Feb 25, 2026 | CxMP: A Linguistic Minimal-Pair Benchmark for Evaluating Constructional Understanding in Language Models
Apr 22, 2026 | Aligning Human-AI-Interaction Trust for Mental Health Support: Survey and Position for Multi-Stakeholders
Dec 14, 2022 | Cross-Modal Similarity-Based Curriculum Learning for Image Captioning
Nov 30, 2023 | Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension
Feb 12, 2023 | Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering
Nov 29, 2022 | Which Shortcut Solution Do Question Answering Models Prefer to Learn?
Nov 2, 2020 | Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Jun 1, 2021 | What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Sep 23, 2021 | Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap