Date / Name
Feb 23, 2026 / A Very Big Video Reasoning Suite
May 28, 2025 / Jailbreak Distillation: Renewable Safety Benchmarking
Feb 28, 2024 / RORA: Robust Free-Text Rationale Evaluation
May 22, 2023 / "According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Oct 16, 2021 / Hey AI, Can You Solve Complex Tasks by Talking to Agents?
Jun 2, 2021 / Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?
Jan 6, 2021 / Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Oct 6, 2020 / UnQovering Stereotyping Biases via Underspecified Questions
Sep 1, 2020 / Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models
May 2, 2020 / UnifiedQA: Crossing Format Boundaries With a Single QA System
Sep 4, 2019 / From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
Apr 20, 2016 / Question Answering via Integer Programming over Semi-Structured Knowledge