Showing 1–20 of 23 results
Apr 29, 2022 - Training Language Models with Language Feedback
Feb 28, 2023 - EvoPrompting: Language Models for Code-Level Neural Architecture Search
Oct 29, 2024 - Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
Mar 28, 2023 - Improving Code Generation by Training with Natural Language Feedback
May 23, 2022 - SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
May 23, 2023 - Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs
Mar 28, 2023 - Training Language Models with Language Feedback at Scale
Aug 26, 2022 - What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Feb 16, 2023 - Pretraining Language Models with Human Preferences
May 21, 2019 - Generating Logical Forms from Graph Representations of Text and Entities
Oct 15, 2021 - BBQ: A Hand-Built Bias Benchmark for Question Answering
May 2, 2022 - Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection
Apr 11, 2022 - Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions
May 29, 2024 - Preference Learning Algorithms Do Not Learn Preference Rankings
Dec 8, 2023 - Playing Large Games with Oracles and AI Debate
Aug 18, 2023 - Latent State Models of Training Dynamics
Sep 13, 2023 - Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Nov 16, 2021 - Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair
Jun 26, 2025 - Bridging Offline and Online Reinforcement Learning for LLMs
Nov 17, 2025 - Generalist Foundation Models Are Not Clinical Enough for Hospital Operations