Showing 1–17 of 17 results
Date / Name

Apr 12, 2024 / The Illusion of State in State-Space Models
Oct 28, 2019 / Certain Hyperbolic Regular Polygonal Tiles are Isoperimetric
Nov 8, 2023 / How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
Nov 11, 2019 / Optimal Monohedral Tilings of Hyperbolic Surfaces
Aug 28, 2019 / The Optimal Double Bubble for Density $r^p$
Sep 24, 2021 / Transformers Generalize Linearly
Feb 8, 2022 / Do Language Models Learn Position-Role Mappings?
Dec 20, 2022 / (QA)$^2$: Question Answering with Questionable Assumptions
Sep 6, 2024 / How Does Code Pretraining Affect Language Model Task Performance?
Jun 5, 2025 / RELIC: Evaluating Complex Reasoning via the Recognition of Languages In-Context
Oct 30, 2023 / The Impact of Depth on Compositional Generalization in Transformer Language Models
Nov 2, 2020 / Sequence-to-Sequence Networks Learn the Meaning of Reflexive Anaphora
Feb 26, 2025 / Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Apr 8, 2026 / Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction
Nov 20, 2023 / GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Nov 13, 2023 / In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
Nov 15, 2023 / Debate Helps Supervise Unreliable Experts