Date / Name
Nov 20, 2023 / FinanceBench: A New Benchmark for Financial Question Answering
Sep 6, 2021 / Learning Neural Causal Models with Active Interventions
Jan 23, 2025 / Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling
Jun 14, 2021 / Variational Causal Networks: Approximate Bayesian Inference over Causal Structures
Jul 26, 2023 / Evaluating the Moral Beliefs Encoded in LLMs
Jun 9, 2022 / On the Generalization and Adaption Performance of Causal Models
Jun 5, 2025 / MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Jun 24, 2024 / Inducing Group Fairness in Prompt-Based Language Model Decisions
Sep 9, 2025 / No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models
Oct 24, 2024 / Multi-agent cooperation through learning-aware policy gradients
Jan 15, 2026 / Reasoning Models Generate Societies of Thought
Nov 14, 2023 / SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
Nov 24, 2022 / Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery
Dec 6, 2025 / Uncovering Competency Gaps in Large Language Models and Their Benchmarks
Dec 23, 2025 / Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Sep 11, 2023 / Uncovering mesa-optimization algorithms in Transformers
Nov 7, 2022 / Federated Causal Discovery From Interventions
Dec 29, 2020 / Improved Segmentation and Detection Sensitivity of Diffusion-Weighted Brain Infarct Lesions with Synthetically Enhanced Deep Learning
Apr 2, 2025 / On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows
Apr 18, 2024 / Introducing v0.5 of the AI Safety Benchmark from MLCommons