"au:"Amar Budhiraja"" — arXiv2 Search
Showing 1–6 of 6 results
Feb 12, 2026
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Feb 6, 2026
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
Nov 19, 2025
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Nov 17, 2025
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Sep 21, 2025
ARE: Scaling Up Agent Environments and Evaluations
Feb 20, 2025
MLGym: A New Framework and Benchmark for Advancing AI Research Agents