"au:"Megan Kinniment"" — arXiv2 Search
Showing 1–4 of 4 results
Mar 21, 2025
HCAST: Human-Calibrated Autonomy Software Tasks
Nov 22, 2024
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
Mar 18, 2025
Measuring AI Ability to Complete Long Software Tasks
Dec 18, 2023
Evaluating Language-Model Agents on Realistic Autonomous Tasks