arXiv2 search: au:"Ben West"
Showing 1–3 of 3 results
Measuring AI Ability to Complete Long Software Tasks (Mar 18, 2025)
HCAST: Human-Calibrated Autonomy Software Tasks (Mar 21, 2025)
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts (Nov 22, 2024)