arXiv2
Search results for au:"Zihan Xu"
Showing 1–3 of 3 results
Nov 4, 2025
LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Oct 21, 2025
CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent
Jun 2, 2025
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models