arXiv2 search results for au:"Haodong Duan"
Showing 1–3 of 3 results
Nov 18, 2025
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
May 20, 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Oct 20, 2023
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues