"au:"Songyang Zhang"" — arXiv2 Search

Showing 1–8 of 8 results

Date / Name

Nov 18, 2025 / ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
Aug 25, 2025 / InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Jan 24, 2025 / Humanity's Last Exam
Oct 16, 2024 / ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
May 20, 2024 / MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Mar 26, 2024 / InternLM2 Technical Report
Dec 21, 2023 / T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Oct 20, 2023 / BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues