"au:"Jiaheng Liu"" — arXiv2 Search
Showing 1–8 of 8 results
Apr 23, 2026
When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors
Jan 9, 2026
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Sep 4, 2025
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
May 29, 2025
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
May 20, 2025
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Feb 20, 2025
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Jun 21, 2024
GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models
Jun 11, 2024
McEval: Massively Multilingual Code Evaluation