"au:"Jiaheng Liu"" — arXiv2 Search

Showing 1–8 of 8 results

Apr 23, 2026  When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors
Jan 9, 2026  The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Sep 4, 2025  Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
May 29, 2025  ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
May 20, 2025  KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Feb 20, 2025  SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Jun 21, 2024  GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models
Jun 11, 2024  McEval: Massively Multilingual Code Evaluation