"au:"Ziniu Li"" — arXiv2 Search
Showing 1–8 of 8 results
Mar 24, 2026
Off-Policy Value-Based Reinforcement Learning for Large Language Models
Jan 9, 2026
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Oct 31, 2025
ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling
Sep 30, 2025
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
May 16, 2025
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Aug 29, 2024
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Jun 24, 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Feb 26, 2024
Why Transformers Need Adam: A Hessian Perspective