arXiv2 Search results for au:"Zhiyu Mei"
Showing 1–9 of 9 results
Jun 29, 2023
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
Jun 20, 2024
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Oct 19, 2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jan 31, 2026
AREAL-DTA: Dynamic Tree Attention for Efficient Reinforcement Learning of Large Language Models
Aug 11, 2025
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
May 30, 2025
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Nov 2, 2025
AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs
Apr 16, 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Jun 8, 2025
How Far Are We from Optimal Reasoning Efficiency?