Search results for au:"Rizhen Hu" (arXiv2 Search)
Showing 1–4 of 4 results
Feb 15, 2026
Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization
Feb 22, 2026
Grouter: Decoupling Routing from Representation for Accelerated MoE Training
Oct 18, 2025
MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
Feb 26, 2026
Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement