"au:"Mou Sun"" — arXiv2 Search
Showing 1–8 of 8 results
Feb 22, 2026
Grouter: Decoupling Routing from Representation for Accelerated MoE Training
Nov 4, 2025
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
Mar 3, 2026
Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
Feb 23, 2021
Decomposition Methods for Global Solutions of Mixed-Integer Linear Programs
Dec 21, 2023
MindOpt Adapter for CPLEX Benchmarking Performance Analysis
Oct 18, 2025
MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
Feb 26, 2026
Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
Feb 15, 2026
Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization