arXiv search for au:"Li Shen" (showing 1–9 of 9 results)
Apr 20, 2026: Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
Feb 16, 2025: AdaGC: Improving Training Stability for Large Language Model Pretraining
Nov 25, 2024: Exploring the Generalization Capabilities of AID-based Bi-level Optimization
Oct 23, 2023: Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
May 28, 2022: Efficient-Adam: Communication-Efficient Distributed Adam
Apr 16, 2022: Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding
Jan 14, 2021: Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Apr 29, 2020: Quantized Adam with Error Feedback
Aug 10, 2018: A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration