"au:"Mingze Wang"" — arXiv2 Search

Showing 1–20 of 27 results

Date / Name

Jun 21, 2022 / Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars
Feb 26, 2025 / The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jun 7, 2022 / Generalization Error Bounds for Deep Neural Networks Trained by SGD
Oct 1, 2023 / A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
May 30, 2025 / GradPower: Powering Gradients for Faster Language Model Pre-Training
Jun 5, 2022 / Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks
Nov 18, 2024 / CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset
Nov 24, 2023 / Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
May 30, 2025 / On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
Oct 15, 2024 / How Transformers Get Rich: Approximation and Dynamics Analysis
May 21, 2023 / Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks
May 31, 2024 / Improving Generalization and Convergence by Enhancing Implicit Regularization
Sep 7, 2024 / Leveraging LLMs for Influence Path Planning in Proactive Recommendation
Feb 1, 2024 / Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
Jul 6, 2022 / The alignment property of SGD noise and how it helps select flat minima: A stability analysis
Mar 11, 2026 / On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
Feb 15, 2026 / Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws
Jul 1, 2023 / Q-YOLO: Efficient Inference for Real-time Object Detection
Oct 14, 2024 / Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Feb 26, 2026 / Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement