Date | Name
Jun 21, 2022 | Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars
Feb 26, 2025 | The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jun 7, 2022 | Generalization Error Bounds for Deep Neural Networks Trained by SGD
Oct 1, 2023 | A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
May 30, 2025 | GradPower: Powering Gradients for Faster Language Model Pre-Training
Jun 5, 2022 | Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks
Nov 18, 2024 | CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset
Nov 24, 2023 | Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
May 30, 2025 | On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
Oct 15, 2024 | How Transformers Get Rich: Approximation and Dynamics Analysis
May 21, 2023 | Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks
May 31, 2024 | Improving Generalization and Convergence by Enhancing Implicit Regularization
Sep 7, 2024 | Leveraging LLMs for Influence Path Planning in Proactive Recommendation
Feb 1, 2024 | Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
Jul 6, 2022 | The alignment property of SGD noise and how it helps select flat minima: A stability analysis
Mar 11, 2026 | On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
Feb 15, 2026 | Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws
Jul 1, 2023 | Q-YOLO: Efficient Inference for Real-time Object Detection
Oct 14, 2024 | Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Feb 26, 2026 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement