Showing 1–20 of 42 results
Date | Name
Oct 12, 2021 | Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Aug 10, 2021 | The Benefits of Implicit Regularization from SGD in Least Squares Problems
Nov 23, 2023 | Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Oct 12, 2023 | How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Mar 1, 2018 | The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Oct 16, 2024 | Context-Scaling versus Task-Scaling in In-Context Learning
Apr 5, 2025 | Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Oct 29, 2024 | How Does Critical Batch Size Scale in Pre-training?
Mar 3, 2023 | Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Feb 22, 2024 | In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Feb 18, 2025 | Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
Mar 17, 2023 | Fixed Design Analysis of Regularization-Based Continual Learning
Apr 17, 2021 | Lifelong Learning with Sketched Structural Regularization
Aug 15, 2020 | Obtaining Adjustable Regularization for Free via Iterate Averaging
Mar 7, 2022 | Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Nov 4, 2020 | Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jun 18, 2019 | On the Noisy Gradient Descent that Generalizes as SGD
Mar 23, 2021 | Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Jun 10, 2025 | Improved Scaling Laws in Linear Regression via Data Reuse
Feb 22, 2025 | Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks