Showing 1–20 of 42 results
Date | Name
Oct 12, 2021 | Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Aug 10, 2021 | The Benefits of Implicit Regularization from SGD in Least Squares Problems
Nov 23, 2023 | Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Oct 12, 2023 | How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Mar 1, 2018 | The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Oct 16, 2024 | Context-Scaling versus Task-Scaling in In-Context Learning
Apr 5, 2025 | Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Oct 29, 2024 | How Does Critical Batch Size Scale in Pre-training?
Mar 3, 2023 | Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Feb 22, 2024 | In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Feb 18, 2025 | Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
Mar 17, 2023 | Fixed Design Analysis of Regularization-Based Continual Learning
Apr 17, 2021 | Lifelong Learning with Sketched Structural Regularization
Aug 15, 2020 | Obtaining Adjustable Regularization for Free via Iterate Averaging
Mar 7, 2022 | Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Nov 4, 2020 | Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jun 18, 2019 | On the Noisy Gradient Descent that Generalizes as SGD
Mar 23, 2021 | Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Jun 10, 2025 | Improved Scaling Laws in Linear Regression via Data Reuse
Feb 22, 2025 | Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks