Date / Name
Jan 17, 2025 / Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Apr 24, 2023 / Renate: A Library for Real-World Continual Learning
May 29, 2019 / Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Mar 13, 2019 / DeepOBS: A Deep Learning Optimizer Benchmark Suite
Mar 28, 2017 / Early Stopping without a Validation Set
Dec 15, 2016 / Coupling Adaptive Batch Sizes with Learning Rates
Jul 14, 2022 / PASHA: Efficient HPO and NAS with Progressive Resource Allocation
Feb 19, 2020 / The Geometry of Sign Gradient Descent
Dec 9, 2021 / Gradient-matching coresets for continual learning
Mar 28, 2022 / Gradient-Matching Coresets for Rehearsal-Based Continual Learning
May 22, 2017 / Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Nov 9, 2020 / Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Jul 24, 2024 / u-$μ$P: The Unit-Scaled Maximal Update Parametrization
Dec 8, 2023 / A Negative Result on Gradient Matching for Selective Backprop
Jun 5, 2024 / Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
Mar 16, 2026 / A Family of LLMs Liberated from Static Vocabularies
Nov 29, 2023 / Continual Learning with Low Rank Adaptation
May 24, 2018 / Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
Jan 30, 2026 / SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models