"au:"Lukas Balles"" — arXiv2 Search

Showing 1–19 of 19 results

Jan 17, 2025 Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Apr 24, 2023 Renate: A Library for Real-World Continual Learning
May 29, 2019 Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Mar 13, 2019 DeepOBS: A Deep Learning Optimizer Benchmark Suite
Mar 28, 2017 Early Stopping without a Validation Set
Dec 15, 2016 Coupling Adaptive Batch Sizes with Learning Rates
Jul 14, 2022 PASHA: Efficient HPO and NAS with Progressive Resource Allocation
Feb 19, 2020 The Geometry of Sign Gradient Descent
Dec 9, 2021 Gradient-matching coresets for continual learning
Mar 28, 2022 Gradient-Matching Coresets for Rehearsal-Based Continual Learning
May 22, 2017 Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Nov 9, 2020 Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Jul 24, 2024 u-$μ$P: The Unit-Scaled Maximal Update Parametrization
Dec 8, 2023 A Negative Result on Gradient Matching for Selective Backprop
Jun 5, 2024 Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
Mar 16, 2026 A Family of LLMs Liberated from Static Vocabularies
Nov 29, 2023 Continual Learning with Low Rank Adaptation
May 24, 2018 Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
Jan 30, 2026 SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models