Date | Name
Dec 1, 2017 | Deep Learning Scaling is Predictable, Empirically
Mar 25, 2020 | Pipelined Backpropagation at Scale: Training Large Models without Batches
Sep 3, 2019 | Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
Apr 19, 2021 | Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation
Apr 6, 2023 | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Jun 28, 2022 | RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Mar 15, 2017 | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Oct 18, 2023 | Position Interpolation Improves ALiBi Extrapolation
Nov 1, 2024 | Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
May 2, 2025 | Don't be lazy: CompleteP enables compute-efficient deep transformers
May 19, 2025 | Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Sep 20, 2023 | BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Sep 19, 2023 | SlimPajama-DC: Understanding Data Combinations for LLM Training
May 24, 2024 | Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nov 6, 2024 | Crystal: Illuminating LLM Abilities on Language and Code
Dec 5, 2025 | K2-V2: A 360-Open, Reasoning-Enhanced LLM
Feb 21, 2025 | Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Aug 30, 2023 | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Oct 7, 2019 | Compositional Generalization for Primitive Substitutions
Mar 1, 2024 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models