au:"Gavia Gray" — arXiv Search
Showing 1–6 of 6 results
Nov 1, 2024: Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Feb 21, 2025: Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
May 19, 2025: Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Jun 10, 2019: BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Jun 3, 2019: Separable Layers Enable Structured Efficient Linear Substitutions
Nov 7, 2017: Moonshine: Distilling with Cheap Convolutions