Showing 1–20 of 34 results
Aug 3, 2021 - Large-Scale Differentially Private BERT
Nov 30, 2018 - TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank
Feb 26, 2020 - Disentangling Adaptive Gradient Methods from Learning Rates
Jan 30, 2019 - Memory-Efficient Adaptive Optimization
Feb 20, 2020 - Scalable Second Order Optimization for Deep Learning
Sep 12, 2022 - On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Jun 8, 2019 - Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Apr 9, 2018 - Large scale distributed neural network training through online distillation
Jun 9, 2021 - Knowledge distillation: A good teacher is patient and consistent
Jun 11, 2021 - LocoProp: Enhancing BackProp via Local Loss Optimization
Feb 12, 2021 - A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Oct 4, 2023 - Heterogeneous Federated Learning Using Knowledge Codistillation
Nov 16, 2023 - A Computationally Efficient Sparsified Online Newton Method
Mar 14, 2024 - Learning from straggler clients in federated learning
Mar 13, 2024 - Gemma: Open Models Based on Gemini Research and Technology
Jul 13, 2022 - N-Grammer: Augmenting Transformers with latent n-grams
Jun 12, 2023 - Benchmarking Neural Network Training Algorithms
Jul 7, 2025 - Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Oct 26, 2020 - Stochastic Optimization with Laggard Data Pipelines
Feb 7, 2023 - Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions