Date · Name
Jan 11, 2021 · Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Feb 17, 2022 · ST-MoE: Designing Stable and Transferable Sparse Expert Models
Jan 23, 2018 · MaskGAN: Better Text Generation via Filling in the ______
Sep 4, 2022 · A Review of Sparse Expert Models in Deep Learning
Apr 2, 2018 · Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Jul 13, 2020 · Revisiting Fundamentals of Experience Replay
Feb 19, 2019 · Hyperbolic Discounting and Learning over Multiple Horizons
Feb 26, 2018 · Disentangling the independently controllable factors of variation by interacting with the world
Feb 23, 2021 · Do Transformer Modifications Transfer Across Implementations and Applications?
Mar 13, 2021 · Revisiting ResNets: Improved Training and Scaling Strategies
Oct 20, 2022 · Scaling Instruction-Finetuned Language Models
Nov 28, 2019 · Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Feb 28, 2020 · On Catastrophic Interference in Atari 2600 Games
Nov 6, 2018 · Language GANs Falling Short
Aug 6, 2019 · Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment
Sep 22, 2021 · On Bonus-Based Exploration Methods in the Arcade Learning Environment
May 24, 2023 · Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Apr 16, 2025 · BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Sep 27, 2018 · Deep Graph Infomax
Sep 22, 2021 · Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers