Date · Name
Jan 11, 2021 · Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Feb 17, 2022 · ST-MoE: Designing Stable and Transferable Sparse Expert Models
Jan 23, 2018 · MaskGAN: Better Text Generation via Filling in the ______
Sep 4, 2022 · A Review of Sparse Expert Models in Deep Learning
Apr 2, 2018 · Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Jul 13, 2020 · Revisiting Fundamentals of Experience Replay
Feb 19, 2019 · Hyperbolic Discounting and Learning over Multiple Horizons
Feb 26, 2018 · Disentangling the independently controllable factors of variation by interacting with the world
Feb 23, 2021 · Do Transformer Modifications Transfer Across Implementations and Applications?
Mar 13, 2021 · Revisiting ResNets: Improved Training and Scaling Strategies
Oct 20, 2022 · Scaling Instruction-Finetuned Language Models
Nov 28, 2019 · Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Feb 28, 2020 · On Catastrophic Interference in Atari 2600 Games
Nov 6, 2018 · Language GANs Falling Short
Aug 6, 2019 · Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment
Sep 22, 2021 · On Bonus-Based Exploration Methods in the Arcade Learning Environment
May 24, 2023 · Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Apr 16, 2025 · BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Sep 27, 2018 · Deep Graph Infomax
Sep 22, 2021 · Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers