"au:"Alexandre Muzio"" — arXiv2 Search
Showing 1–8 of 8 results
1. SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts (Apr 7, 2024)
2. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders (Jun 25, 2021)
3. Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers (May 28, 2022)
4. Improving Multilingual Translation by Representation and Gradient Regularization (Sep 10, 2021)
5. XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders (Dec 31, 2020)
6. Discovering Representation Sprachbund For Multilingual Pre-Training (Sep 1, 2021)
7. Scalable and Efficient MoE Training for Multitask Multilingual Models (Sep 22, 2021)
8. Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task (Nov 3, 2021)