May 24, 2018: A Corpus for Multilingual Document Classification in Eight Languages
Dec 26, 2018: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
May 25, 2022: Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
Nov 3, 2018: Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
May 24, 2018: Filtering and Mining Parallel Data in a Joint Multilingual Space
Dec 11, 2024: Large Concept Models: Language Modeling in a Sentence Representation Space
Nov 10, 2019: CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Jul 10, 2019: WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Apr 13, 2017: Learning Joint Multilingual Sentence Representations with Neural Machine Translation
Oct 10, 2022: Multilingual Representation Distillation with Contrastive Learning
Oct 16, 2019: MLQA: Evaluating Cross-lingual Extractive Question Answering
Mar 9, 2022: FlexIT: Towards Flexible Semantic Image Translation
Dec 11, 2024: LCFO: Long Context and Long Form Output Dataset and Benchmarking
Nov 11, 2022: Speech-to-Speech Translation For A Real-world Unwritten Language
Jun 22, 2023: xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages
Jun 20, 2019: Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
Dec 15, 2021: Textless Speech-to-Speech Translation on Real Data
Mar 17, 2026: Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech
Dec 6, 2021: Embedding Arithmetic of Multimodal Queries for Image Retrieval
Jun 6, 2016: Very Deep Convolutional Networks for Text Classification