Date    Name
Jun 4, 2025    OpenThoughts: Data Recipes for Reasoning Models
Apr 29, 2025    ReasonIR: Training Retrievers for Reasoning Tasks
Feb 9, 2024    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Dec 19, 2022    BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Feb 26, 2024    A Survey on Data Selection for Language Models
Mar 13, 2024    Language models scale reliably with over-training and on downstream tasks
Apr 6, 2025    Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Sep 25, 2024    Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Jul 9, 2025    FlexOlmo: Open Language Models for Flexible Data Use
Apr 8, 2024    Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Sep 14, 2023    C-Pack: Packed Resources For General Chinese Embeddings
Jul 18, 2024    Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Apr 14, 2025    MIEB: Massive Image Embedding Benchmark
Jul 16, 2024    BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Dec 19, 2024    Bridging the Data Provenance Gap Across Text, Speech and Video
May 8, 2025    Crosslingual Reasoning through Test-Time Scaling
Feb 19, 2025    MMTEB: Massive Multilingual Text Embedding Benchmark
Mar 30, 2024    Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
Nov 3, 2023    FinGPT: Large Generative Models for a Small Language
Jun 2, 2025    Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability