Date          Name
Feb 26, 2024  On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
Dec 11, 2023  Modyn: Data-Centric Machine Learning Pipeline Orchestration
Jan 25, 2022  What's Wrong with Deep Learning in Tree Search for Combinatorial Optimization
Oct 15, 2020  Maps for Learning Indexable Classes
Aug 24, 2020  A Strategic Routing Framework and Algorithms for Computing Alternative Paths
Feb 27, 2025  Mixtera: A Data Plane for Foundation Model Training
May 12, 2026  20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone
Oct 15, 2020  Learning Languages with Decidable Hypotheses
Feb 16, 2026  ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
Sep 17, 2025  Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Aug 4, 2022   Efficiently Computing Directed Minimum Spanning Trees
Oct 15, 2021  Law Smells: Defining and Detecting Problematic Patterns in Legal Drafting
Mar 17, 2026  The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data