Date          Name
Jul 20, 2024  Consent in Crisis: The Rapid Decline of the AI Data Commons
Dec 19, 2022  Multi hash embeddings in spaCy
Aug 5, 2025   FilBench: Can LLMs Understand and Generate Filipino?
May 26, 2025  The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project
Oct 20, 2024  M-RewardBench: Evaluating Reward Models in Multilingual Settings
Nov 22, 2024  Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Apr 13, 2026  Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation
Nov 13, 2023  calamanCy: A Tagalog Natural Language Processing Toolkit
Apr 23, 2026  Multilinguality at the Edge: Developing Language Models for the Global South
Oct 24, 2024  Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Dec 31, 2024  2 OLMo 2 Furious
Oct 12, 2019  Geomancer: An Open-Source Framework for Geospatial Feature Engineering
Nov 13, 2023  Developing a Named Entity Recognition Dataset for Tagalog
Dec 19, 2024  Bridging the Data Provenance Gap Across Text, Speech and Video
Feb 19, 2025  MMTEB: Massive Multilingual Text Embedding Benchmark
Nov 15, 2023  Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark
Mar 20, 2024  RewardBench: Evaluating Reward Models for Language Modeling
Mar 10, 2025  Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Dec 15, 2025  Olmo 3
May 19, 2025  R3: Robust Rubric-Agnostic Reward Models