Jan 17, 2026 - Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Feb 19, 2025 - MMTEB: Massive Multilingual Text Embedding Benchmark
Jan 24, 2025 - Humanity's Last Exam
Jul 20, 2024 - Consent in Crisis: The Rapid Decline of the AI Data Commons
Jun 14, 2024 - SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Feb 29, 2024 - StarCoder 2 and The Stack v2: The Next Generation
Feb 12, 2024 - Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Nov 3, 2023 - FinGPT: Large Generative Models for a Small Language
Oct 25, 2023 - The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Aug 14, 2023 - OctoPack: Instruction Tuning Code Large Language Models
May 25, 2023 - Scaling Data-Constrained Language Models
Nov 9, 2022 - BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Oct 13, 2022 - MTEB: Massive Text Embedding Benchmark