arXiv2 Search: au:"Haakon Mongstad"
Showing 1–5 of 5 results
Feb 16, 2026
ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
Dec 9, 2025
Luxical: High-Speed Lexical-Dense Text Embeddings
Aug 14, 2025
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Jan 5, 2026
DatBench: Discriminative, Faithful, and Efficient VLM Evaluations
Mar 17, 2026
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data