"au:"Paul Burstein"" — arXiv2 Search
Showing 1–5 of 5 results
Dec 9, 2025
Luxical: High-Speed Lexical-Dense Text Embeddings
Jan 5, 2026
DatBench: Discriminative, Faithful, and Efficient VLM Evaluations
Mar 17, 2026
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data
Feb 16, 2026
ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
Aug 14, 2025
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining