arXiv2 Search: au:"Tian Xu"
Showing 1–2 of 2 results
Mar 24, 2026
Off-Policy Value-Based Reinforcement Learning for Large Language Models
Aug 29, 2024
Preserving Diversity in Supervised Fine-Tuning of Large Language Models