arXiv2
"au:"Dingqian Hong"" — arXiv2 Search
Showing 1–2 of 2 results
Apr 28, 2025
GVPO: Group Variance Policy Optimization for Large Language Model Post-Training
Aug 2, 2025
RSPO: Risk-Seeking Policy Optimization for Pass@k and Max@k Metrics in Large Language Models