arXiv2 Search: au:"Yuzhong Hong"
Showing 1–5 of 5 results
Feb 11, 2026
Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity
Dec 18, 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Dec 17, 2024
Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models
Apr 28, 2025
GVPO: Group Variance Policy Optimization for Large Language Model Post-Training
Aug 2, 2025
RSPO: Risk-Seeking Policy Optimization for Pass@k and Max@k Metrics in Large Language Models