"au:"Juntao Dai"" — arXiv2 Search
Showing 1–1 of 1 results
Mar 23, 2025
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization