arXiv2
"au:"Keming Lu"" — arXiv2 Search
Showing 1–6 of 6 results
May 15, 2025
WorldPM: Scaling Human Preference Modeling
Sep 18, 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Aug 20, 2024
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
Jun 19, 2024
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Jun 3, 2024
Towards Scalable Automated Alignment of LLMs: A Survey
May 28, 2024
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment