au:"Guangju Wang" - arXiv2 Search
Showing 1–5 of 5 results
Jan 9, 2019
Consensus Mechanism Design based on Structured Directed Acyclic Graphs
Oct 19, 2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Apr 16, 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Jun 20, 2024
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Jun 29, 2023
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores