Search results for au:"Le Sun" (arXiv2 Search)
Showing 1–6 of 6 results
Oct 24, 2025
When Models Outthink Their Safety: Unveiling and Mitigating Self-Jailbreak in Large Reasoning Models
Aug 21, 2025
A Survey on Large Language Model Benchmarks
Jul 20, 2025
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback
Oct 28, 2024
Transferable Post-training via Inverse Value Learning
Jun 3, 2024
Towards Scalable Automated Alignment of LLMs: A Survey
Feb 27, 2024
SoFA: Shielded On-the-fly Alignment via Priority Rule Following