arXiv2
"au:"Dongbai Li"" — arXiv2 Search
Showing 1–5 of 5 results
Oct 31, 2025
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Sep 29, 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Apr 14, 2025
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Mar 20, 2025
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Feb 11, 2025
Sample Weight Averaging for Stable Prediction