Date | Name
Oct 17, 2023 | RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms
Aug 6, 2025 | CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference
Oct 13, 2024 | RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Feb 4, 2026 | Steering LLMs via Scalable Interactive Oversight
Mar 21, 2022 | Global Matching with Overlapping Attention for Optical Flow Estimation
Feb 2, 2024 | StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Oct 21, 2025 | BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Aug 5, 2025 | VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision
Apr 15, 2026 | MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
Dec 15, 2023 | LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
Jan 26, 2021 | Semi-synthesis: A fast way to produce effective datasets for stereo matching
May 1, 2024 | MetaRM: Shifted Distributions Alignment via Meta-Learning
Jun 26, 2024 | SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
Sep 14, 2023 | The Rise and Potential of Large Language Model Based Agents: A Survey
Jan 11, 2024 | Secrets of RLHF in Large Language Models Part II: Reward Modeling
Jun 17, 2024 | Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Jul 7, 2025 | Pre-Trained Policy Discriminators are General Reward Models
Jan 19, 2026 | FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
Dec 4, 2025 | Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
Jun 30, 2025 | Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective