Oct 9, 2024 | Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Feb 2, 2023 | Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Oct 24, 2020 | Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning
Jul 25, 2022 | Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
Jan 19, 2024 | Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Dec 4, 2024 | Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Jun 11, 2025 | ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Mar 10, 2026 | VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
Oct 11, 2023 | COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Aug 31, 2025 | LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Apr 10, 2025 | SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Jun 11, 2024 | World Models with Hints of Large Language Models for Goal Achieving
Jun 8, 2025 | What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
Nov 26, 2025 | Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Mar 9, 2026 | Agentic Critical Training
Jan 1, 2022 | Transfer RL across Observation Feature Spaces via Model-Based Regularization
Feb 9, 2024 | Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
May 24, 2024 | Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Dec 14, 2025 | Lemon: A Unified and Scalable 3D Multimodal Model for Universal Spatial Understanding
Jun 24, 2025 | Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing