"au:"Xiyao Wang"" — arXiv2 Search

Showing 1–20 of 36 results

Oct 9, 2024 · Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Feb 2, 2023 · Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Oct 24, 2020 · Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning
Jul 25, 2022 · Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
Jan 19, 2024 · Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Dec 4, 2024 · Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Jun 11, 2025 · ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Mar 10, 2026 · VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
Oct 11, 2023 · COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Aug 31, 2025 · LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Apr 10, 2025 · SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Jun 11, 2024 · World Models with Hints of Large Language Models for Goal Achieving
Jun 8, 2025 · What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
Nov 26, 2025 · Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Mar 9, 2026 · Agentic Critical Training
Jan 1, 2022 · Transfer RL across Observation Feature Spaces via Model-Based Regularization
Feb 9, 2024 · Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
May 24, 2024 · Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Dec 14, 2025 · Lemon: A Unified and Scalable 3D Multimodal Model for Universal Spatial Understanding
Jun 24, 2025 · Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing