Date / Name
Dec 12, 2023 / A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Oct 18, 2023 / MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning
Jul 29, 2021 / Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection
Nov 29, 2022 / ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency
Oct 9, 2023 / Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection
Mar 31, 2025 / Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Jan 9, 2026 / PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Mar 30, 2021 / Delving into Localization Errors for Monocular 3D Object Detection
Dec 18, 2023 / Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Jul 7, 2025 / Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Jan 14, 2026 / STEP3-VL-10B Technical Report
Feb 12, 2026 / PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering
Jul 25, 2025 / Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
Jul 24, 2023 / Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
Aug 15, 2022 / An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection
Dec 26, 2024 / Multi-matrix Factorization Attention
Feb 6, 2026 / R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging
Nov 28, 2025 / Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Feb 11, 2026 / Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters