Date          Name
Nov 15, 2022  Contextual Transformer for Offline Meta Reinforcement Learning
Jun 24, 2023  Large Sequence Models for Sequential Decision-Making: A Survey
Aug 14, 2023  #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Dec 19, 2023  Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
Sep 18, 2024  Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Sep 11, 2024  Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence
Dec 9, 2024   ProcessBench: Identifying Process Errors in Mathematical Reasoning
Jun 20, 2024  LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
May 18, 2025  MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jan 13, 2025  The Lessons of Developing Process Reward Models in Mathematical Reasoning
Sep 28, 2023  Qwen Technical Report
May 30, 2022  Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Jul 13, 2022  Scalable Model-based Policy Optimization for Decentralized Networked Systems
Dec 19, 2024  Qwen2.5 Technical Report
Jul 15, 2024  Qwen2 Technical Report
Nov 15, 2023  Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
May 28, 2024  Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
May 15, 2025  WorldPM: Scaling Human Preference Modeling