Showing 1–20 of 34 results
Date / Name
May 29, 2023 / Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Jul 2, 2019 / Dynamics-Aware Unsupervised Discovery of Skills
Apr 27, 2020 / Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning
May 11, 2022 / A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning
Mar 2, 2023 / Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning
Feb 19, 2024 / A Critical Evaluation of AI Feedback for Aligning Large Language Models
Feb 10, 2026 / Instruct2Act: From Human Instruction to Actions Sequencing and Execution via Robot Action Network for Robotic Manipulation
Jul 27, 2021 / Autonomous Reinforcement Learning via Subgoal Curricula
Feb 10, 2026 / RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World Environments
May 3, 2018 / TrueChain: Highly Performant Decentralized Public Ledger
Dec 17, 2021 / Autonomous Reinforcement Learning: Formalism and Benchmarking
Mar 24, 2021 / Discriminator Augmented Model-Based Reinforcement Learning
Mar 19, 2024 / DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Jan 29, 2024 / SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
Nov 2, 2023 / Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment
Feb 26, 2025 / FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users
Dec 11, 2024 / Test-Time Alignment via Hypothesis Reweighting
Oct 17, 2022 / You Only Live Once: Single-Life Reinforcement Learning
Jul 26, 2023 / Waypoint-Based Imitation Learning for Robotic Manipulation
May 24, 2023 / Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback