Showing 1–20 of 32 results
Date / Name
Jul 10, 2023 / Ranking with Long-Term Constraints
May 26, 2020 / Active Imitation Learning with Noisy Guidance
Mar 3, 2021 / Successor Feature Sets: Generalizing Successor Representations Across Policies
Oct 21, 2025 / The Emergence of Complex Behavior in Large-Scale Ecological Environments
Feb 26, 2024 / A Surprising Failure? Multimodal LLMs and the NLVR Challenge
Oct 3, 2022 / Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Jun 9, 2020 / Constrained episodic reinforcement learning in concave-convex and knapsack settings
Mar 2, 2023 / Interactive Text Generation
Apr 12, 2024 / Adversarial Imitation Learning via Boosting
Jul 23, 2015 / LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
Oct 9, 2025 / Expressive Value Learning for Scalable Offline Reinforcement Learning
Apr 9, 2026 / $p1$: Better Prompt Optimization with Fewer Prompts
Apr 25, 2024 / REBEL: Reinforcement Learning via Regressing Relative Rewards
Aug 3, 2017 / The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
May 28, 2025 / Scaling Offline RL via Efficient and Expressive Shortcut Models
Oct 6, 2024 / Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Oct 15, 2025 / Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons
Feb 22, 2026 / LLMs Can Learn to Reason Via Off-Policy RL
Feb 5, 2019 / Non-Monotonic Sequential Text Generation
Apr 12, 2024 / Dataset Reset Policy Optimization for RLHF