Date | Name
Nov 25, 2019 | Playing it Safe: Adversarial Robustness with an Abstain Option
Apr 22, 2022 | The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models
Dec 13, 2023 | The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Mar 5, 2024 | Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
May 29, 2019 | Functional Adversarial Attacks
Jun 19, 2021 | Uncertain Decisions Facilitate Better Preference Learning
Dec 13, 2023 | Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Apr 19, 2023 | Bridging RL Theory and Practice with the Effective Horizon
Jun 22, 2020 | Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
Apr 9, 2025 | AssistanceZero: Scalably Solving Assistance Games
Dec 15, 2023 | Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping
May 8, 2019 | Capture, Learning, and Synthesis of 3D Speaking Styles
Jan 14, 2025 | Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision