Date          Name
Jun 8, 2020   Randomized Policy Learning for Continuous State and Action MDPs
Sep 21, 2017  An Empirical Dynamic Programming Algorithm for Continuous MDPs
May 29, 2024  Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Sep 24, 2023  ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Apr 22, 2024  Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Oct 15, 2019  Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Jul 18, 2024  Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
Nov 10, 2023  Language Models can be Logical Solvers
Sep 25, 2023  Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Jun 4, 2023   Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Oct 31, 2024  Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning
Jul 2, 2024   Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning