"au:"Hiteshi Sharma"" — arXiv2 Search

Showing 1–12 of 12 results

Date / Name

Jun 8, 2020 / Randomized Policy Learning for Continuous State and Action MDPs
Sep 21, 2017 / An Empirical Dynamic Programming Algorithm for Continuous MDPs
May 29, 2024 / Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Sep 24, 2023 / ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Apr 22, 2024 / Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Oct 15, 2019 / Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Jul 18, 2024 / Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
Nov 10, 2023 / Language Models can be Logical Solvers
Sep 25, 2023 / Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Jun 4, 2023 / Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Oct 31, 2024 / Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning
Jul 2, 2024 / Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning