"au:"Pradeep Varakantham"" — arXiv2 Search

Showing 21–40 of 56 results

Dec 27, 2022 - Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization
Jan 27, 2023 - Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties
Nov 20, 2019 - Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning
Mar 27, 2018 - Entropy based Independent Learning in Anonymous Multi-Agent Settings
Jul 13, 2024 - Preserving the Privacy of Reward Functions in MDPs through Deception
Dec 16, 2023 - Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning
Feb 8, 2025 - Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
Jun 14, 2024 - Bootstrapping Language Models with DPO Implicit Rewards
Feb 10, 2026 - Efficient Unsupervised Environment Design through Hierarchical Policy Representation Learning
Oct 1, 2025 - On Discovering Algorithms for Adversarial Imitation Learning
Sep 30, 2023 - Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling
Feb 4, 2023 - Diversity Induced Environment Design via Self-Play
Feb 21, 2023 - Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling
Feb 21, 2023 - Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning
Sep 13, 2020 - Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing
Sep 16, 2021 - Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health
Dec 1, 2021 - Conditional Expectation based Value Decomposition for Scalable On-Demand Ride Pooling
Dec 7, 2024 - Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
Oct 10, 2024 - UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations
Feb 20, 2024 - SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning
