``Give Me an Example Like This'': Episodic Active Reinforcement Learning from Demonstrations
/ Authors
/ Abstract
Reinforcement Learning (RL) has achieved great success in sequential decision-making problems but often requires extensive agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) incorporate external expert demonstrations to aid agent exploration during the learning process. However, these demonstrations, typically collected from human users, are costly and thus often limited in quantity. Therefore, how to select the optimal set of human demonstrations that most effectively aids learning becomes a critical concern. This paper introduces EARLY (Episodic Active Learning from demonstration querY), an algorithm designed to enable a learning agent to generate optimized queries for expert demonstrations in a trajectory-based feature space. EARLY employs a trajectory-level estimate of uncertainty in the agent’s current policy to determine the optimal timing and content for feature-based queries. By querying episodic demonstrations instead of isolated state-action pairs, EARLY enhances the human teaching experience and achieves better learning performance. We validate the effectiveness of our method across three simulated navigation tasks of increasing difficulty. Results indicate that our method achieves expert-level performance in all three tasks, converging over 50% faster than other four baseline methods when demonstrations are generated by simulated oracle policies. A follow-up pilot user study (N = 18) further supports that our method maintains significantly better convergence with human expert demonstrators, while also providing a better user experience in terms of perceived task load and requiring significantly less human time.
Journal: Proceedings of the 12th International Conference on Human-Agent Interaction