Learning Abstractions with Hierarchical Reinforcement Learning
Hierarchical RL has long held the promise of enabling deep RL to solve more complex and temporally extended tasks by abstracting away lower-level details from a higher-level agent. In this talk, we describe how to turn this promise into a reality. We present a hierarchical design in which a higher-level agent solves a task by iteratively directing a lower-level policy to reach certain goals. We describe how both levels may be trained concurrently in a highly efficient, off-policy manner. Furthermore, we present a provably optimal technique for learning abstract notions of "goals" without explicit supervision. Our resulting method achieves excellent performance on a suite of difficult navigation tasks.
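To make the two-level design concrete, here is a minimal sketch of the control loop only, not the method presented in the talk. Everything in it is an assumption made for illustration: the 2-D point environment, the function names, the fixed re-planning interval c, and the distance-based intrinsic reward. Both policies are hand-coded stand-ins; in the actual setting they would be learned, and the stored transitions would feed an off-policy learner.

```python
import math

def high_level_policy(state, target, max_step=3.0):
    """Hypothetical stand-in: propose a subgoal at most max_step away, toward the target."""
    dx, dy = target[0] - state[0], target[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target
    scale = max_step / dist
    return (state[0] + dx * scale, state[1] + dy * scale)

def low_level_policy(state, goal, speed=1.0):
    """Hypothetical stand-in: move toward the current goal, at most `speed` per step."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= speed:
        return (dx, dy)
    return (dx * speed / dist, dy * speed / dist)

def intrinsic_reward(next_state, goal):
    """Assumed lower-level reward: negative Euclidean distance to the goal."""
    return -math.hypot(goal[0] - next_state[0], goal[1] - next_state[1])

def run_episode(start, target, horizon=50, c=5):
    """Run one episode; the higher level picks a new goal every c steps."""
    state, goal, transitions = start, None, []
    for t in range(horizon):
        if t % c == 0:
            goal = high_level_policy(state, target)
        action = low_level_policy(state, goal)
        next_state = (state[0] + action[0], state[1] + action[1])
        # Store (state, goal, action, reward, next_state) for later reuse.
        reward = intrinsic_reward(next_state, goal)
        transitions.append((state, goal, action, reward, next_state))
        state = next_state
        if math.hypot(target[0] - state[0], target[1] - state[1]) < 1e-6:
            break
    return state, transitions

final_state, transitions = run_episode((0.0, 0.0), (6.0, 8.0))
print(final_state, len(transitions))
```

The higher level sees only the state and the task target, while the lower level sees only the state and the current goal; that separation is the abstraction the abstract refers to.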
Ofir Nachum is currently a Research Scientist at Google Brain. His research focuses on reinforcement learning, with notable work including PCL (path consistency learning) and HIRO (hierarchical reinforcement learning with off-policy correction). He received his Bachelor's and Master's degrees from MIT. Before joining Google, he was an engineer at Quora, where he led machine learning efforts on the feed, ranking, and quality teams.