Rewards, Resets, Exploration: Bottlenecks for Scaling Deep RL in Robotics
Deep RL for practical applications such as robotics has long been seen as a great challenge. However, recent successes of sample-efficient deep RL algorithms in real-world learning and robust sim2real transfer hint that the main bottlenecks for scaling deep RL in robotics lie elsewhere. In this talk, I'll focus on three critical components required for massively scaling deep RL in simulation or the real world: rewards, resets, and exploration. I'll discuss our recent work on universal reward definitions through natural language; concurrent learning of a reset policy for safe and continual learning; and efficient exploration through goal-driven or empowerment-based action abstractions. I'll end the talk by highlighting future directions and other challenges toward enabling robots to be as diversely functional as humans.
Key Takeaways:
- Some deep RL algorithms already learn relatively sample-efficiently in the real world, under reasonable task settings.
- Under aggressive randomization, simulation-to-real (sim2real) transfer works well for a number of robotic tasks.
- To massively scale deep RL to solve diverse control tasks, we need to solve rewards, resets, and exploration.
- Robotics = Massive Room for Diverse Emergent Behaviors = Path to AGI
Shane Gu is a Research Scientist at Google Brain, where he mainly works on problems in deep learning, reinforcement learning, robotics, and probabilistic machine learning. His recent research focuses on sample-efficient RL methods that could scale to solve difficult continuous control problems in the real world, work that has been covered in a Google Research blog post and MIT Technology Review. He completed his PhD in Machine Learning at the University of Cambridge and the Max Planck Institute for Intelligent Systems in Tübingen, where he was co-supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schölkopf. During his PhD, he also collaborated closely with Sergey Levine at UC Berkeley/Google Brain and Timothy Lillicrap at DeepMind. He holds a B.A.Sc. in Engineering Science from the University of Toronto, where he did his thesis with Geoffrey Hinton on distributed training of neural networks using evolutionary algorithms.