Acquiring Diverse Robot Skills via Maximum Entropy Deep Reinforcement Learning
The intersection of expressive, general-purpose function approximators, such as neural networks, with general-purpose model-free reinforcement learning (RL) algorithms holds the promise of automating a wide range of robotic behaviors: reinforcement learning provides the formalism for reasoning about sequential decision making, while large neural networks can process high-dimensional and noisy observations to provide a general representation for any behavior with minimal manual engineering. However, applying model-free RL algorithms with multilayer neural networks (i.e., deep RL) to real-world robotic control problems has proven to be very difficult in practice: the sample complexity of model-free methods tends to be quite high, and training tends to yield high-variance results. In this talk, I will discuss how the maximum entropy principle can enable deep RL for real-world robotic applications. First, by representing policies as expressive energy-based models, maximum entropy RL leads to effective, multi-modal exploration that can reduce sample complexity. Second, maximum entropy policies can promote reusability through compositionality, meaning that existing policies can be combined to create new compound policies without extra interaction with the environment. I will demonstrate these properties in both simulated and real-world robotic tasks.
Tuomas Haarnoja is a PhD candidate in the Berkeley Artificial Intelligence Research Lab (BAIR) at UC Berkeley, advised by Prof. Pieter Abbeel and Prof. Sergey Levine. His research focuses on extending deep reinforcement learning to provide flexible, effective robot control that can handle the diversity and variability of the real world. During his PhD, Tuomas has spent time as an intern at Google Brain, where he developed model-free algorithms for robotic applications requiring high sample efficiency. Before joining BAIR, Tuomas received a master's degree in Space Robotics and Automation from Luleå University of Technology, Sweden, and Aalto University of Technology, Finland, and worked as a research scientist at VTT Technical Research Centre of Finland.