Generalization and the Role of Data in Reinforcement Learning
Over the past decade, we have witnessed a revolution in supervised machine learning, as large, high-capacity models trained on huge datasets have attained remarkable results across a range of domains, from computer vision to natural language processing and speech recognition. But can these gains in supervised learning translate into more effective decision making? The branch of machine learning that studies decision making is called reinforcement learning. While more effective reinforcement learning methods have also been developed over the past decade, it has generally proven challenging for reinforcement learning to benefit from large datasets: it is conventionally thought of as an active, online learning framework, which makes reusing large, previously collected datasets difficult. In this talk, I will discuss how reinforcement learning algorithms can enable broad generalization through the use of large and diverse prior datasets. This concept lies at the core of offline reinforcement learning, which concerns the development of reinforcement learning methods that do not require active interaction with their environment but instead, much like current supervised learning methods, learn from previously collected datasets. Crucially, unlike supervised learning, such methods directly optimize downstream decision making by maximizing long-horizon reward signals. I will describe the computational and statistical challenges associated with offline reinforcement learning, survey recent algorithmic developments, and present a few promising applications.
Sergey Levine received his BS and MS in Computer Science from Stanford University in 2009, and his Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as computer vision and graphics. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.