Quantifying Generalization in Deep Reinforcement Learning
In the most common deep RL benchmarks, agents are trained and tested on the same environments. Unfortunately, this practice offers little insight into an agent’s ability to generalize. To address this issue, I will introduce CoinRun, a procedurally generated environment that provides distinct sets of levels for training and testing. Using this benchmark, I will show that agents overfit to surprisingly large training sets. I will then show that deeper convolutional architectures improve generalization, as do methods traditionally used in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.
- Agents require a surprisingly large number of training environments to learn to generalize
- Regularization methods common in supervised learning can help reduce overfitting in deep RL
- Precise generalization metrics guide the design of better algorithms and architectures
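The train/test protocol can be sketched in a few lines of Python. This is a minimal illustration, not CoinRun's code. It assumes each procedurally generated level is identified by an integer seed, and `split_level_seeds`, `generalization_gap` and `toy_evaluate` are hypothetical names. The generalization gap is the agent's mean score on training levels minus its mean score on held-out test levels.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def split_level_seeds(num_train: int, num_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw disjoint sets of level seeds for training and testing.

    Assumption (hypothetical): each level is identified by an integer seed.
    """
    rng = random.Random(seed)
    # Sampling without replacement guarantees no level appears in both sets.
    seeds = rng.sample(range(2**31 - 1), num_train + num_test)
    return seeds[:num_train], seeds[num_train:]


def generalization_gap(
    evaluate: Callable[[int], float],
    train_seeds: Sequence[int],
    test_seeds: Sequence[int],
) -> float:
    """Mean score on training levels minus mean score on held-out test levels."""
    train_score = mean(evaluate(s) for s in train_seeds)
    test_score = mean(evaluate(s) for s in test_seeds)
    return train_score - test_score


# Usage with a hypothetical agent that has memorized its training levels.
train_seeds, test_seeds = split_level_seeds(num_train=500, num_test=100)
memorized = set(train_seeds)


def toy_evaluate(level_seed: int) -> float:
    # Hypothetical stand-in for rolling out a trained agent on one level.
    return 10.0 if level_seed in memorized else 5.0


gap = generalization_gap(toy_evaluate, train_seeds, test_seeds)
print(f"generalization gap: {gap:.1f}")  # prints "generalization gap: 5.0"
```

Because the two seed sets are disjoint, a score that drops on the test set can only come from failing to generalize, not from level overlap. An agent that generalizes well would show a gap near zero.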
Karl Cobbe is currently a research scientist at OpenAI. He received his BS in computer science with distinction from Stanford University in 2014. He first joined OpenAI as a research fellow, working under the mentorship of John Schulman. His research focuses primarily on generalization and transfer in deep reinforcement learning. Karl is particularly interested in using procedural generation to create diverse training environments, in order to better investigate the limitations of current algorithms and the factors that lead to overfitting.