Emergent Complexity and Zero-Shot Transfer via Unsupervised Environment Design
How can we move deep RL beyond games, without having to hand-build a simulator that covers real-world complexity? We train an RL adversary to generate a curriculum of challenging environments. To ensure the adversary cannot create impossible environments, we constrain it using the performance of a second agent. The adversary is trained to maximize the regret, defined as the difference in performance between the two agents. This motivates the adversary to generate environments that are solvable, but challenging. Our method, PAIRED (Protagonist Antagonist Induced Regret Environment Design), produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in challenging, novel environments.
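One way to write down the regret described above, as a rough formalization. The symbols are illustrative assumptions, not notation taken from the talk: $\theta$ stands for the environment built by the adversary, $U^{\theta}(\pi)$ for the expected return of a policy $\pi$ in that environment, $\pi^{P}$ for the agent being trained, and $\pi^{A}$ for the second agent.

```latex
\[
  \mathrm{Regret}^{\theta}(\pi^{P}, \pi^{A}) \;=\; U^{\theta}(\pi^{A}) \;-\; U^{\theta}(\pi^{P})
\]
```

Under this reading, the adversary picks $\theta$ to make this quantity large, which requires an environment that the second agent can solve and the first agent cannot yet solve.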
Key Takeaways:
1. RL agents train in a simulated environment, but for many real-world problems we can't program a simulator to cover every possible test case.
2. Instead, we can learn to automatically generate environments that exploit weaknesses in our agent, using a second, adversary agent.
3. We propose a new technique for adversarial environment generation which optimizes minimax regret. This produces a curriculum of environments by adjusting the difficulty level to be feasible, but outside the agent's current skill level.
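The regret signal in the takeaways above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `RegretSignals` and `paired_signals` are hypothetical, returns are assumed to lie in [0, 1], and regret is estimated as a simple difference of mean episode returns, which may differ from the estimator used in the actual work.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RegretSignals:
    """Training signals derived from one adversary-generated environment."""
    regret: float
    adversary_reward: float
    antagonist_reward: float
    protagonist_reward: float


def mean(values: Sequence[float]) -> float:
    if not values:
        raise ValueError("need at least one episode return")
    return sum(values) / len(values)


def paired_signals(
    protagonist_returns: Sequence[float],
    antagonist_returns: Sequence[float],
) -> RegretSignals:
    """Estimate regret as (second agent's mean return) - (trained agent's mean return).

    Assumption: the adversary and the second agent (antagonist) are rewarded
    with the regret, and the trained agent (protagonist) with its negative.
    """
    regret = mean(antagonist_returns) - mean(protagonist_returns)
    return RegretSignals(
        regret=regret,
        adversary_reward=regret,
        antagonist_reward=regret,
        protagonist_reward=-regret,
    )


# Impossible environment: neither agent succeeds, so the adversary gains nothing.
impossible = paired_signals([0.0, 0.0], [0.0, 0.0])

# Trivial environment: both agents succeed, so the adversary again gains nothing.
trivial = paired_signals([1.0, 1.0], [1.0, 1.0])

# Solvable but challenging: the second agent succeeds where the first mostly fails.
challenging = paired_signals([0.0, 0.5], [1.0, 1.0])

print(impossible.regret, trivial.regret, challenging.regret)
```

In this toy setting only the third environment gives the adversary a positive reward, which is the intuition behind a curriculum that stays feasible but just beyond the agent's current skill.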
Natasha Jaques recently finished her PhD at MIT, which focused on improving the social and affective intelligence of deep learning and deep reinforcement learning. She is now a Research Scientist at Google Brain and Berkeley working with Sergey Levine and Doug Eck. Her work has received an honourable mention for best paper at ICML 2019 and a best paper award at the NeurIPS ML for Healthcare workshop, and she was part of the team that received Best Demo at NeurIPS 2016. She has interned at DeepMind and Google Brain, and was an OpenAI Scholars mentor. Her work has been featured in Quartz, the MIT Technology Review, Boston Magazine, and on CBC radio. Natasha earned her Master's degree from the University of British Columbia, and undergraduate degrees in Computer Science and Psychology from the University of Regina.