Reproducibility in Reinforcement Learning with Physical Robots
Recent breakthroughs in computer games and board games have shown the power and promise of deep reinforcement learning (RL) approaches to sequential decision making. While these advances have inspired many to apply deep RL techniques to real-world problems, applying them to moment-by-moment control of real robots has remained a challenge. At the same time, researchers have recently identified a deepening crisis of reproducibility in deep RL research, hindering effective sharing of knowledge. In this talk, I present insights from our recent work at Kindred indicating that these two challenges are closely related and that the reproducibility crisis of deep RL can be far worse with physical robots. In our work, we systematically investigate these challenges and suggest steps that enable learning nearly as reliably with physical robots as with virtual ones. This allowed us to perform extensive deep RL research, such as hyperparameter studies of learning algorithms on multiple tasks, and to solve challenging problems, such as a mobile robotic base docking to a charging station using only real-world interactions. We incorporated our insights into SenseAct, an open-source toolkit for real-world robot learning that provides implementations of six RL tasks on three commercially available robots, as well as a framework for implementing new tasks efficiently. SenseAct has facilitated reproducibility of learning results in physical environments and allowed us to prototype learning solutions directly in production setups, bringing deep RL one step closer to the real world.
Rupam Mahmood is Lead of the AI Research team at Kindred, where he designs and studies learning systems for controlling Kindred's robotic products. His primary objective is to understand the underlying principles behind real-time goal-driven systems by building them for robots. He is the creator of SenseAct, the first open-source toolkit for real-time reinforcement learning with physical robots. He received his Ph.D. in statistical machine learning from the Department of Computing Science at the University of Alberta, under the supervision of Richard Sutton. During his graduate studies, he developed and studied algorithms for learning-rate adaptation, representation search, and off-policy learning, a class of methods for learning behaviors and rich knowledge representations in a counterfactual manner.