Stable Reinforcement Learning from Sensor Data
Reinforcement learning studies how to optimize sequential decisions. Such decision-making problems arise in many physical systems, for example in robotics. Real systems can typically gather only relatively small datasets, composed of redundant sensor data that can be hard to interpret. We developed a learning algorithm that yields stable policy updates even on small datasets, without the need for manually tuned features. Deep learning techniques allow efficient learning in the presence of noise and distractions. Our experiments show that the resulting methods can learn robotic tasks with visual or tactile input from a small amount of experience.
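To give a flavor of what a "stable policy update" can mean, the sketch below shows a generic KL-bounded sample reweighting, in the spirit of relative-entropy policy search: samples are weighted by exponentiated advantages, with a temperature chosen so the reweighted distribution stays within a KL bound of the empirical one. This is an illustrative sketch under these assumptions, not the speaker's actual algorithm; the function name, the bound `epsilon`, and the bisection search are all choices made for this example.

```python
import numpy as np

def kl_bounded_weights(advantages, epsilon=0.1):
    """Weight samples by exp(A_i / eta), picking the temperature eta
    so that KL(weighted || uniform) stays below epsilon.

    Illustrative sketch only: a KL bound on the update is one way to
    keep policy improvement stable on small datasets.
    """
    advantages = np.asarray(advantages, dtype=float)
    n = len(advantages)

    def kl_for(eta):
        z = advantages / eta
        z -= z.max()                 # numerical stability before exp
        p = np.exp(z)
        p /= p.sum()
        # KL divergence from the uniform sample distribution
        return float(np.sum(p * np.log(p * n + 1e-12)))

    # KL decreases as eta grows; bisect (in log space) for the
    # smallest eta that still satisfies the bound.
    lo, hi = 1e-6, 1e6
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        if kl_for(mid) > epsilon:
            lo = mid                 # bound violated: raise temperature
        else:
            hi = mid                 # bound satisfied: try a sharper fit
    z = advantages / hi
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()
```

A policy could then be fit by weighted regression onto the sampled actions; because the weights cannot drift arbitrarily far from uniform, each update stays close to the data the previous policy generated.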
Herke van Hoof is currently a postdoctoral fellow at McGill University in Montreal, Canada. At McGill, Herke works with Joelle Pineau at the Reasoning and Learning Lab as well as with David Meger and Gregory Dudek at the Mobile Robotics Lab. Before that, he obtained his PhD at TU Darmstadt, Germany under the supervision of Jan Peters. His research interest is in reinforcement learning for autonomous robots in perceptually challenging environments.