BOB-ROS: A Deep RL Simulation Environment for ROS and Gazebo
Deep reinforcement learning (deep RL) is presently one of the hottest and fastest-moving areas of deep learning, and of machine learning as a whole. This intense focus is driven by the tantalizing prospect of importing the ground-breaking accuracy gains that deep neural networks have achieved on large-scale supervised learning benchmarks into the world of optimal decision making and control. Arguably one of the most important components of deep RL is the simulation environment. Simulation environments play the key role of providing a benchmarking platform for comparing the cornucopia of RL algorithms, giving researchers and practitioners crucial feedback on how effective their ideas are. To date, most simulation environments have been built around games, thanks to their fast physics engines, their semi-realistic rendering pipelines, and the ease with which a game's scoring system can serve as an out-of-the-box, rarely delayed reward function.
Ultimately, the end goal of RL research is to build agents and robots that can interact effectively with real-world environments. The applications are limitless, ranging from autonomous vehicles to package-delivery drones. Viewed through this lens, the gaming-as-a-benchmark approach suffers from two shortcomings. The first is somewhat obvious: a game environment, by design, has its own rules and is not fully governed by the physics of the world a robot must operate in, which limits its applicability. We argue that the physical world already presents plenty of highly challenging environments to navigate; there is simply no need for the additional rules imposed by games to test RL systems, and indeed those extra rules can be a distraction from building agents that are actually useful to society. The second shortcoming is the lack of realistic, agent-centric input channels. Almost all real-world agents have multiple sensors recording stimuli in parallel.
A good example is an autonomous vehicle, which receives real-time data from odometry sensors, GPS, IMU, LIDAR, RADAR, SONAR, and cameras. Data from all these sensors must be fused effectively into a state representation useful for the task at hand. Gaming agents, by contrast, have a very limited (if any) array of sensors: one simply learns from pixels showing the state of the environment and the agent from a third-person point of view.
To address these shortcomings we introduce a new RL benchmarking tool: the Benchmark Of Behavior in the Robot Operating System (BOB-ROS). BOB-ROS is a simulation environment built on ROS and Gazebo, tools whose sole purpose is actually building robots. This makes it a highly pertinent environment for determining the utility of deep RL systems for robotics specifically.
We present results from using several standard deep RL tools to train a drone to fly from A to B through a maze-like office environment without hitting any obstacles. We also present the challenges of applying deep RL in slower, more complex simulation environments such as those built on ROS, and the solutions we used to overcome them.
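To make the setup concrete, here is a minimal sketch of the kind of Gym-style interaction loop such a benchmark might expose. The class name `DroneMazeEnv`, its sensor fields, and its reward scheme are illustrative assumptions for this sketch, not the actual BOB-ROS API; a real implementation would back `reset` and `step` with ROS topics and the Gazebo physics engine rather than the toy dynamics used here.

```python
import random

class DroneMazeEnv:
    """Toy stand-in for a ROS/Gazebo-backed drone environment (hypothetical API)."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start an episode; a real env would return fused sensor data."""
        self.steps = 0
        return {"imu": [0.0] * 3, "lidar": [1.0] * 8}

    def step(self, action):
        """Apply an action (e.g. rotor thrusts) and advance the simulation."""
        self.steps += 1
        obs = {"imu": [random.gauss(0.0, 0.1) for _ in range(3)],
               "lidar": [random.random() for _ in range(8)]}
        reward = -1.0  # step penalty: shorter collision-free paths score higher
        done = self.steps >= self.max_steps  # a real env would also end on crash/goal
        return obs, reward, done, {}

# Standard Gym-style episode loop.
env = DroneMazeEnv(max_steps=5)
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = [0.0, 0.0, 0.0, 0.0]  # a learned policy would map obs -> action here
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # -5.0: five steps at -1 reward each
```

The multi-sensor observation dictionary is the key difference from pixel-only game benchmarks: the agent must fuse IMU and LIDAR (and, in the full environment, camera and other) channels into a single state representation.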
Prerequisites: Fundamentals of reinforcement learning; basic knowledge of probability and statistics; (optional) some familiarity with OpenAI Gym.
Yunus did his PhD in the Machine Learning lab at the University of Cambridge, under the supervision of Carl Rasmussen and Zoubin Ghahramani (now chief scientist at Uber!). His PhD was on scalable methods for a brand new type of Gaussian process known as the structured Gaussian process: a Gaussian process with a covariance structure chosen to make it scalable. Having gotten addicted to making slow but awesome code run faster during his PhD, Yunus found high-frequency trading a natural next step, so he spent two years at Tower Research Capital, a New York-based quantitative hedge fund. He then switched gears (and countries) and joined Vicarious, one of the relatively older AI research labs in the Bay Area, for another two years, where he worked on deep generative models and scalable sum-product networks. Feeling the urge to apply deep learning models in the wild, he joined comma.ai, a self-driving-car startup in San Francisco, as Chief Machine Learning Officer, where he built an operational self-driving system for highway and congested highway traffic scenarios. Since then he has been a senior research scientist at Uber AI Labs, where he has implemented Bayesian optimization and reinforcement learning systems at Uber scale. He is also an advisor to and investor in several ML startups across the globe.