Off-Policy Reinforcement Learning for Real-World Robots
Off-policy reinforcement learning algorithms let us train and evaluate policies using data collected by other policies. This makes them attractive for real-world settings like robotics, where collecting real data is expensive and time-consuming. This talk covers two ways we’ve used off-policy algorithms. First, I’ll describe how we used off-policy learning to solve a challenging vision-based robotic manipulation task on real robots. Second, I’ll discuss our recent work on off-policy evaluation, and why it is important for future research on real-world reinforcement learning problems.
- Off-policy RL is promising for real-world problems because it lets us train models on all past experience
- To efficiently research and develop RL for the real world, we need off-policy evaluation, which lets us evaluate policies without running them directly in the final real-world environment
- We present an off-policy evaluation method that performs well on an image-based real-world robotics task
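To make the idea of off-policy evaluation concrete, here is a minimal sketch of one classical approach, per-trajectory importance sampling. This is an illustrative baseline, not the method presented in the talk; the function and argument names are assumptions for the example. Given trajectories collected by a behavior policy `pi_b`, it reweights each trajectory's return by how likely the evaluation policy `pi_e` would have been to take the same actions, yielding an unbiased estimate of `pi_e`'s value without ever running it.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Estimate the value of an evaluation policy pi_e from trajectories
    collected by a behavior policy pi_b, via per-trajectory importance
    sampling (an illustrative OPE baseline, not the talk's method).

    trajectories: list of trajectories, each a list of (state, action, reward)
    pi_e, pi_b:   functions (action, state) -> probability of that action
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0   # likelihood ratio of the trajectory under pi_e vs pi_b
        ret = 0.0      # discounted return of the trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

In practice, plain importance sampling suffers from high variance as trajectories get longer (the weights are products over every timestep), which is one reason OPE for real-world, image-based tasks remains an active research problem.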
Alex Irpan is a software engineer at Google Brain, where he works on ways to apply deep reinforcement learning to robotics and other real-world problems. His research focuses on leveraging real-world data as much as possible for robotic manipulation problems, through techniques like transfer learning and off-policy learning. He received his BA in computer science from UC Berkeley in 2016, where he did undergraduate research in the Berkeley AI Research Lab, mentored by Pieter Abbeel and John Schulman.