Bridging Perception and Action in Deep Robotic Learning
The link between perception and action lies at the core of intelligent interaction, and is one of the defining features of robotics. While it is possible to train high-capacity deep networks to represent this link entirely end-to-end (e.g. raw pixels to joint torques), these models often require exceedingly large amounts of training time and data -- making them difficult to train on real systems. In this talk, I will present an alternative: learning deep models that map from visual observations to the affordances of robot actions. In the context of robotic manipulation, I will discuss how this intermediate step of learning action-based visual representations leads to significantly more sample-efficient training while maintaining the ability to generalize to novel objects and scenarios. Our experiments demonstrate that, when combined with deep reinforcement learning, our algorithm makes it possible to learn complex vision-based manipulation skills in just a few hours on simulated and real robot platforms.
Andy Zeng is a PhD student in Computer Science at Princeton University, where he works on machine learning for robot perception and manipulation. He is a member of the Princeton Vision and Robotics Group, advised by Thomas Funkhouser, and is currently visiting Google Brain Robotics. He received his Bachelor's degree from UC Berkeley, where he double-majored in Computer Science and Mathematics. Andy's research aims to develop learning algorithms that enable real robots to intelligently interact with the physical world and improve themselves over time. He was the perception team lead for Team MIT-Princeton, which won 1st place (stow task) at the worldwide Amazon Robotics Challenge 2017. His research has been recognized through an NVIDIA Fellowship, a Gordon Y.S. Wu Fellowship, and the Wu Prize.