3D Deep Learning for Robot Perception
Deep learning has made unprecedented progress in artificial intelligence tasks ranging from speech recognition to image recognition. In both cases, we ask our algorithms to reason about features in the most appropriate dimension: for natural language, we feed one-dimensional one-hot vectors of words as input to a recurrent neural network, whereas in image processing, we apply two-dimensional filters over pixels in a convolutional network. However, since we physically live in a three-dimensional world, it is more natural, and often more useful, for robot perception to use three-dimensional representations and algorithms to reason about the 3D scene around us.
In this talk, I will share our recent work on 3D deep learning for robot perception at three different levels: local part, whole object, and global scene. At the local part level, we have developed an algorithm that learns 3D geometric descriptors to match local 3D keypoints, a critical step in robot mapping. At the object level, we have developed an object detector that slides a detection window in 3D using 3D convolutional neural networks. At the global scene level, we propose a novel approach that feeds the whole 3D scene into a deep network and lets the network automatically learn 3D object-to-object context relationships for joint inference over all objects in the scene. To support 3D deep learning research, I will also introduce "Marvin", a deep learning software framework designed for three-dimensional deep neural networks.
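To give a flavor of the core operation behind 3D sliding-window detection, the sketch below implements a naive 3D convolution over a voxel grid with NumPy. This is an illustrative toy, not the talk's actual detector or the Marvin framework: the volume, kernel, and function names are assumptions, and a real system would use learned filters and GPU-accelerated convolutions.

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive valid-mode 3D convolution: slide a 3D kernel over a voxel grid.

    volume: (D, H, W) array, e.g. an occupancy grid from a depth sensor.
    kernel: (kd, kh, kw) 3D filter (learned, in a real detector).
    Returns a (D-kd+1, H-kh+1, W-kw+1) response map; high responses mark
    locations where the local 3D shape matches the filter.
    """
    D, H, W = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Element-wise product of the window and kernel, summed:
                # one response per 3D window position.
                out[z, y, x] = np.sum(
                    volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

# Toy example: a 2x2x2 solid block inside a 4x4x4 empty grid.
volume = np.zeros((4, 4, 4))
volume[1:3, 1:3, 1:3] = 1.0
kernel = np.ones((2, 2, 2))  # responds to locally dense occupancy
response = conv3d(volume, kernel)
# The response peaks where the window exactly covers the block.
```

A 3D convolutional network stacks many such filtered volumes with nonlinearities in between, so sliding a detection window in 3D amounts to evaluating the network densely over the voxel grid.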
Jianxiong Xiao is an Assistant Professor in the Department of Computer Science at Princeton University and the director of the Princeton Vision Group. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). His research focuses on bridging the gap between computer vision and robotics by building extremely robust and dependable computer vision systems for robot perception. In particular, he is interested in 3D Deep Learning, RGB-D Recognition and Reconstruction, Place-centric 3D Context Modeling, Synthesis for Analysis, Deep Learning for Autonomous Driving, Large-scale Crowd-sourcing, and Petascale Big Data. His work received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award for 2012, and has been covered in the popular press in the United States. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, the MIT CSW Best Research Award in 2011, and two Google Research Awards, in 2014 and 2015. More information can be found at: http://vision.princeton.edu.