At Google, we develop flexible, state-of-the-art machine learning systems for computer vision that not only improve our products and services, but also spur progress in the research community. Creating accurate machine learning models capable of localizing and identifying multiple objects in a 3D scene, predicting object shapes, and assigning semantic labels to different components of the scene is a core challenge in computer vision, with applications in robotics and autonomous driving. We invest a significant amount of time training and experimenting with these systems. Today we are happy to make parts of this system available to the broader research community via the TensorFlow 3D codebase. This codebase is an open-source framework built on top of TensorFlow 2 and Keras that makes it easy to construct, train, and deploy 3D semantic segmentation, 3D object detection, and 3D instance segmentation models, as well as to support other potential applications like 3D shape prediction and point cloud registration and completion.
3 Key Takeaways:
1) GPU and CPU ops for 3D submanifold sparse convolution in TensorFlow.
2) A configurable 3D sparse voxel U-Net that is used as the feature extractor in our models.
3) Training and evaluation code for 3D semantic segmentation, 3D object detection, and 3D instance segmentation, with support for distributed training.
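To illustrate what makes submanifold sparse convolution different from a standard dense convolution: output features are computed only at the voxels that are already occupied, so the set of active sites does not dilate from layer to layer. The following is a minimal NumPy sketch of that idea, written for clarity rather than speed; it is not the TensorFlow 3D API, and the function name and dict-based sparse representation are illustrative assumptions.

```python
import numpy as np

def submanifold_conv3d(active, weights):
    """Concept sketch of a submanifold sparse 3D convolution.

    active:  dict mapping (x, y, z) voxel coordinates -> feature vector
             of shape (in_channels,); only occupied voxels appear.
    weights: kernel of shape (k, k, k, in_channels, out_channels).

    Outputs are computed only at the input's active sites, so the
    sparsity pattern is preserved exactly through the layer.
    """
    k = weights.shape[0]
    r = k // 2
    out = {}
    for (x, y, z) in active:
        acc = np.zeros(weights.shape[-1])
        # Accumulate contributions only from active neighbors
        # inside the kernel window; empty voxels are skipped.
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    nb = active.get((x + dx, y + dy, z + dz))
                    if nb is not None:
                        acc += nb @ weights[dx + r, dy + r, dz + r]
        out[(x, y, z)] = acc
    return out

# Two occupied voxels out of an unbounded grid; everything else is empty.
voxels = {(0, 0, 0): np.ones(2), (0, 0, 1): np.ones(2)}
result = submanifold_conv3d(voxels, np.ones((3, 3, 3, 2, 4)))
```

Each output voxel here sums contributions from itself and its one active neighbor, and the output has exactly the same occupied set as the input. TF3D's GPU/CPU ops implement this same operation efficiently over hash-indexed sparse tensors instead of a Python dict.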
Alireza Fathi is a Senior Research Scientist on the Machine Perception team at Google Research. His main area of focus for the last two to three years has been 3D scene understanding. Before joining Google, he spent a couple of great years at Apple working on 3D computer vision research. Before that, he was a Postdoctoral Fellow in Fei-Fei Li's lab in the Computer Science Department at Stanford University. He received his Ph.D. from the Georgia Institute of Technology and his B.Sc. from Sharif University of Technology.