How Do You Know What A Deep Network Is Learning For A Vision Task?
Modern deep learning algorithms can fit their training sets so well that they achieve almost zero training error. What is all the more amazing is that this performance tends to generalize well to unseen data - especially for visual detection and classification tasks. Increasingly, deep methods are being utilized in vision tasks such as object tracking and visual SLAM. These tasks differ fundamentally from the traditional vision tasks where deep learning has been effective (e.g. object detection and classification), as they attempt to model the relative relationship between image frames. Although deep methods achieve state-of-the-art performance on many benchmarks, it is easy to demonstrate empirically that they are not always learning what we want them to learn for a given visual task - limiting their practical usage in real-world applications.
In this talk we shall discuss recent advances my group has made towards making better guarantees about the generalization of deep learning methods for visual tasks where the relative relationship between images is important - most notably object tracking and VSLAM. In particular, we shall discuss a new paradigm for efficient and generalizable object tracking, which we refer to as Deep-LK, and its extension to 3D, PointNet-LK. We shall also discuss how these insights can be utilized in recent applications of deep learning to VSLAM. Finally, we will show some initial results on how geometric constraints can be elegantly combined with deep learning to further improve generalization performance.
Simon Lucey (Ph.D.) is an associate research professor within the Robotics Institute at Carnegie Mellon University, where he is part of the Computer Vision Group and leader of the CI2CV Laboratory. Before returning to CMU he was a Principal Research Scientist at the CSIRO (Australia's premier government science organization) for 5 years. He wants to draw inspiration from vision researchers of the past to attempt to unlock computational and mathematical models that underlie the processes of visual perception.