Video Understanding: What to Expect Today and Tomorrow?
In this talk I will give an overview of recent advances in video understanding. For humans, understanding and interpreting the video signal that enters the brain is an amazingly complex task. Approximately half the brain is engaged in assigning meaning to the incoming imagery, starting with the categorization of all visual concepts in the scene, like an airplane or a cat face. Thanks to yearly concept detection competitions, vast amounts of training data, and several artificial intelligence breakthroughs, categorization of video at the concept level has now matured from an academic challenge into a commercial enterprise. As a natural response, the academic community has shifted its attention to more precise video understanding, in the form of localized actions, like phoning and sumo wrestling, as well as translating videos into single-sentence summaries such as ‘a person changing a vehicle tire’ and ‘a man working on a metal crafts project’. I will present recent results in these exciting new directions and showcase real-world retrieval with the state-of-the-art MediaMill video search engine, even for recognition scenarios where training examples are absent.
Cees Snoek is a director of QUVA, the joint research lab of the University of Amsterdam and Qualcomm on deep learning and computer vision. He is also a principal engineer at Qualcomm and an associate professor at the University of Amsterdam. He was previously a visiting scientist at Carnegie Mellon University, a Fulbright scholar at UC Berkeley, and head of R&D at Euvision Technologies (acquired by Qualcomm). His research interests focus on video and image recognition. Dr. Snoek is the recipient of several career awards, including the Netherlands Prize for ICT Research. Cees is general chair of ACM Multimedia 2016 in Amsterdam.