Common Sense Video Understanding at TwentyBN
Deep learning has evolved not linearly but through a series of step functions: sudden, unexpected outbreaks of capability that fundamentally changed the envelope of what computers are able to do. At TwentyBN, we have been betting that the next outbreak of capability will be related to video understanding. We have built spatio-temporal video models, video infrastructure, and a data operation that has allowed us to create hundreds of thousands of labeled videos showing everyday common-sense scenes and situations, many of them designed to be extremely subtle and hard to distinguish. This has allowed us to train neural networks end-to-end on a wide range of action understanding tasks that neither hand-engineering nor neural networks had appeared anywhere near solving just a few months ago. I will show how these recognition tasks now drive commercial value at TwentyBN, and how they drive our long-term AI agenda, which represents another, longer-term bet on learning common-sense world knowledge through video.
Roland Memisevic received his PhD in Computer Science from the University of Toronto in 2008. He subsequently held positions as a research scientist at PNYLab, Princeton, as a post-doctoral fellow at the University of Toronto and ETH Zurich, and as a junior professor at the University of Frankfurt. In 2012 he joined the MILA deep learning group at the University of Montreal as an assistant professor. He has been on leave from his academic position since 2016 to lead the research efforts at Twenty Billion Neurons, a German-Canadian AI startup he co-founded. Roland is a Fellow of the Canadian Institute for Advanced Research (CIFAR).