The "Something Something" Video Dataset
Neural networks trained on datasets like ImageNet have recently led to major advances in visual object classification. A main obstacle that prevents networks from reasoning more deeply about scenes and situations, and from integrating visual information with natural language as humans do, is their lack of common-sense knowledge about the physical world. Unlike tasks on still images, fine-grained prediction tasks on videos can reveal such physical information, because videos implicitly encode properties such as 3-D geometry, materials, "objectness", and affordances. In this talk, I will describe a new video dataset we created, showing objects engaged in complex motions and interactions. I will also show how neural networks can learn from this data to make fine-grained predictions about actions and situations.
Roland Memisevic received his PhD in Computer Science from the University of Toronto in 2008. He subsequently held positions as a research scientist at PNYLab, Princeton, as a post-doctoral fellow at the University of Toronto and ETH Zurich, and as a junior professor at the University of Frankfurt. In 2012 he joined the MILA deep learning group at the University of Montreal as an assistant professor. He has been on leave from his academic position since 2016 to lead the research efforts at Twenty Billion Neurons, a German-Canadian AI startup he co-founded. Roland is a Fellow of the Canadian Institute for Advanced Research (CIFAR).