Learning How to Generate the Future in Videos

Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging and fundamentally ill-posed. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. In this talk, I will describe a method that is the first to provide effective stochastic multi-frame prediction for real-world video, and present compelling results on a couple of robotics benchmarks.

Dumitru Erhan is a Senior Research Scientist at Google Brain, where he focuses on designing machine learning algorithms that make it possible for agents to interact meaningfully with the world. Previously, he investigated unsupervised domain adaptation, object detection, image recognition, image captioning, and understanding why neural networks work, in addition to working on the first version of Google Photo Search. Dumitru received his PhD with Yoshua Bengio at the University of Montreal in 2011 and has been focusing on building and understanding intelligent models of the world ever since.

