Enabling World Models via Unsupervised Representation Learning of Environments
Recent advances in deep neural networks have enabled impressive, often superhuman, performance on tasks such as object recognition, object detection, segmentation, image description, visual question answering, and even medical image diagnosis. In many such scenarios, achieving state-of-the-art performance requires collecting large amounts of human-labeled data, which is expensive to acquire. To build intelligent agents that quickly adapt to new scenes, conditions, and tasks, we need to develop techniques, algorithms, and models that can operate on little data or that can generalize from training data dissimilar to the test data. World models have long been hypothesized to be a key piece of the solution to this problem. In this talk I will describe our recent advances in modeling sequential observations. These approaches can help build agents that interact with their environment and mitigate the sample-complexity problems of reinforcement learning. They can also enable agents that generalize more quickly to new scenarios, tasks, objects, and situations and are thus more robust to environment changes.
Dumitru Erhan is a Staff Research Scientist on the Google Brain team in San Francisco. He received a PhD from the University of Montreal (MILA) in 2011 under Yoshua Bengio, where he worked on understanding deep networks. Since then, he has done research at the intersection of computer vision and deep learning, notably on object detection (SSD), object recognition (GoogLeNet), image captioning (Show & Tell), visual question answering, unsupervised domain adaptation (PixelDA), active perception, and more. His recent work has focused on video prediction and generation, as well as their applicability to model-based reinforcement learning. He aims to build and understand agents that can learn as much as possible through self-supervised interaction with the environment, with applications to robotics and self-driving cars.