Generative Models for Machine Vision
Recent advances in deep neural networks have enabled impressive, and often superhuman, performance on tasks such as object recognition, object detection, segmentation, image description, visual question-answering, and even medical image diagnosis. In many such scenarios, achieving state-of-the-art performance requires collecting large amounts of human-labeled data, which is expensive to acquire. In order to build intelligent agents that quickly adapt to new scenes, conditions, and tasks, we need to develop techniques, algorithms, and models that can operate on little data or that can generalize from training data that is not similar to the test data. In this talk, I will describe one potential solution. It leverages generative modeling, which can help with building agents that interact with the environment and can mitigate the sample-complexity problems of reinforcement learning via so-called world models.
Dumitru Erhan is a Staff Research Scientist on the Google Brain team in San Francisco. He received a PhD from the University of Montreal (MILA) in 2011 under Yoshua Bengio, where he worked on understanding deep networks. Since then, he has done research at the intersection of computer vision and deep learning, notably object detection (SSD), object recognition (GoogLeNet), image captioning (Show & Tell), visual question-answering, unsupervised domain adaptation (PixelDA), active perception, and more. His recent work has focused on video prediction and generation, as well as their applicability to model-based reinforcement learning. He aims to build and understand agents that can learn as much as possible from self-supervised interaction with the environment, with applications to robotics and self-driving cars.