Visual Reasoning via Feature-wise Linear Modulation
Visual Reasoning, the task of answering image-related questions that require a multi-step process, explores how well models can learn the complex organizational structure of objects in the world. In this talk, I introduce a widely applicable form of Conditional Normalization that we call FiLM: Feature-wise Linear Modulation. FiLM is a straightforward way of locally modifying the elements of a computational pipeline. In our application to Visual Reasoning, FiLM adapts the layers of a convolutional neural network to the specific question at hand. I will also show how FiLM-based models can generalize to challenging new data from few examples or even, to an extent, in the zero-shot setting.
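As a rough illustration of the idea in the abstract, FiLM can be read as a per-feature-map scale and shift, gamma * x + beta, where gamma and beta are computed from the conditioning input (here, the question). The sketch below is a minimal pure-Python toy, not the model presented in the talk: the helper names (film, linear), the single linear layer standing in for the question-processing network, and all numeric values are assumptions made for illustration.

```python
def film(features, gamma, beta):
    """Apply a feature-wise affine transform: gamma[c] * x + beta[c] per channel c.

    features: list of C feature maps, each an H x W nested list of floats.
    gamma, beta: length-C lists holding one scale and one shift per feature map.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per feature map")
    return [
        [[g * x + b for x in row] for row in fmap]
        for fmap, g, b in zip(features, gamma, beta)
    ]


def linear(weights, bias, vec):
    """Hypothetical stand-in for the conditioning network: weights @ vec + bias."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


# Toy inputs: two 2x2 feature maps and a 3-dimensional question embedding.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, -1.0], [2.0, 0.0]],
]
question = [1.0, 0.0, -1.0]

# Hand-picked illustrative parameters; a trained model would learn these.
W_gamma, b_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 0.0]
W_beta, b_beta = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]], [0.0, 0.0]

# The question determines how each feature map is scaled and shifted.
gamma = linear(W_gamma, b_gamma, question)
beta = linear(W_beta, b_beta, question)
modulated = film(features, gamma, beta)
```

With these toy values the first feature map is doubled and the second is negated and shifted by 0.5, so a different question embedding would modulate the same image features differently. In a convolutional network the same operation would be applied to the output of a layer, with one (gamma, beta) pair per channel.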
Aaron is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a member of the LISA lab. His current research interests focus on the development of deep learning models and methods. He is particularly interested in developing probabilistic models and novel inference methods. While he has mainly focused on applications to computer vision, he is also interested in other domains such as natural language processing, audio signal processing, speech understanding and just about any other artificial-intelligence-related task.