Neural Scene Representation and Rendering
Scene representation—the process of converting visual sensory data into concise descriptions—is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
- Ali Eslami is a staff research scientist at DeepMind. His research is focused on getting computers to learn generative models of images that not only produce good samples but also good explanations for their observations. Prior to this, he was a post-doctoral researcher at Microsoft Research in Cambridge. He did his PhD in the School of Informatics at the University of Edinburgh, during which he was also a visiting researcher in the Visual Geometry Group at the University of Oxford.