Visual recognition has witnessed significant improvements thanks to recent advances in deep visual representations. In its most popular form, recognition is performed on web data, including images and videos uploaded by users to platforms such as YouTube or Facebook. However, perception is inherently tied to action, and active perception is vital for robotics: robots perceive in order to act and act in order to perceive. In this talk, I will present our recent efforts to build embodied agents that solve semantic tasks in realistic 3D scenes. Here, an agent's success depends on its ability to perceive its environment at every time step and to navigate effectively to its goal by predicting the right sequence of actions. In particular, I will cover our work on building complex environments that facilitate research on semantic navigation and embodied question answering.
Georgia Gkioxari is a research scientist at Facebook AI Research (FAIR). She received a PhD in electrical engineering and computer sciences from the University of California, Berkeley, in 2016, under the supervision of Jitendra Malik. Her research interests lie in computer vision, with a focus on object and person recognition from static images and videos. In 2017, Georgia received the Marr Prize at ICCV for "Mask R-CNN".