Towards Agents That Can See, Talk, Act, and Reason
Wouldn't it be nice if machines could understand content in images and communicate this understanding as effectively as humans? Such technology would be immensely powerful, be it for helping a visually impaired user navigate a world built by the sighted, assisting an analyst in extracting relevant information from a surveillance feed, educating a child playing a game on a touch screen, providing information to a spectator at an art gallery, or interacting with a robot. As computer vision and natural language processing techniques mature, we are closer to achieving this dream than we have ever been. In this talk, I will discuss our efforts towards building agents that can see, talk, act, and reason. Given an image and a natural language question about the image (e.g., “What kind of store is this?”, “How many people are waiting in the queue?”, “Is it safe to cross the street?”), can we build agents that produce an accurate natural language answer (“bakery”, “5”, “Yes”)? Instead of answering individual questions about an image in isolation, can we build machines that can hold a sequential natural language conversation with humans about visual content? Instead of just passively answering questions, can agents navigate in an environment to gather the necessary information to answer the questions? And finally, how can we teach machines common sense so their interactions with humans are natural and seamless?
Devi Parikh is an Assistant Professor in the School of Interactive Computing at Georgia Tech, and a Research Scientist at Facebook AI Research (FAIR). Her research interests include computer vision and AI in general and visual recognition problems in particular. Her recent work involves exploring problems at the intersection of vision and language, and leveraging human-machine collaboration for building smarter machines. She received her Ph.D. from Carnegie Mellon University in 2009. She is a recipient of an NSF CAREER award, an IJCAI Computers and Thought award, a Sloan Research Fellowship, an Office of Naval Research (ONR) Young Investigator Program (YIP) award, an Army Research Office (ARO) Young Investigator Program (YIP) award, an Allen Distinguished Investigator Award in Artificial Intelligence from the Paul G. Allen Family Foundation, four Google Faculty Research Awards, an Amazon Academic Research Award, an Outstanding New Assistant Professor award from the College of Engineering at Virginia Tech, a Rowan University Medal of Excellence for Alumni Achievement, Rowan University's 40 under 40 recognition, recognition on Forbes' list of 20 "Incredible Women Advancing A.I. Research", and a Marr Best Paper Prize awarded at the International Conference on Computer Vision (ICCV).