Dhruv Batra


Visual Question Answering and CloudCV

In this talk, I will describe the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. We are collecting a dataset containing hundreds of thousands of images and questions, and I will discuss the information it provides.

I will also describe CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms as a cloud service. Our goal is to democratize computer vision; one should not have to be an expert in computer vision, big data, and distributed computing to have access to state-of-the-art distributed computer vision algorithms.

Dhruv Batra is an Assistant Professor at the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he leads the VT Machine Learning & Perception group.

Prior to joining VT, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the campus of the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010 respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU and at CSAIL MIT.

His research interests lie at the intersection of machine learning, computer vision, and AI, with a focus on developing scalable algorithms for learning and inference in probabilistic models for holistic scene understanding. He has also worked on other topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation, and distributed optimization for inference and learning in probabilistic graphical models.

He is a recipient of the Carnegie Mellon Dean's Fellowship (2007), a Google Faculty Research Award (2013), Virginia Tech Teacher of the Week (2013), the Army Research Office (ARO) Young Investigator Program (YIP) award (2014), the National Science Foundation (NSF) CAREER award (2014), and the Virginia Tech CoE Outstanding New Assistant Professor award (2015). His research is supported by NSF, ARO, ONR, Amazon, Google, Microsoft, and NVIDIA.
