Multimodal Question Answering for Language and Vision
Deep Learning has made tremendous breakthroughs possible in visual understanding and speech recognition. Ostensibly, this is not the case in natural language processing (NLP) and higher level reasoning. However, it only appears that way because there are so many different tasks in NLP and no single one of them, by itself, captures the complexity of language. I will talk about dynamic memory networks for question answering. This model architecture and task combination can solve a wide variety of visual and NLP problems, including those that require reasoning.
Richard Socher is Chief Scientist at Salesforce. He was previously the CEO and founder of MetaMind, a startup that seeked to improve artificial intelligence and make it widely accessible. He obtained his PhD from Stanford working on deep learning with Chris Manning and Andrew Ng and won the best Stanford CS PhD thesis award. He is interested in developing new AI models that perform well across multiple different tasks in natural language processing and computer vision.
He was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research PhD Fellowship in 2012 and a 2013 "Magic Grant" from the Brown Institute for Media Innovation and the 2014 GigaOM Structure Award.