Open-domain Question Answering: State of the Art and Future Perspectives
Question answering (QA) is one of the earliest and most central topics in natural language processing and has played a key role in many real-world applications such as search engines and personal assistants. The problem of open-domain QA, which aims to automatically answer questions posed by humans based on a large collection of unstructured documents, has (re-)gained a lot of popularity in the last couple of years. This talk will review some of the exciting advances in the field, including my own earlier and recent experience building neural QA systems. In particular, I will discuss the role of pre-training in question answering, learning dense representations for retrieval, and the trade-off between accuracy, storage, and runtime efficiency. I will conclude with current limitations and future directions.
3 Key Takeaways:
- Today, we can build a single end-to-end neural open-domain QA system that can answer 50%-70% of the questions accurately based on the full English Wikipedia.
- The progress is largely driven by the development of pre-trained language representations and effective methods for learning dense retrieval.
- Representing a text source as a collection of dense vectors opens up a new possibility for building next-generation knowledge bases.
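To make the idea of dense retrieval concrete, here is a minimal sketch, not the speaker's system. It assumes a hypothetical `ToyEncoder` (an L2-normalized bag-of-words vector over a fixed vocabulary) as a stand-in for a trained neural encoder. Passages are encoded once and stored as vectors. A query is encoded at search time and scored against every stored vector by inner product. The helper names `build_index`, `search`, and `index_size_bytes` are illustrative assumptions, not part of any library.

```python
import heapq
import math
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class ToyEncoder:
    """Hypothetical stand-in for a learned neural encoder (assumption).

    Maps text to an L2-normalized bag-of-words vector over a fixed vocabulary.
    A trained dense retriever would output a learned embedding instead, but the
    retrieval step below works the same way on any fixed-size vectors.
    """

    def __init__(self, corpus: List[str]) -> None:
        vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
        self.index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}

    def encode(self, text: str) -> List[float]:
        vec = [0.0] * len(self.index)
        for tok in tokenize(text):
            if tok in self.index:  # out-of-vocabulary tokens are ignored
                vec[self.index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]


def build_index(encoder: ToyEncoder, passages: List[str]) -> List[List[float]]:
    # Passages are encoded once, offline; only the query is encoded at search time.
    return [encoder.encode(p) for p in passages]


def search(encoder: ToyEncoder, index: List[List[float]], passages: List[str],
           query: str, k: int = 2) -> List[Tuple[float, str]]:
    q = encoder.encode(query)
    # Exhaustive maximum inner product search over all stored passage vectors.
    scores = ((sum(a * b for a, b in zip(q, vec)), i) for i, vec in enumerate(index))
    return [(s, passages[i]) for s, i in heapq.nlargest(k, scores)]


def index_size_bytes(num_passages: int, dim: int, bytes_per_value: int = 4) -> int:
    # A flat float32 index grows linearly in both passage count and dimension.
    return num_passages * dim * bytes_per_value


if __name__ == "__main__":
    passages = [
        "Hamlet is a tragedy written by William Shakespeare.",
        "The Eiffel Tower is located in Paris.",
        "Python is a popular programming language.",
    ]
    enc = ToyEncoder(passages)
    idx = build_index(enc, passages)
    for score, passage in search(enc, idx, passages, "Who wrote the tragedy Hamlet?"):
        print(f"{score:.3f}  {passage}")
```

The `index_size_bytes` helper gives one simple illustration of the accuracy, storage, and runtime trade-off mentioned above. Storing one float vector per passage costs memory proportional to passages times dimension. Exhaustive scoring also costs time proportional to the index size. Smaller vectors, compression, or approximate search reduce these costs, possibly at some loss in accuracy.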
Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Danqi’s research focuses on deep learning for natural language processing, with an emphasis on the intersection between text understanding and knowledge representation/reasoning and applications such as question answering and information extraction. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research in Seattle. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science.