A Picture is Worth a Thousand Words: Towards Multimodal, Multilingual Context Models

In Computational Linguistics, work on understanding or generating language has been based primarily on textual information. However, when we humans process a text, be it written or spoken, we also take into account cues from the context in which the text appears, in addition to our background and common-sense knowledge. This is also the case when we translate text. For example, a news article will often contain images and may also contain a short video and/or audio clip. Users of social media often post photos and videos accompanied by short textual descriptions. This additional information can help minimise ambiguities and resolve unknown words. In this talk I will introduce a recent area of research that addresses the automatic translation of texts using rich context models that incorporate multimodal information, focusing on visual cues from images. I will cover work analysing how humans perform translation in the presence or absence of visual cues, and then move on to the datasets and deep-learning-based computational models that have been proposed for this problem. I will conclude by highlighting the opportunities and challenges that deep learning brings to this area.

Dr. Lucia Specia is Professor of Natural Language Processing at Imperial College London (since 2018) and the University of Sheffield (since 2012). Her research focuses on various aspects of data-driven approaches to language processing, with a particular interest in multimodal and multilingual context models and work at the intersection of language and vision. Her work has been applied to various tasks such as machine translation, image captioning, quality estimation and text adaptation. She is the recipient of the MultiMT ERC Starting Grant on Multimodal Machine Translation (2016-2021) and is currently involved in other funded research projects on machine translation (H2020 Bergamot, APE-QUEST), multilingual video captioning (British Council MMVC) and text adaptation (H2020 SIMPATICO). She was previously involved in 10+ funded research projects and has completed the supervision of 11 PhD students. In the past she worked as a Senior Lecturer at the University of Wolverhampton, UK (2010-2011), and as a research engineer at the Xerox Research Centre, France (2008-2009). She received a PhD in Computer Science from the University of São Paulo, Brazil, in 2008. She has published 150+ research papers in peer-reviewed journals and conference proceedings.

