Understanding Malformed Text with Word Embeddings
In recent years word embeddings have become increasingly popular, constituting a building block for many practical applications across NLP, ML, Search and related disciplines. The power of word embeddings stems from the fact that they allow us to represent words as vectors of real numbers of arbitrary dimensionality, in such a way that distances in the vector space can be interpreted as a measure of semantic similarity between words. One limitation of many existing techniques, however, is their inability to represent malformed words. In this talk, we will discuss the challenges state-of-the-art word embedding methodologies face when applied to real-world, human-generated text. We will explore strategies for alleviating some of these problems and discuss how we can make word embeddings work well on misspelled text.
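As a minimal sketch of one alleviation strategy the talk alludes to (the approach popularized by subword models such as fastText, not necessarily the speaker's exact method): building a word vector from its character n-grams, so a misspelled word still lands near its correctly spelled form because the two share most of their n-grams. The hashing scheme and dimensions below are illustrative assumptions.

```python
import zlib
import numpy as np

DIM = 100  # illustrative embedding dimensionality

def char_ngrams(word, n=3):
    """Character trigrams of a word, padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_vector(ngram):
    """Deterministic pseudo-random vector per n-gram (stands in for
    learned subword embeddings; seeded by a stable CRC32 hash)."""
    rng = np.random.default_rng(zlib.crc32(ngram.encode("utf-8")))
    return rng.standard_normal(DIM)

def embed(word):
    """Word vector = mean of its character n-gram vectors, so
    out-of-vocabulary and misspelled words still get a representation."""
    return np.mean([ngram_vector(g) for g in char_ngrams(word)], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A typo shares most trigrams with the intended word, an unrelated
# word shares almost none, so the typo scores much higher.
sim_typo = cosine(embed("embedding"), embed("embeddng"))
sim_unrelated = cosine(embed("embedding"), embed("zebra"))
```

A purely word-level lookup table would have no entry at all for "embeddng"; composing vectors from subword units is what lets the model degrade gracefully on malformed input.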
Aleksandra is a Software Engineer working on Facebook search, with experience in natural language processing and machine learning. She currently works on related searches; before that, she was part of the team building the spelling correction logic for FB search.