Marius Cobzarenco

Learning Semantic Representations for Chat

Businesses are increasingly talking to their customers using instant messaging and social media. These conversations contain valuable insight into the underlying causes of user behaviour. However, this channel is significantly underused because of the challenges posed by analysing informal, poorly spelt, user-generated text. Traditional natural language processing relies on rules and hand-engineered features, which makes it too inflexible. Onboarding new languages also requires human expert knowledge, which makes it prohibitively expensive.

In this talk, I will describe the approach we developed, based on unsupervised training of deep neural networks that map sentences to a fixed-size semantic representation. The model reads text character by character, which is important for understanding user-generated content and recognising out-of-vocabulary terms. We train these language models on billions of sentences in an unsupervised fashion, so no human-annotated data is needed. The learnt semantic representations are invariant to rephrasing as long as the meaning is left unchanged. The language models form the basis of our chat analytics and automation products.
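
To make the idea of a fixed-size representation concrete, here is a minimal sketch of a character-level encoder. It is an illustration only, not the model described in the talk: it assumes a plain recurrent network over UTF-8 bytes with random, untrained weights, and the names `make_params`, `encode`, `EMBED_DIM` and `HIDDEN_DIM` are hypothetical.

```python
import math
import random

EMBED_DIM = 8    # hypothetical size of each character (byte) embedding
HIDDEN_DIM = 16  # hypothetical size of the sentence representation


def make_params(seed=0):
    """Create random, untrained weights for the sketch."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": matrix(256, EMBED_DIM),        # one row per possible byte value
        "w_in": matrix(HIDDEN_DIM, EMBED_DIM),  # input-to-hidden weights
        "w_rec": matrix(HIDDEN_DIM, HIDDEN_DIM),  # hidden-to-hidden weights
    }


def encode(sentence, params):
    """Read the sentence byte by byte and return a HIDDEN_DIM-sized vector."""
    hidden = [0.0] * HIDDEN_DIM
    for byte in sentence.encode("utf-8"):
        x = params["embed"][byte]
        # Simple recurrent update: new state depends on the current
        # character and on the state after all previous characters.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["w_in"][i], x))
                + sum(w * hj for w, hj in zip(params["w_rec"][i], hidden))
            )
            for i in range(HIDDEN_DIM)
        ]
    return hidden


params = make_params()
vector_a = encode("thx 4 ur help!!", params)
vector_b = encode("Thank you very much for your help.", params)
```

Both `vector_a` and `vector_b` have `HIDDEN_DIM` entries regardless of sentence length, and because the input is read as bytes, misspellings and unseen words never fall outside the vocabulary. With untrained weights the vectors carry no meaning; the invariance to rephrasing described above would only emerge from training.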

I believe artificial intelligence will improve most aspects of our lives in the next decade. AI is already "eating the world" today. In particular, I am interested in how emerging technologies such as deep learning can be used to build frictionless natural language interfaces. To this end, I co-founded re:infer. I'm an old-fashioned hacker with a strong understanding of machine learning and its related fields, such as probability theory, statistical modelling, linear algebra and multivariate calculus. Academically, my interests are in probabilistic generative models of language.

Partners & Attendees

Intel
NVIDIA
Graphcore
IBM Watson Health
Facebook
Acc1
RBC Research
TwentyBN
Forbes
Maluuba
MIT Technology Review
KDnuggets