Parsa Ghaffari

CEO & Founder
AYLIEN

Byte2vec & its Application to Natural Language Processing Problems

In this talk, we present byte2vec, a flexible embedding model constructed from bytes, and its application to downstream NLP tasks such as sentiment analysis. Byte2vec is built directly from the rawest form of input, bytes, and is: (i) truly language-independent; (ii) particularly well suited to synthetic languages, since it can draw on morphological information; (iii) intrinsically able to handle unknown words; and (iv) directly pluggable into state-of-the-art neural network architectures. Pre-trained embeddings generated with byte2vec can be fed into such models; alternatively, byte2vec can be integrated directly and fine-tuned as a general-purpose feature extractor, similar to the role VGGNet currently plays in computer vision.
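To make the "unknown words" point concrete, here is a minimal Python sketch. It is not the byte2vec model, whose architecture this abstract does not describe. It only illustrates the underlying idea that any string, in any language, decomposes into UTF-8 bytes drawn from a closed set of 256 values, so a byte-level lookup table never meets an out-of-vocabulary symbol. The names `EMBED_DIM`, `BYTE_TABLE`, `to_bytes`, and `embed` are hypothetical, the table is random rather than trained, and averaging byte vectors is an assumption chosen purely for simplicity.

```python
import random

EMBED_DIM = 8  # hypothetical dimensionality, chosen for illustration only

# One vector per possible byte value: the "vocabulary" is closed at 256 entries.
# The values are random here; a real model would learn them from data.
_rng = random.Random(0)
BYTE_TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
              for _ in range(256)]


def to_bytes(text: str) -> list[int]:
    """Return the UTF-8 byte values of text (each in the range 0..255)."""
    return list(text.encode("utf-8"))


def embed(text: str) -> list[float]:
    """Toy composition (an assumption): average the vectors of the bytes in text."""
    ids = to_bytes(text)
    if not ids:
        return [0.0] * EMBED_DIM
    return [sum(BYTE_TABLE[i][d] for i in ids) / len(ids)
            for d in range(EMBED_DIM)]


if __name__ == "__main__":
    # Words from different scripts, plus a made-up token, all get a vector.
    for word in ["sentiment", "Stimmung", "感情", "asdfqwerty"]:
        print(word, len(to_bytes(word)), len(embed(word)))
```

Because the lookup is keyed on byte values rather than on words, the made-up token receives a vector just as the dictionary words do. A word-level table would instead need a special fallback entry for it.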

Motivation: In today's fragmented, globalized world, supporting multiple languages in NLU and NLP applications is more important than ever. The inherent language dependence of classical machine learning and rule-based NLP systems has traditionally been a barrier to scaling these systems to new languages. This dependence typically shows up in feature extraction as well as in pre-processing steps. In this talk, we present byte2vec as an extension of the well-known word2vec embedding model that makes it easier to handle multiple languages and unknown words.

Parsa Ghaffari is an engineer and entrepreneur working in the field of Artificial Intelligence and Machine Learning. He currently runs AYLIEN, a leading NLP API provider focused on building and offering easy-to-use technologies for analyzing and understanding textual content at scale.

