How to Automate Your Bot Training: Artificial Data for Reliable Artificial Intelligence
Training effective chatbots, like other AI systems that process natural language input, requires large amounts of training data. For each intent you want your chatbot to recognize, you need to provide many example sentences that express that intent. Until now, these large training corpora had to be produced and tagged manually, resulting in inevitable issues with coverage, consistency and tagging standards. In this talk, we will discuss how Bitext is applying its Natural Language Generation technology to generate Artificial Training Data. By taking a seed sentence and automatically generating many different variants with the same meaning, we can automate the most resource-intensive part of the bot creation process. We will also explore additional ways of increasing bot accuracy through better training data, such as the use of word embeddings augmented with linguistic information.
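As a rough illustration of the seed-sentence idea, the sketch below expands one templated seed into several tagged variants by combining interchangeable phrasings. It is a toy example and not Bitext's Natural Language Generation technology: the function names, the slot lists and the `cancel_order` intent label are all hypothetical, and a real system would rely on linguistic resources rather than hand-written alternatives.

```python
from itertools import product


def generate_variants(template, slots):
    """Expand a seed template into every combination of its slot fillers.

    `template` uses str.format placeholders, e.g. "{request} {action} my {item}".
    `slots` maps each placeholder name to a list of interchangeable phrasings.
    """
    names = list(slots)
    variants = []
    for combo in product(*(slots[name] for name in names)):
        variants.append(template.format(**dict(zip(names, combo))))
    return variants


def build_training_data(intent, template, slots):
    """Pair every generated variant with the intent label of the seed."""
    return [(text, intent) for text in generate_variants(template, slots)]


# Hypothetical slot fillers for the seed "I want to cancel my order".
SLOTS = {
    "request": ["I want to", "I need to", "can I"],
    "action": ["cancel", "call off"],
    "item": ["order", "purchase"],
}

examples = build_training_data(
    "cancel_order", "{request} {action} my {item}", SLOTS
)

for text, intent in examples:
    print(f"{intent}\t{text}")
```

Because every variant inherits the label of its seed, the tagging stays consistent by construction, which is the property the manual process struggles to guarantee.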
Antonio Valderrabanos is the CEO and founder of Bitext, and holds a PhD in Computational Linguistics. He has over 20 years of experience in the field of multilingual NLP, and is currently focused on how to exploit linguistic knowledge to improve Machine Learning engines and AI in general, to make them smarter and easier to train. Prior to founding Bitext, he worked at various R&D labs at IBM and Novell.