Last month at the AI Assistant Summit in San Francisco, we were joined by Pararth Shah, a Research Engineer at Google, who spoke with us about his current work and about AI and deep learning more generally. Pararth is working to improve the way conversational AI interacts with users by building agents that actually learn from their conversations and interactions with the user.
I work at Google AI on a team focused on conversational AI research. We experiment with new ways to build agents that converse with users in natural language to help them complete tasks. We are exploring dialogue models trained from data and user feedback instead of relying on manual engineering, as the former approach is more scalable and flexible, and enables lifelong learning and improvement of the conversational agent. Since this space is still in its early stages, I spend my time keeping up to date with current research, coming up with new ideas, building prototypes and running evaluations.
I got interested in Machine Learning during my undergrad at IIT Bombay, where I contributed to a couple of ML research projects. During my masters in CS at Stanford, I chose the Artificial Intelligence specialization and took courses and worked on research projects in computer vision, graph mining and NLP. I also TA'd a couple of AI courses.
I was always fascinated by the idea of machines that converse with humans in natural language. In the last few years, the rise of messaging platforms, voice assistants and chatbots has made this idea more popular. The ability to pass the Turing Test would be a key, if not the defining, capability of a general AI. But we are far away from that, and there are many interesting technical challenges to be solved along the way.
Current research in dialogue systems stands on the shoulders of a lot of seminal work in Deep Learning and Natural Language Processing, tracing back to word vectors and recurrent network based language understanding models. More recently, end-to-end conversational models, which allow an error gradient to be passed through from the response generation component back to the language understanding component, enable training the model directly with reinforcement from user feedback. In essence, the agent can improve its own language understanding capability by engaging in a conversation with the user and analyzing whether its responses led to higher user satisfaction.
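The key property of end-to-end models described above is that the response-generation loss is differentiable with respect to the language-understanding parameters, so a single gradient step improves both components together. As a minimal sketch of that idea (not the actual Google models; the dimensions, weights, and training task here are purely illustrative), the following toy model encodes a user token with an "understanding" matrix, generates a response with a "generation" matrix, and lets the cross-entropy gradient from generation flow back into the encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy end-to-end dialogue model. An "understanding" encoder feeds a
# "generation" head; one shared loss lets the generation error gradient
# flow back into the encoder. All sizes and names are illustrative.
V, H = 6, 4                          # vocab size, hidden size
W_enc = rng.normal(0, 0.1, (V, H))   # understanding: token -> hidden state
W_gen = rng.normal(0, 0.1, (H, V))   # generation: hidden state -> response token

def forward(tok):
    h = W_enc[tok]                   # encode the user's token
    logits = h @ W_gen
    p = np.exp(logits - logits.max())
    return h, p / p.sum()            # softmax distribution over responses

def train_step(tok, target, lr=0.5):
    """One end-to-end gradient step: the generation loss updates BOTH
    the generation head and the understanding encoder."""
    global W_enc, W_gen
    h, p = forward(tok)
    dlogits = p.copy()
    dlogits[target] -= 1.0           # d(cross-entropy) / d(logits)
    grad_h = W_gen @ dlogits         # gradient reaching the encoder output
    W_gen -= lr * np.outer(h, dlogits)   # update the generation head
    W_enc[tok] -= lr * grad_h            # ...and the understanding encoder

# Toy task: learn that user token 2 should be answered with token 5.
before = forward(2)[1][5]
for _ in range(200):
    train_step(2, 5)
after = forward(2)[1][5]
```

After training, the probability assigned to the correct response rises, even though the encoder was never given its own supervision signal: its improvement came entirely through the gradient passed back from the generation component.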
I am quite excited about the application of reinforcement learning techniques to train conversational agents. On the one hand, multi-turn dialogue is somewhat different from other NLP tasks like translation and semantic parsing, as it involves reasoning multiple steps into the future to choose actions that will optimally help the user. RL is well suited for such sequential decision-making problems. On the other hand, dialogue systems are very different from traditional RL domains like game playing (Atari, AlphaGo) and robotics, as dialogue doesn't have a well-defined reward function or a realistic simulation environment in which to train the agent. So it will be exciting to come up with new RL techniques tailored to the strengths and limitations of conversational agents. Recent work on human-in-the-loop RL, interactive RL, and learning rewards from human preferences shows promise.
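To make the RL framing above concrete, here is a minimal REINFORCE-style sketch (my own illustration, not a method from the interview): a dialogue "policy" chooses among a few canned responses and learns only from a scalar satisfaction signal, like a thumbs up or down, which stands in for the missing reward function. The simulated user is a hypothetical stand-in, since as noted there is no realistic simulation environment for dialogue:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bandit-style REINFORCE: learn response preferences from +/-1 feedback.
N_RESP = 4
theta = np.zeros(N_RESP)           # unnormalized preferences over responses

def policy():
    p = np.exp(theta - theta.max())
    return p / p.sum()             # softmax policy over canned responses

def simulated_user(action):
    # Hypothetical feedback signal: this user is satisfied only by
    # response 3, and gives a thumbs down otherwise.
    return 1.0 if action == 3 else -1.0

def step(lr=0.1):
    global theta
    p = policy()
    a = rng.choice(N_RESP, p=p)    # sample a response from the policy
    r = simulated_user(a)          # scalar user-satisfaction feedback
    grad = -p
    grad[a] += 1.0                 # d log pi(a|theta) / d theta
    theta += lr * r * grad         # REINFORCE policy-gradient update

for _ in range(2000):
    step()
```

Over many interactions the policy concentrates on the response that earns positive feedback. Real dialogue RL is far harder: rewards are sparse, delayed over multi-turn conversations, and noisy, which is exactly why the human-in-the-loop and preference-learning work mentioned above is promising.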
Of all AI capabilities, the ability to understand and converse naturally with humans has the potential for the biggest positive impact on civilization. The most obvious applications are in education, elderly care, and similar domains, as well as in making technology accessible to more people. I believe that our research on conversational agents will eventually show impact in all these areas down the road.
It is exciting to see the progress in deep learning based computer vision, and I think a lot of industries, medical imaging for example, are going to see a shift in the next few years because of it. Robotics is the next frontier, and new startups have formed recently to tackle this space, so it will be intriguing to see their progress. Language is in many ways tougher to crack than vision or control, due to the density of concepts implicitly conveyed in a few words and the inherent ambiguities in the ways we use language. But who knows, we might be only a couple of breakthroughs away from getting much closer to human-level conversational AI.