Decision Transformers: Reinforcement Learning via Sequence Modeling
In this presentation, I will explain how the reinforcement learning (RL) problem can be cast as a simple sequence modeling problem. This framing lets us leverage the simplicity and scalability of the Transformer, along with associated advances such as GPT-x, to design an architecture for RL called the Decision Transformer, which treats RL as conditional sequence modeling without fitting value functions or computing policy gradients as prior approaches do.
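To make "conditional sequence modeling" concrete, here is a minimal sketch of how a trajectory might be turned into the kind of token sequence a Decision Transformer consumes: each timestep contributes a return-to-go (the sum of future rewards), a state, and an action. The helper names (`returns_to_go`, `build_sequence`) are illustrative, not from any particular implementation.

```python
def returns_to_go(rewards):
    """Compute R_hat_t = sum of rewards from step t to the end of the episode."""
    rtg = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one token sequence,
    as in the Decision Transformer's conditional sequence modeling setup."""
    rtg = returns_to_go(rewards)
    seq = []
    for g, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq

# Example: a 3-step trajectory with rewards 1, 2, 3.
seq = build_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 2.0, 3.0])
```

At generation time, conditioning on a desired return-to-go at the front of the sequence steers the model toward actions consistent with achieving that return, with no explicit value function or policy gradient involved.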
Aravind is a Research Scientist at OpenAI, where he works on large generative models. He completed his PhD at UC Berkeley, advised by Prof. Pieter Abbeel, making contributions to contrastive learning, Transformers, generative models for reinforcement learning, and computer vision. During his PhD, Aravind spent time at Google DeepMind, Google Brain, and OpenAI, and co-taught the Berkeley Deep Unsupervised Learning course.