Learning Values and Policies from State Observations
Observational learning is a key component of human development, enabling us to solve tasks by watching others perform them. For example, we might learn to cook a new dish by watching a video of it being prepared. Notably, we can mirror behavior from state trajectories alone, without direct access to the underlying actions (e.g., the exact kinematic forces) or intentions that produced them. To be similarly general, artificial agents should be equipped with the ability to quickly solve problems after observing a solution. In this presentation, I will first discuss an approach for inferring values directly from state observations, which can then be used to train reinforcement learning agents. Then, I will describe an approach for learning a latent policy directly from state observations, which can then be quickly mapped to real actions in the agent’s environment.
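The first idea, inferring values from state-only demonstrations, can be illustrated with a toy sketch. This is an illustrative assumption, not the speaker's actual method: we label each observed state with its progress toward the goal (minus the number of remaining steps) and fit a linear value function to that proxy target. All names and the linear model are invented for this example.

```python
import numpy as np

# Toy sketch (illustrative, not the speaker's method): fit a value
# function from state-only trajectories using progress-to-goal as a
# proxy target. States are 2-D points; each demonstration ends at goal.

rng = np.random.default_rng(0)
goal = np.array([1.0, 1.0])

def make_demo(start, steps=10):
    """A straight-line state trajectory from start to the goal (no actions)."""
    return np.linspace(start, goal, steps + 1)

demos = [make_demo(rng.uniform(-1, 0, size=2)) for _ in range(20)]

# Label each observed state with a proxy value: minus the number of
# steps remaining until the demonstration reaches the goal.
X, y = [], []
for traj in demos:
    T = len(traj) - 1
    for t, s in enumerate(traj):
        X.append(s)
        y.append(-(T - t))
X, y = np.array(X), np.array(y)

# Least-squares fit of V(s) = w . s + b.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def value(s):
    return float(np.append(s, 1.0) @ w)

# States nearer the goal score higher, so value(s') - value(s) can act
# as a shaped reward signal when training a reinforcement learning agent.
print(value(goal) > value(np.array([-1.0, -1.0])))  # True
```

The learned values never reference actions, which is the point: a reinforcement learning agent can then be trained with this signal using whatever action space it actually has.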
- Agents can be trained to imitate from only state observations, without access to actions
- We can learn values directly from observations and train reinforcement learning agents with them
- We can also learn policies from state observations and directly imitate from them
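The second takeaway, learning a latent policy from states and then mapping it to real actions, can be sketched in a deliberately tiny setting. Everything here (the 1-D chain, delta clustering, one-shot alignment) is an assumption made for illustration, not the presented algorithm:

```python
import numpy as np

# Toy sketch (illustrative only): from state-only demonstrations in a
# 1-D chain, treat clusters of observed state deltas as "latent actions",
# pick latent actions by imitating the demonstrated transitions, then
# align latent actions to the environment's real actions with probes.

# Expert demonstrations: state sequences only, always stepping right.
demos = [list(range(s, 10)) for s in range(5)]

# Observed state deltas; their unique values play the role of latent
# action effects (here a single effect: +1).
deltas = np.array([b - a for traj in demos for a, b in zip(traj, traj[1:])])
latent_effects = np.unique(deltas)

def latent_policy(state):
    """Pick the latent action whose effect is most frequent in the demos
    (state is unused in this toy chain)."""
    return int(np.argmax([np.mean(deltas == e) for e in latent_effects]))

# Real environment actions: 0 -> step left, 1 -> step right.
def env_step(state, action):
    return state + (1 if action == 1 else -1)

# Align each latent action to a real action by probing the environment
# once per real action and matching the resulting delta.
alignment = {}
for latent_idx, effect in enumerate(latent_effects):
    probe_deltas = np.array([env_step(3, a) - 3 for a in (0, 1)])
    alignment[latent_idx] = int(np.argmin(np.abs(probe_deltas - effect)))

# The imitating agent now acts with real actions in the environment.
state = 0
for _ in range(3):
    state = env_step(state, alignment[latent_policy(state)])
print(state)  # 3
```

The separation matters: the latent policy is learned entirely from observations, and only the small alignment step touches the real action space, which is why the mapping can be acquired quickly.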
Ashley Edwards is a research scientist at Uber AI Labs and recently obtained her PhD in computer science from Georgia Tech. Her research focuses on deep reinforcement learning, imitation learning, and model-based RL, with an emphasis on developing general goal representations that can be used across task environments. During her PhD at Georgia Tech, she was a recipient of the NSF Graduate Research Fellowship, was a visiting researcher at Waseda University in Japan through the NSF GROW program, and interned at Google Brain. She received a B.S. in Computer Science from the University of Georgia in 2011.