Towards a theory of sample-efficient Reinforcement Learning with rich observations
How can we tractably solve sequential decision-making problems where the learning agent receives rich observations? We will summarize a set of recent results in this direction, which study a family of RL problems called Contextual Decision Processes (CDPs). CDPs generalize MDPs and POMDPs and describe a fairly general class of sequential decision-making problems, so that any sample-efficient method in this model has broad applicability. We will discuss the different structural properties that enable sample-efficient model-free as well as model-based techniques. We will primarily focus on an algorithm, appearing at ICML 2019, that is both computationally practical and theoretically sound. The talk will also familiarize the audience with the broader research activities related to reinforcement learning at Microsoft Research AI.
This talk is based on joint work with several collaborators and on the papers: https://arxiv.org/abs/1610.09512, https://arxiv.org/abs/1803.00606, https://arxiv.org/abs/1811.08540, and https://arxiv.org/abs/1901.09018.
Alekh Agarwal is a Senior Researcher at Microsoft Research AI, where he leads the reinforcement learning group. Prior to joining Microsoft Research AI, Alekh obtained his PhD from UC Berkeley and then spent six years in the New York City lab of Microsoft Research. His research focuses on several aspects of interactive learning, including reinforcement learning, contextual bandits, and online learning. He has also worked extensively in stochastic and distributed optimization, and he received the best paper award at NeurIPS 2015.