How to Lead Self-Interested Agents Towards Desired Outcomes
Learning agents that interact in the same environment, play sequences of actions, and aim to maximise their long-term rewards may converge to equilibria that are individually and collectively suboptimal. But can we induce them to change their behaviour and thus lead them toward a desired equilibrium? Classic mechanism design shows how agents with private information can be led towards desired outcomes by appropriately designed incentives. Inspired by this, we aim to tackle the problem of designing incentives that guide learning agents towards desired outcomes and away from naturally arising suboptimal ones. To achieve this, we consider stochastic games in which one agent – called the central decision maker – is tasked with designing incentives for the other agents. We propose a framework that models the central decision maker's problem as a Markov decision process whose actions affect the other agents' environment and whose reward function depends on the decisions they take within it. This framework – together with mechanism design, reinforcement learning and deep neural networks – can be used to tackle a broad set of problems (e.g. principal-agent, leader-follower, curriculum formation, agent coordination and fairness) in multi-agent systems with stochastic environments.
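The abstract does not spell out an implementation, but the core idea – a central decision maker shifting self-interested agents to a better equilibrium by augmenting their rewards – can be illustrated with a toy sketch. Below, two agents play a Prisoner's-Dilemma-style matrix game (all payoff numbers are hypothetical) and follow simple best-response dynamics, a crude stand-in for the learning dynamics the talk considers. Without incentives they settle on mutual defection; once the central decision maker pays a per-agent bonus for cooperating, cooperation becomes each agent's best action and the dynamics converge to the desired outcome.

```python
# Hypothetical two-player matrix game (Prisoner's-Dilemma payoffs).
# Actions: 0 = cooperate, 1 = defect. BASE[(a0, a1)] = (reward0, reward1).
BASE = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

def reward(i, a_i, a_j, incentive):
    """Agent i's payoff when it plays a_i and its opponent plays a_j,
    plus the central decision maker's bonus whenever agent i cooperates."""
    joint = (a_i, a_j) if i == 0 else (a_j, a_i)
    return BASE[joint][i] + (incentive if a_i == 0 else 0.0)

def best_response_dynamics(incentive, rounds=50):
    """Each agent repeatedly best-responds to the other's last action --
    a simple stand-in for the learning agents in the abstract."""
    acts = [0, 1]  # arbitrary starting joint action
    for _ in range(rounds):
        acts = [
            max((0, 1), key=lambda a: reward(i, a, acts[1 - i], incentive))
            for i in range(2)
        ]
    return acts

# With no incentive, defection dominates and the agents converge to the
# collectively suboptimal equilibrium; a bonus of 2.5 for cooperating
# makes cooperation dominant and shifts them to the desired outcome.
print(best_response_dynamics(0.0))  # -> [1, 1]  (mutual defection)
print(best_response_dynamics(2.5))  # -> [0, 0]  (mutual cooperation)
```

In the talk's framework the incentive would itself be chosen by a reinforcement-learning agent solving an MDP over the followers' behaviour; here it is fixed by hand purely to show the mechanism.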
Sofia Ceppi is a Senior Machine Learning Researcher at PROWLER.io, where she is part of the multi-agent systems team. Her work aims to combine the advantages of mechanism design with reinforcement learning techniques. She holds a PhD in Information Technology Engineering from Politecnico di Milano, and in past years she has been a visiting student at the University of Southampton, a post-doc at Microsoft Research, and a research associate at the University of Edinburgh. She started her research career focusing on the problem of designing incentives for competing rational agents and then moved to scenarios in which agents behave more like humans.