The Surprising Creativity of Evolutionary RL
One facet of the practical art of reinforcement learning is how to take a desired task and construct a reward function -- one that results in an acceptable solution when optimized. A common failure mode is that a sensible-seeming reward function can be optimized in surprising (and undesirable, but often funny) ways -- like a devious genie fulfilling the letter of your request while undermining its spirit. This talk reviews a set of examples from the evolutionary reinforcement learning community that highlight how common this phenomenon is among researchers and practitioners. The aim is to draw attention to the practical challenge this presents for reinforcement learning and to how practitioners often overcome it. The talk concludes by describing ways the research community is exploring new paradigms for more easily specifying success criteria for complex RL tasks.
Joel Lehman is a research scientist at OpenAI. Previously, he was a founding member of Uber AI Labs and an assistant professor at the IT University of Copenhagen. His research focuses on open-endedness, reinforcement learning, and AI safety. His PhD dissertation introduced the novelty search algorithm, which inspired “Why Greatness Cannot Be Planned,” a popular science book co-written with Ken Stanley on what search algorithms imply for individual and societal objectives.