• 08:00

    REGISTRATION OPENS

  • 09:00
    Ban Kawas

    WELCOME NOTE & OPENING REMARKS

    Ban Kawas - Senior Research Scientist - Reinforcement Learning - Meta

    Ban is a Senior AI Research Scientist at Meta. She works on democratizing Reinforcement Learning and enabling its use in the real world, spanning application areas from compiler optimization to embodied AI. Ban and her team are developing ReAgent, an end-to-end platform for applied RL; check out the open-source version at https://reagent.ai/

  • REINFORCEMENT LEARNING ADVANCEMENTS

  • 09:15
    Deepak Pathak

    Learning to Walk with Rapid Motor Adaptation

    Deepak Pathak - Assistant Professor - Carnegie Mellon University

    How can we train a robot that can generalize to thousands of unseen environments? This question underscores the holy grail of AI research dominated by learning from demonstrations or rewards once in diverse scenarios. However, both of these paradigms fall short because it is difficult to supervise an agent for all possible situations it can encounter in the future. We posit that generalization is truly only possible if the robot can continually and rapidly adapt itself to new situations. This adaptation has to occur online, at a time scale of fractions of a second, which implies that we have no time to carry out multiple experiments in the physical world. In this talk, I will describe a formulation for what we call Rapid Motor Adaptation (RMA) through a case study of legged robots. Legged locomotion is commonly studied and programmed as a discrete set of structured gait patterns like walk, trot, and gallop. However, studies of children learning to walk (Adolph et al.) show that real-world locomotion is often quite unstructured and more like "bouts of intermittent steps". We have developed a general approach to walking that is built on learning on varied terrains in simulation followed by rapid online adaptation in the real world. I will show how this setup naturally leads to animal-like gaits in our robot and can be tightly coupled with goal-driven navigation.
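
    The two-stage recipe behind RMA (a base policy trained in simulation with privileged access to environment parameters, plus an adaptation module that estimates those parameters from recent state-action history so the robot can adapt online) can be sketched roughly as below. Module names, dimensions, and architectures are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, ENV_DIM, LATENT_DIM, HISTORY = 30, 12, 17, 8, 50

    class BasePolicy(nn.Module):
        """Acts on the current state plus a latent summary of environment factors."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + LATENT_DIM, 128), nn.ReLU(),
                nn.Linear(128, ACTION_DIM))

        def forward(self, state, z):
            return self.net(torch.cat([state, z], dim=-1))

    class EnvFactorEncoder(nn.Module):
        """Phase 1 (simulation): compress privileged environment parameters into z."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(ENV_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, LATENT_DIM))

        def forward(self, env_params):
            return self.net(env_params)

    class AdaptationModule(nn.Module):
        """Phase 2 (deployment): estimate z from recent state-action history only."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(HISTORY * (STATE_DIM + ACTION_DIM), 128), nn.ReLU(),
                nn.Linear(128, LATENT_DIM))

        def forward(self, history):
            return self.net(history.flatten(start_dim=-2))

    # At deployment the adaptation module replaces the privileged encoder, so the
    # same base policy can adapt online without extra experiments in the world.
    policy, adapt = BasePolicy(), AdaptationModule()
    history = torch.zeros(1, HISTORY, STATE_DIM + ACTION_DIM)  # rolling buffer
    state = torch.zeros(1, STATE_DIM)
    action = policy(state, adapt(history))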

    Deepak Pathak is a faculty member in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. from UC Berkeley, and his research spans computer vision, machine learning, and robotics. He is a recipient of faculty awards from Google, Sony, and GoodAI, and of graduate fellowship awards from Facebook, NVIDIA, and Snapchat. His research has been featured in popular press outlets including The Economist, The Wall Street Journal, Quanta Magazine, Washington Post, CNET, Wired, and MIT Technology Review. Deepak received his Bachelor's from IIT Kanpur with a Gold Medal in Computer Science. He co-founded VisageMap Inc., later acquired by FaceFirst Inc.

  • 09:35
    Aravind Srinivas

    Decision Transformers: Reinforcement Learning via Sequence Modeling

    Aravind Srinivas - Research Scientist - OpenAI

    In this presentation, I will explain how the Reinforcement Learning (RL) problem can be cast as a simple sequence modeling problem. This framing lets us leverage the simplicity and scalability of the Transformer and associated advances such as GPT-x to design an architecture for RL called the Decision Transformer, which treats RL as conditional sequence modeling without fitting value functions or computing policy gradients as prior approaches do.
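
    A rough PyTorch sketch of this framing follows. The interleaving of (return-to-go, state, action) tokens mirrors the Decision Transformer idea, but the layer sizes, dimensions, and use of a generic TransformerEncoder are simplifying assumptions rather than the paper's implementation.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, EMBED, CONTEXT = 17, 6, 128, 20

    class DecisionTransformerSketch(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed_rtg = nn.Linear(1, EMBED)            # return-to-go token
            self.embed_state = nn.Linear(STATE_DIM, EMBED)
            self.embed_action = nn.Linear(ACTION_DIM, EMBED)
            layer = nn.TransformerEncoderLayer(EMBED, nhead=4, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=2)
            self.predict_action = nn.Linear(EMBED, ACTION_DIM)

        def forward(self, rtg, states, actions):
            # Interleave (R_t, s_t, a_t) into a single token sequence of length 3T.
            tokens = torch.stack(
                [self.embed_rtg(rtg), self.embed_state(states),
                 self.embed_action(actions)], dim=2).flatten(1, 2)
            causal = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
            hidden = self.transformer(tokens, mask=causal)
            # Predict a_t from the hidden state at the s_t position (indices 1, 4, ...).
            return self.predict_action(hidden[:, 1::3])

    # Conditioning on a high target return asks the model to generate the actions
    # it associates with high-reward trajectories.
    model = DecisionTransformerSketch()
    rtg = torch.full((1, CONTEXT, 1), 100.0)            # desired return-to-go
    states = torch.zeros(1, CONTEXT, STATE_DIM)
    actions = torch.zeros(1, CONTEXT, ACTION_DIM)
    predicted_actions = model(rtg, states, actions)     # shape (1, CONTEXT, ACTION_DIM)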

    Aravind is a Research Scientist at OpenAI, where he works on large generative models. He completed his PhD at UC Berkeley, where he was advised by Prof. Pieter Abbeel and contributed to contrastive learning, transformers, and generative models for reinforcement learning and computer vision. Aravind spent time at Google DeepMind, Google Brain, and OpenAI during his PhD, and co-taught the Berkeley Deep Unsupervised Learning classes.

  • 10:00
    Amir Meisami

    A Quick Overview of Causality in Online Learning Problems

    Amir Meisami - Senior Machine Learning Scientist - Adobe

    In this presentation, we discuss the importance of causation in multi-armed bandits and MDPs. We cover recent developments in this area, including some of our projects at Adobe.

    Amir Meisami is a Senior Machine Learning Research Scientist at Adobe. Over his tenure at Adobe, he has been working on a variety of sequential learning methodologies to handle challenging problems in digital marketing. More recently he has been focusing on the notion of causality in online and offline settings.

  • 10:20

    COFFEE & NETWORKING BREAK

  • IMPROVING REINFORCEMENT LEARNING

  • 10:55
    Danny Lange

    Learning from Multi-Agent, Emergent Behaviors in a Simulated Environment

    Danny Lange - SVP of AI & Machine Learning - Unity Technologies

    A revolution in reinforcement learning is happening, one that helps companies harness the more diverse, complex virtual simulations now available to accelerate the pace of innovation. Join this session to learn about environments already created that have yielded surprising advances in AI agents, and to better understand how emergent behaviors and open-endedness in multi-agent systems can lead to optimal designs and real-world practices.

    Dr. Danny Lange is Senior Vice President of Artificial Intelligence and Machine Learning at Unity. As head of machine learning at Unity, Lange leads the company’s innovation around AI (Artificial Intelligence) and Machine Learning, focusing on bringing AI to simulation and gaming.

    Prior to joining Unity, Lange was the head of machine learning at Uber, where he led efforts to build the world’s most versatile Machine Learning platform to support the company’s hyper-growth. Lange also served as General Manager of Amazon Machine Learning, an AWS product that offers Machine Learning as a Cloud Service. Before that, he was Principal Development Manager at Microsoft, where he led a product team focused on large-scale Machine Learning for Big Data.

    Lange spent 8 years on speech recognition systems, first as CTO of General Magic, Inc., then through his work on General Motors’ OnStar Virtual Advisor, one of the largest deployments of an intelligent personal assistant until Siri. Danny started his career as a Computer Scientist at IBM Research.

    He holds MS and Ph.D. degrees in Computer Science from the Technical University of Denmark. He is a member of the Association for Computing Machinery (ACM) and the IEEE Computer Society, and has several patents to his credit.

  • 11:25
    Sergey Levine

    Generalization and the Role of Data in Reinforcement Learning

    Sergey Levine - Assistant Professor - UC Berkeley

    Over the past decade, we have witnessed a revolution in supervised machine learning, as large, high-capacity models trained on huge datasets attain amazing results across a range of domains, from computer vision to natural language processing and speech recognition. But can these gains in supervised learning performance translate into more effective and optimal decision making? The branch of machine learning research that studies decision making is called reinforcement learning, and while more effective and performant reinforcement learning methods have also been developed over the past decade, in general it has proven challenging for reinforcement learning to benefit from large datasets, because it is conventionally thought of as an active online learning framework, which makes reusing large previously collected datasets difficult. In this talk, I will discuss how reinforcement learning algorithms can enable broad generalization through the use of large and diverse prior datasets. This concept lies at the core of offline reinforcement learning, which addresses the development of reinforcement learning methods that do not require active interaction with their environment but instead, much like current supervised learning methods, learn from previously collected datasets. Crucially, unlike supervised learning, such methods directly optimize for optimal downstream decision making, maximizing long-horizon reward signals. I will describe the computational and statistical challenges associated with offline reinforcement learning, describe recent algorithmic developments, and present a few promising applications.
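
    One representative idea from this line of work is to keep the learned Q-function pessimistic about actions the fixed dataset never took. The sketch below adds a conservative penalty, in the spirit of conservative Q-learning, to an otherwise standard fitted-Q update; the network size, penalty form, and stand-in random batch are illustrative assumptions rather than a specific published algorithm.

    import torch
    import torch.nn as nn

    STATE_DIM, NUM_ACTIONS, GAMMA = 11, 4, 0.99
    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, NUM_ACTIONS))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)

    def offline_update(states, actions, rewards, next_states, dones, alpha=1.0):
        """One gradient step on a batch drawn from a previously collected dataset."""
        q_all = q_net(states)                                     # (B, A)
        q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rewards + GAMMA * (1 - dones) * q_net(next_states).max(dim=1).values
        bellman_loss = ((q_taken - target) ** 2).mean()
        # Conservative term: push down Q-values over all actions relative to the
        # actions actually present in the dataset.
        conservative = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
        loss = bellman_loss + alpha * conservative
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage on a random stand-in batch (a real run would use logged transitions).
    B = 32
    loss = offline_update(torch.randn(B, STATE_DIM),
                          torch.randint(0, NUM_ACTIONS, (B,)),
                          torch.randn(B), torch.randn(B, STATE_DIM),
                          torch.zeros(B))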

    Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as computer vision and graphics. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.

  • 12:05
    Ban Kawas

    Reinforcement Learning in the Real World

    Ban Kawas - Senior Research Scientist - Reinforcement Learning - Meta

    Ban is a Senior AI Research Scientist at Meta. She works on democratizing Reinforcement Learning and enabling its use in the real world, spanning application areas from compiler optimization to embodied AI. Ban and her team are developing ReAgent, an end-to-end platform for applied RL; check out the open-source version at https://reagent.ai/

  • 12:25

    LUNCH

  • 13:30
    Peter Henderson

    Reinforcement Learning and Public Policy

    Peter Henderson - PhD Student - Stanford University

    Reinforcement learning is reaching a point where it is being considered for use within public policy and government services. In this talk, we examine public-sector use cases where reinforcement learning is already being used or could be used in the near future. We evaluate the challenges and risks of these deployments, and suggest paths forward for determining whether RL can or should be used in particular public-sector deployments.

    Peter Henderson is currently a PhD student at Stanford University. His research has ranged across various topics in reinforcement learning and natural language processing, often diving into best practices and methods for experimental research. Previously, he received a Master's in Computer Science at McGill University under the supervision of David Meger and Joelle Pineau. In industry, he has worked as a Software Engineer at Amazon AWS and an Applied Scientist at Amazon Alexa.

  • REINFORCEMENT LEARNING APPLICATIONS

  • 13:50
    Aleksandra Faust

    Toward Scalable Autonomy

    Aleksandra Faust - Senior Staff Research Scientist & RL Research Team Co-Founder - Google Brain

    Reinforcement learning is a promising technique for training autonomous systems that perform complex tasks in the real world. However, training reinforcement learning agents is difficult and tedious, requiring heavy engineering and often yielding suboptimal results. In fact, we can formulate the interaction between the human engineer and the agent under training as a decision-making process that the human performs, and consequently automate training by learning a decision-making policy. In this talk, we cover several examples that illustrate the process: learning intrinsic rewards, RL loss functions, and curricula for continual learning. We show that across different applications, learning-to-learn methods improve reinforcement learning agents' generalization and performance, and raise questions about nurture vs. nature in training autonomous systems.
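
    A deliberately simplified sketch of that bilevel structure: an outer loop searches over the weight of an exploration bonus (standing in for a learned intrinsic reward), and each candidate weight is scored by the extrinsic return of an inner tabular Q-learning agent on a sparse-reward chain. Random search replaces the meta-learning machinery of the actual work, and the environment and hyperparameters are assumptions.

    import numpy as np

    N_STATES, GOAL_REWARD, EPISODES, HORIZON = 8, 1.0, 200, 20
    rng = np.random.default_rng(0)

    def inner_train(bonus_weight):
        """Inner loop: tabular Q-learning on reward = extrinsic + weighted novelty bonus."""
        q = np.zeros((N_STATES, 2))                 # actions: 0 = left, 1 = right
        visits = np.ones(N_STATES)                  # visit counts for the novelty bonus
        extrinsic_return = 0.0
        for _ in range(EPISODES):
            s = 0
            for _ in range(HORIZON):
                a = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(q[s]))
                s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
                extrinsic = GOAL_REWARD if s_next == N_STATES - 1 else 0.0
                visits[s_next] += 1
                bonus = bonus_weight / np.sqrt(visits[s_next])
                q[s, a] += 0.5 * (extrinsic + bonus + 0.95 * q[s_next].max() - q[s, a])
                extrinsic_return += extrinsic
                s = s_next
        return extrinsic_return

    # Outer loop: choose the bonus weight whose trained agent earns the most
    # extrinsic reward (the quantity the engineer actually cares about).
    candidates = [0.0, 0.01, 0.1, 0.5, 1.0]
    best = max(candidates, key=inner_train)
    print(f"selected exploration-bonus weight: {best}")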

    Aleksandra Faust is a Senior Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research. Previously, Aleksandra founded and led Task and Motion Planning research in Robotics at Google, led machine learning for self-driving car planning and controls at Waymo, and was a senior researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction) and a Master's in Computer Science from the University of Illinois at Urbana-Champaign. Her research interests include safe and scalable reinforcement learning, learning to learn, motion planning, decision-making, and robot behavior. Aleksandra won the IEEE RAS Early Career Award for Industry and the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in the period 2011-2014, and was named a Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, ZDNet, and VentureBeat, and was awarded Best Paper in Service Robotics at ICRA 2018, Best Paper in Reinforcement Learning for Real Life (RL4RL) at ICML 2019, and Best Paper of IEEE Computer Architecture Letters in 2020.

  • 14:10
    Stephan Zheng

    AI-driven Economics using the AI Economist and WarpDrive

    Stephan Zheng - Lead Research Scientist - Salesforce Research

    Solving global challenges, such as economic inequality and sustainability, requires new tools and data to design effective economic policies. The AI Economist is a reinforcement learning (RL) framework that outperforms and overcomes key limitations of traditional policy design methods. I will survey key results and systems that move this toward real-world scale: 1) AI tax policies can significantly improve equality and productivity; 2) AI policies improve health and economic outcomes in simulated pandemics; 3) extensions to consumer-firm economies, more human-like agents, and AI pricing in platform businesses; and 4) WarpDrive, our open-source GPU framework for superfast multi-agent RL.

    Stephan Zheng (www.stephanzheng.com) leads the AI Economist team at Salesforce Research, which works on deep reinforcement learning and AI simulations to design economic policy. His work has been widely covered in the media, including the Financial Times, Axios, Forbes, Zeit, Volkskrant, MIT Tech Review, and others. He holds a Ph.D. in Physics from Caltech (2018) and interned with Google Research and Google Brain. Before machine learning, he studied mathematics and theoretical physics at the University of Cambridge, Harvard University, and Utrecht University. He received the Dutch Lorenz graduation prize for his thesis on topological string theory and was twice awarded the Dutch national Huygens scholarship.

  • 14:30
    Ruben Glatt

    Deep Symbolic Optimization (DSO) – A Reinforcement Learning-based Framework for Combinatorial Optimization

    Ruben Glatt - Machine Learning Researcher - Lawrence Livermore National Laboratory

    Deep Learning and Deep Reinforcement Learning have proven successful for many difficult regression and control problems by learning models represented by neural networks. However, the complexity of neural network-based models, involving thousands of composed nonlinear operators, can render them problematic to understand, trust, and deploy. In contrast, simple tractable symbolic expressions can facilitate human understanding, while also being transparent and exhibiting predictable behavior.

    In this talk, we show how the DSO framework leverages deep learning for symbolic optimization via a simple idea: use a large model to search the space of small models. Specifically, we use an autoregressive recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the model to generate higher-performing objects. This framework can be applied to optimize hierarchical, variable-length objects under a black box performance metric, with the ability to incorporate constraints in situ, reducing the search space as we limit exploration based on established knowledge.
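
    A minimal sketch of the risk-seeking policy gradient at the core of this framework: an autoregressive RNN samples token sequences, and only samples above the batch reward quantile contribute to the REINFORCE update, with the quantile itself as the baseline. A toy pattern-matching reward stands in for the symbolic-regression fitness a real DSO run would use; the vocabulary and sizes are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, SEQ_LEN, HIDDEN, BATCH, EPSILON = 5, 8, 64, 256, 0.1
    TARGET = torch.tensor([1, 2, 3, 1, 2, 3, 1, 2])     # toy "best expression"

    rnn = nn.GRU(VOCAB, HIDDEN, batch_first=True)
    head = nn.Linear(HIDDEN, VOCAB)
    optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

    def sample_batch():
        """Autoregressively sample token sequences and their log-probabilities."""
        tokens, log_probs = [], []
        inp = torch.zeros(BATCH, 1, VOCAB)
        h = None
        for _ in range(SEQ_LEN):
            out, h = rnn(inp, h)
            dist = torch.distributions.Categorical(logits=head(out[:, -1]))
            tok = dist.sample()
            tokens.append(tok)
            log_probs.append(dist.log_prob(tok))
            inp = F.one_hot(tok, VOCAB).float().unsqueeze(1)
        return torch.stack(tokens, dim=1), torch.stack(log_probs, dim=1).sum(dim=1)

    for _ in range(200):
        tokens, log_probs = sample_batch()
        reward = (tokens == TARGET).float().mean(dim=1)       # toy fitness in [0, 1]
        threshold = torch.quantile(reward, 1.0 - EPSILON)      # risk-seeking cutoff
        elite = reward >= threshold
        # REINFORCE on the elite samples only, with the quantile as the baseline.
        loss = -((reward[elite] - threshold) * log_probs[elite]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()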

    Ruben is a Machine Learning Researcher at the Lawrence Livermore National Laboratory (LLNL). With a background in Mechatronics and Mechanical Engineering, he has turned to Artificial Intelligence, where his main interest lies in Machine Learning research with a focus on Reinforcement Learning, autonomous systems, and applications in energy efficiency. He received his Ph.D. in Computer Engineering in the area of ML at the University of São Paulo (USP), Brazil, holds a master's degree in Mechanical Engineering in the area of controlling mechanical systems from the Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil, and a Diplom-Ingenieur degree in Mechatronics in the area of sensors and robotics from the Karlsruhe Institute of Technology (KIT), Germany. Ruben acquired years of professional experience before and during his studies while working in the technology and energy sector, as well as in the organization of international ML conferences. After a postdoctoral position at LLNL, he is now working as a Machine Learning Researcher on a variety of RL projects to develop methods for collaborative autonomy in multi-agent systems, interpretable RL, and real-world applications. Ruben’s long-term research interest lies in successfully applying RL techniques to real-world challenges to accelerate and improve decision-making, autonomously or as a support tool for humans, preferably for applications in energy and smart mobility systems.

  • 14:50

    COFFEE & NETWORKING BREAK

  • 15:20
    Agrim Gupta

    Towards Understanding and Building Embodied Intelligence

    Agrim Gupta - PhD Student - Stanford Vision & Learning Lab

    In contrast to embodied intelligence, which is common in nature, the recent progress in AI has been disembodied. Animals display remarkable degrees of embodied intelligence by leveraging their evolved morphologies to learn complex tasks. In this talk, I will argue that intelligent behavior is a function of the brain, morphology, and the environment. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning. To address this, I will introduce a new framework called DERL, which enables us to evolve agents with diverse morphologies to learn hard locomotion and manipulation tasks in complex environments, and reveals insights into relations between environmental physics, embodied intelligence, and the evolution of rapid learning.

    Agrim Gupta is a third-year PhD student in Computer Science at Stanford, advised by Fei-Fei Li and part of the Stanford Vision and Learning Lab. Working at the intersection of machine learning, computer vision, and robotics, his research focuses on understanding and building embodied agents. His research has been covered by popular media outlets like The Economist, TechCrunch, VentureBeat, and MIT Technology Review. Previously, he was a Research Engineer at Facebook AI Research, where he worked on building datasets and algorithms for long-tailed object recognition.

  • 15:40
    Daniel Wu

    XRL - eXplainable Reinforcement Learning in a Nutshell

    Daniel Wu - Head of Commercial Banking AI and Machine Learning - J.P. Morgan Chase

    Thanks to encouraging advancements in machine learning technology, AI has become more ubiquitous in our daily lives. From task automation, decision making, cost optimization, human augmentation, and medical diagnosis to autonomous driving and robotics, AI is realizing its promise to greatly enhance efficiency in modern life. Given our increasing dependency on AI systems, it is paramount to ensure AI development adheres to the principles of responsible AI. Explainability is one such foundational principle of responsible AI. In the reinforcement learning setting, where intelligent agents learn by themselves with little human intervention, explainability is even more important in establishing trust and confidence with users. This talk aims to provide an overview of XRL and a brief survey of XRL techniques.

    Daniel Wu is a technical leader who brings more than 20 years of expertise in software engineering, AI/ML, and high-impact team development. He is the Head of Commercial Banking AI and Machine Learning at JPMorgan Chase where he drives financial service transformation through AI innovation. His diverse professional background also includes building point of care expert systems for physicians to improve quality of care, co-founding an online personal finance marketplace, and building an online real estate brokerage platform.

    Daniel is passionate about the democratization of technology and the ethical use of AI - a philosophy he shares in the computer science and AI/ML education programs he has contributed to over the years. He holds a computer science degree from Stanford University.

  • 16:00
    Joel Lehman

    The Surprising Creativity of Evolutionary RL

    Joel Lehman - Research Scientist - OpenAI

    One facet of the practical art of reinforcement learning is how to take a desired task and construct a reward function, one that results in an acceptable solution when optimized. A common failure mode is that a sensible-seeming reward function can often be optimized in surprising (and undesirable, but often funny) ways, like a devious genie fulfilling the letter of your request but undermining the spirit of it. This talk reviews a set of examples taken from the evolutionary reinforcement learning community that highlight how common this phenomenon is among researchers and practitioners. The aim is to draw attention to the practical challenge this presents for reinforcement learning and how practitioners often overcome these challenges. The conclusion of the talk describes ways the research community is exploring new paradigms for more easily specifying success criteria for complex RL tasks.

    Joel Lehman is a research scientist at OpenAI, and previously was a founding member of Uber AI Labs and an assistant professor at the IT University of Copenhagen. His research focuses on open-endedness, reinforcement learning, and AI safety. His PhD dissertation introduced the novelty search algorithm, which inspired a popular science book co-written with Ken Stanley on what search algorithms imply for individual and societal objectives, called “Why Greatness Cannot Be Planned.”

  • 16:20
    Alexander Pan

    The Effects of Reward Misspecification

    Alexander Pan - AI Alignment Research - Caltech

    Reward hacking, where RL agents exploit gaps in misspecified reward functions, has been widely observed, but not yet systematically studied. To understand how reward hacking arises, we construct four RL environments with misspecified rewards. We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time. More capable agents often exploit reward misspecifications, achieving higher proxy reward and lower true reward than less capable agents. Moreover, we find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward. Such phase transitions pose challenges to monitoring the safety of ML systems. To address this, we propose an anomaly detection task for aberrant policies and offer several baseline detectors.
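
    The central measurement here, proxy reward versus true reward as agent capability grows, can be illustrated with a toy example. The "cleaning robot" setting, the capability knob, and the KL-based detector below are illustrative assumptions, not the four environments or the detectors studied in the work.

    import numpy as np

    def proxy_reward(effort, dust_hidden):
        return effort                            # proxy: visible cleaning effort

    def true_reward(effort, dust_hidden):
        return effort - 5.0 * dust_hidden        # true: effort minus dust hidden under the rug

    def run_agent(capability):
        """More capable agents discover that hiding dust maximizes the proxy."""
        effort = float(capability)
        dust_hidden = max(0.0, capability - 3.0)   # the hack only appears past a threshold
        return proxy_reward(effort, dust_hidden), true_reward(effort, dust_hidden)

    for capability in [1, 2, 3, 4, 5, 6]:
        proxy, true_r = run_agent(capability)
        print(f"capability={capability}: proxy reward={proxy:.1f}  true reward={true_r:.1f}")

    # Baseline anomaly detector: flag a policy whose action distribution drifts far
    # (in KL divergence) from a trusted reference policy. The 0.5 threshold is arbitrary.
    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    trusted = np.array([0.7, 0.2, 0.1])              # reference action distribution
    candidate = np.array([0.1, 0.1, 0.8])            # policy under monitoring
    print("anomalous" if kl(candidate, trusted) > 0.5 else "normal")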

    Alexander Pan is a final-year undergraduate at Caltech studying mathematics and computer science. His research focuses on aligning AI systems with human values through reinforcement learning and natural language processing. In the fall, he plans to study computer science in graduate school.

  • REINFORCEMENT LEARNING FOR ROBOTICS

  • 16:40
    Lisa Lee

    Learning Embodied Agents with Scalably-Supervised Reinforcement Learning

    Lisa Lee - Research Scientist - Google Brain

    Reinforcement learning (RL) agents learn to perform a task through trial-and-error interactions with an initially unknown environment. Despite the recent progress in deep RL, several unsolved challenges limit the applicability of RL to real-world tasks, including efficient exploration in high-dimensional spaces, learning and data efficiency, and the high cost of human supervision. Towards solving these challenges, this talk focuses on how we can balance self-supervised and human-supervised RL to efficiently train an agent for solving various visual robotic tasks. We address the following questions:

    1. How can we amortize the cost of learning to explore?

    2. How can we learn a semantically meaningful representation for faster exploration and learning?

    3. How can we utilize language to equip deep RL agents with structured priors about the physical world, and enable generalization and knowledge transfer across different tasks?

    Lisa Lee is a Research Scientist at Google Brain in the Reinforcement Learning team. She obtained her PhD in Machine Learning from Carnegie Mellon, where she was advised by Ruslan Salakhutdinov and Eric Xing. She graduated summa cum laude with an A.B. in Mathematics from Princeton University, where her undergraduate thesis on word embeddings was advised by Sanjeev Arora. Lisa's research focuses on deep reinforcement learning for robotic control, and training embodied agents that can be deployed in complex environments to solve a wide variety of tasks. She is particularly excited about representation learning for RL; efficient exploration and fast adaptation in multi-task RL; skill learning and planning; and utilizing language to equip visual RL agents with structured priors about the world.

  • 17:00

    NETWORKING RECEPTION

  • 18:00

    END OF DAY 1

  • THIS SCHEDULE TAKES PLACE ON DAY 1
