Understanding the Limits of Explainability: Challenges and Opportunities
As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this talk, I will demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, I will discuss a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using results from real-world datasets (including COMPAS), I will demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations that do not reflect the underlying biases. I will also discuss extensive user studies that we carried out with domain experts in law to understand the perils of such misleading explanations and how they can be used to manipulate user trust. I will conclude the talk by discussing how novel methods inspired by the adversarial robustness literature can be used to address some of the aforementioned vulnerabilities of explanation methods.
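To make the scaffolding idea concrete, here is a minimal toy sketch in Python. It is an illustration under stated assumptions, not the framework presented in the talk: the three-feature data, the rule-based `looks_in_distribution` check (standing in for whatever detector separates real inputs from perturbed ones), and all function names are hypothetical. It only shows the general pattern, in which inputs that look like real data are routed to a biased model while perturbed inputs, of the kind a perturbation-based explainer would query, are routed to an innocuous one.

```python
import random

def biased_model(x):
    # Hypothetical biased classifier: decides only on the sensitive feature x[0].
    return 1 if x[0] > 0.5 else 0

def innocuous_model(x):
    # Hypothetical innocuous classifier: decides only on an unrelated feature x[1].
    return 1 if x[1] > 0.5 else 0

def looks_in_distribution(x, tol=0.05):
    # Toy stand-in for an out-of-distribution detector (an assumption):
    # real rows satisfy x[2] == x[0] + x[1], and perturbations tend to break this.
    return abs(x[2] - (x[0] + x[1])) <= tol

def scaffolded_model(x):
    # Biased on inputs that look real, innocuous on inputs that look perturbed.
    if looks_in_distribution(x):
        return biased_model(x)
    return innocuous_model(x)

def sample_real(rng):
    # Real data lies on the surface x[2] = x[0] + x[1].
    a, b = rng.random(), rng.random()
    return [a, b, a + b]

def perturb(x, rng, scale=1.0):
    # Independent noise per feature, in the spirit of perturbation-based explainers.
    return [v + rng.gauss(0.0, scale) for v in x]

rng = random.Random(0)
real = [sample_real(rng) for _ in range(500)]
perturbed = [perturb(x, rng) for x in real]

# Fraction of real inputs on which the scaffold matches the biased model.
agree_biased_on_real = sum(
    scaffolded_model(x) == biased_model(x) for x in real
) / len(real)

# Fraction of perturbed inputs on which the scaffold matches the innocuous model.
agree_innocuous_on_perturbed = sum(
    scaffolded_model(x) == innocuous_model(x) for x in perturbed
) / len(perturbed)

print(agree_biased_on_real, agree_innocuous_on_perturbed)
```

In this toy setting, an explainer that probes the scaffolded model mostly through perturbed points would mostly observe the innocuous model's behavior, even though every real input is still scored by the biased one.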
Hima Lakkaraju is an Assistant Professor at Harvard University. She recently graduated with a PhD in Computer Science from Stanford University. Her research interests lie within the broad area of trustworthy machine learning. More specifically, her research spans explainable & fair ML, adversarial robustness, reinforcement learning, and causal inference. Her work finds applications in high-stakes settings such as criminal justice, healthcare, public policy, and education. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and was featured as one of the innovators to watch by Vanity Fair. She has received several prestigious awards including the best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. Her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, Harvard Business Review, TIME, Forbes, Business Insider, and Bloomberg. For more information, please visit: https://himalakkaraju.github.io/