IEEE JESTCS

Model Agnostic Contrastive Explanations for Classification Models


Abstract

Extensive surveys on explanations that are suitable for humans claim that being contrastive is one of an explanation's most important traits. A few methods have been proposed to generate contrastive explanations for differentiable models, such as deep neural networks, where one has complete access to the model. In this work, we propose the Model Agnostic Contrastive Explanations Method (MACEM), which can generate contrastive explanations for any classification model where one is only able to query the class probabilities for a desired input. This allows us to generate contrastive explanations not only for neural networks, but also for models such as random forests, boosted trees, and even arbitrary ensembles, which remain among the state of the art for learning on tabular data. Our method is also applicable in scenarios where only black-box access to the model is provided, meaning that we can obtain only the predictions and prediction probabilities. With the advent of larger models, it is increasingly common to work in this black-box setting, where the user does not necessarily have access to the model weights or parameters and can interact with the model only through an API. To obtain meaningful explanations in this setting, we propose a principled and scalable approach to handling real and categorical features, leading to novel formulations for computing the pertinent positives and pertinent negatives that form the essence of a contrastive explanation. A detailed treatment of this nature, focusing on scalability and handling different data types, was not undertaken in previous work, which assumed all features to be positive real-valued, with zero indicating the least interesting value. We part with this strong implicit assumption and generalize these methods to apply across a much wider range of problem settings. We validate our approach both quantitatively and qualitatively on public datasets covering diverse domains.
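To make the black-box access pattern concrete, the sketch below is our own illustration, not the paper's MACEM optimization: a random forest (a non-differentiable model) is hidden behind a probability-query interface, and a naive random search, standing in for the paper's actual formulation, looks for a pertinent-negative style perturbation, i.e. a minimal change to the input that flips the predicted class, using only those queries. The function names (query, pertinent_negative) and the search strategy are hypothetical.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train an arbitrary, non-differentiable classifier; everything below
# treats it strictly as a black box exposing only class probabilities.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def query(x):
    """Black-box access: class probabilities for a single input."""
    return model.predict_proba(x.reshape(1, -1))[0]

def pertinent_negative(x, step=0.1, max_iter=200):
    """Naive random-search stand-in (NOT the paper's method): find a
    small perturbation delta such that x + delta is classified
    differently from x, using probability queries alone."""
    rng = np.random.default_rng(0)
    base_class = int(np.argmax(query(x)))
    best = None
    for _ in range(max_iter):
        delta = rng.normal(scale=step, size=x.shape)
        if int(np.argmax(query(x + delta))) != base_class:
            # Keep the smallest class-flipping change seen so far.
            if best is None or np.linalg.norm(delta) < np.linalg.norm(best):
                best = delta
    return best

x0 = X[0]
delta = pertinent_negative(x0)
print("original class:", int(np.argmax(query(x0))))
if delta is not None:
    print("pertinent-negative change:", np.round(delta, 3))
    print("new class:", int(np.argmax(query(x0 + delta))))

The same query interface would work unchanged for boosted trees, arbitrary ensembles, or a remote prediction API, which is the point of the model-agnostic setting; the paper replaces the random search here with a principled optimization over real and categorical features.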