Black box fairness testing of machine learning models
Abstract
An AI system cannot be accepted unless its trustworthiness is demonstrated. An important characteristic of a trustworthy AI system is the absence of algorithmic bias. 'Individual discrimination' exists when a given individual, differing from another only in 'protected attributes' (e.g., age, gender, or race), receives a different decision outcome from a given machine learning (ML) model. This work addresses the problem of detecting individual discrimination in given ML models. In a black-box setting, such detection is test-intensive, and exhaustive testing is infeasible for non-trivial systems. We propose a methodology for automatically generating test inputs for the task of detecting individual discrimination. Our approach combines two well-established techniques, symbolic execution and local explainability, for effective test-case generation. We show empirically that our approach generates test cases far more effectively than the best-known benchmark systems we examine.
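To make the definition of individual discrimination concrete, the following is a minimal Python sketch of checking one individual against a black-box model: it perturbs only the protected attributes and reports whether the decision changes. The `predict` interface, feature layout, and protected-value domains are illustrative assumptions, not the paper's implementation.

from itertools import product

def is_discriminatory(predict, x, protected_idx, protected_values):
    """Return True if changing only the protected attributes of input x
    changes the black-box model's decision.

    predict          -- function mapping a feature vector to a decision label
    x                -- the individual's feature vector (list of values)
    protected_idx    -- indices of protected attributes within x
    protected_values -- dict mapping each protected index to its value domain
    """
    original = predict(x)
    # Enumerate every alternative assignment of the protected attributes,
    # keeping all non-protected attributes fixed.
    domains = [protected_values[i] for i in protected_idx]
    for combo in product(*domains):
        x2 = list(x)
        for i, v in zip(protected_idx, combo):
            x2[i] = v
        if x2 != list(x) and predict(x2) != original:
            return True  # found a counterpart differing only in protected attributes
    return False

# Hypothetical usage: a toy model biased on feature index 2 ('gender').
model = lambda x: int(x[0] + 0.5 * x[2] > 10)
print(is_discriminatory(model, [9.8, 1, 0], [2], {2: [0, 1]}))  # True

A check like this decides only whether one specific input exhibits discrimination; the paper's contribution is generating such test inputs effectively, rather than sampling the input space exhaustively.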