Semi-supervised Feature Selection for Efficient Detection of Systemic Deviations to Develop Trustworthy AI
Abstract
Trustworthy AI aims to achieve systems that support decision-making with data-driven insights while satisfying fundamental requirements such as explainability and fairness. Identifying systemic deviations in datasets and model outputs helps to validate fairness issues, such as bias toward a particular subgroup. Multiple techniques have been proposed in the literature to detect systemic deviations, but their computational complexity grows with the dimension of the feature space. Feature selection could therefore be employed to make the detection process more efficient. However, existing feature selection techniques typically optimize predictive performance rather than the detection of systemic deviations. In this paper, we propose a sparsity-based and automated feature selection (SAFS) framework for efficient discovery of anomalous patterns. SAFS encodes systemic outcome deviations via the sparsity of feature-driven odds ratios, without requiring supervised training of a particular model. When validated on a publicly available critical care dataset, SAFS achieves a more than 3x reduction in computation time while maintaining detection performance, using just half of the original feature space. SAFS also outperforms multiple feature selection baselines.
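To make the idea of ranking features by the sparsity of their odds ratios concrete, the following is a minimal sketch and not the SAFS implementation. The abstract does not specify how odds ratios are computed per feature or which sparsity measure is used, so the choices here are assumptions: one odds ratio per distinct feature value (that value versus all others), a smoothing constant of 0.5 to avoid division by zero, and the Gini index as the sparsity measure. The function names `odds_ratios`, `gini_sparsity`, and `rank_features` are hypothetical.

```python
from collections import Counter


def odds_ratios(values, outcomes, smoothing=0.5):
    """Odds ratio of a positive outcome for each distinct value of one feature.

    For each value v, a 2x2 table is formed: (feature == v vs. feature != v)
    against (outcome == 1 vs. outcome == 0). The smoothing constant is an
    assumption added here to keep every cell non-zero.
    """
    pos = Counter(v for v, y in zip(values, outcomes) if y == 1)
    neg = Counter(v for v, y in zip(values, outcomes) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    ratios = {}
    for v in set(values):
        a = pos[v] + smoothing               # value v, positive outcome
        b = neg[v] + smoothing               # value v, negative outcome
        c = total_pos - pos[v] + smoothing   # other values, positive outcome
        d = total_neg - neg[v] + smoothing   # other values, negative outcome
        ratios[v] = (a * d) / (b * c)
    return ratios


def gini_sparsity(xs):
    """Gini index of a non-negative vector.

    Returns 0 when all entries are equal and grows toward 1 as the total
    mass concentrates in a few entries. Assumed here as the sparsity measure.
    """
    xs = sorted(xs)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(x * (n - k + 0.5) for k, x in enumerate(xs, start=1))
    return 1.0 - 2.0 * weighted / (n * total)


def rank_features(columns, outcomes):
    """Order feature names from most to least sparse odds-ratio vector."""
    scores = {
        name: gini_sparsity(list(odds_ratios(vals, outcomes).values()))
        for name, vals in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data (invented for illustration): "ward" concentrates positive
    # outcomes in one value, while "shift" is unrelated to the outcome.
    outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    columns = {
        "ward": ["x"] * 4 + ["y"] * 4 + ["z"] * 4,
        "shift": ["d", "n"] * 6,
    }
    print(rank_features(columns, outcomes))
```

Under these assumptions, a feature whose odds ratios are dominated by one or two values scores as sparse and is ranked first, and a downstream detection step would then search only the top-ranked subset of features. The outcome labels are used only to tabulate counts; no predictive model is trained.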