Salomón Wollenstein-Betech, Christian Muise, et al.
ITSC 2020
In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level \textit{concepts} (e.g., \texttt{stripes}, \texttt{black}) and then predict a task label from those concepts. In particular, we study the impact of \textit{concept interventions} (i.e.,~operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term \textit{leakage poisoning}, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce \mbox{MixCEM}, a new CM that learns to dynamically exploit leaked information that is missing from its concepts, but only when this information is in-distribution. Our results on tasks with and without complete concept annotations show that MixCEMs outperform strong baselines, significantly improving accuracy on both in-distribution and OOD samples, with and without concept interventions.
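For intuition, the sketch below is a minimal, hypothetical illustration of the two ideas the abstract relies on (it is not the paper's MixCEM nor any released implementation): a plain concept-bottleneck model that maps inputs to concept probabilities and then to a task label, plus a test-time \textit{concept intervention} in which expert-provided ground-truth values overwrite selected predicted concepts before the label head runs. All class and argument names, layer sizes, and the mask convention are assumptions made for illustration.

\begin{verbatim}
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Toy CM: input -> concept probabilities -> task label."""
    def __init__(self, n_features, n_concepts, n_classes, hidden=64):
        super().__init__()
        self.concept_net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_concepts),
        )
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, true_concepts=None, intervene_mask=None):
        c_hat = torch.sigmoid(self.concept_net(x))  # predicted concepts
        if true_concepts is not None and intervene_mask is not None:
            # Concept intervention: replace the selected (mispredicted)
            # concepts with the expert-provided ground-truth values.
            c_hat = torch.where(intervene_mask.bool(), true_concepts, c_hat)
        return c_hat, self.label_net(c_hat)  # concepts, task logits

# Example: intervene on the first two concepts of a small batch.
model = ConceptBottleneck(n_features=32, n_concepts=5, n_classes=10)
x = torch.randn(4, 32)
true_c = torch.randint(0, 2, (4, 5)).float()
mask = torch.zeros(4, 5)
mask[:, :2] = 1.0
_, y_logits = model(x, true_concepts=true_c, intervene_mask=mask)
\end{verbatim}

In a hard bottleneck like this, the label head sees only the concepts, so interventions directly steer the task prediction. The abstract's concern arises when the representation additionally carries non-concept ("leaked") information to the label head: interventions then no longer fully control the prediction, and, per the abstract, that leaked signal becomes harmful for OOD inputs (leakage poisoning), which MixCEM addresses by exploiting it only when it is in-distribution.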