When data lie: Fairness and robustness in contested environments
Abstract
Many important decisions historically made by humans are now being made by algorithms, often learned from data, whose accountability measures and legal standards are far from satisfactory. While model transparency is important, it is neither necessary nor sufficient for accountability, which is arguably the more important goal. Accountability, however, must carefully take into account the weaknesses of the original data as well as the weaknesses of the model itself: robust datasets enable robust models, and vice versa. In this paper we focus on unfair datasets as one example of such data weaknesses. Fairness is closely tied to privacy, since learning without fairness constraints can emphasize features or directions that leak private information. For instance, a model may inadvertently reveal a person's age if age is a discriminating feature in the model's decision making. We also investigate the robustness of models in the presence of adversarial activity. We should strengthen our models by estimating what an adversary will do, based on continuous, dynamic learning, remaining mindful of concealment and deception, and providing a clear, explainable, and insightful summary for the final decision makers. In this paper we discuss how models built on unfair datasets can hardly be robust, and how datasets used by weak models can hardly be fair.