Jannis Born, Tien Huynh, et al.
NeurIPS 2021
Natural language text generation has improved markedly with the advent of pre-trained language models. Using such models to predict personal data entities in place of redacted spans in text could help generate synthetic datasets. To address privacy and ethical concerns with such datasets, we need to ensure that the masked-entity predictions are fair and controlled by application-specific constraints. We introduce new ways to inject hard constraints and knowledge into language models that address these concerns and also improve performance on this task.
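The core idea of a hard constraint on masked-entity prediction can be illustrated with a minimal sketch. This is not the paper's actual method; it assumes a model has already scored candidate fill-ins for a redacted span, and shows how an application-specific allowed set acts as a hard filter before selection.

```python
def constrained_fill(candidates, allowed):
    """Pick the highest-scoring candidate that satisfies a hard constraint.

    candidates: dict mapping candidate entity -> model score (assumed given).
    allowed: set of entities permitted by the application constraint.
    Returns None when no candidate satisfies the constraint.
    """
    feasible = {e: s for e, s in candidates.items() if e in allowed}
    if not feasible:
        return None  # constraint is unsatisfiable for this span
    return max(feasible, key=feasible.get)

# Hypothetical example: a redacted [NAME] span where the model prefers
# "Alice", but the constraint admits only a restricted placeholder set.
scores = {"Alice": 0.62, "Taylor": 0.25, "Bob": 0.13}
allowed = {"Taylor", "Jordan"}
print(constrained_fill(scores, allowed))  # -> Taylor
```

In practice the candidate scores would come from a pre-trained language model's fill-mask distribution; the point here is only that a hard constraint prunes the candidate set outright rather than re-weighting it.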
Elita Lobo, Yash Chandak, et al.
NeurIPS 2021
Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
Megh Thakkar, Quentin Fournier, et al.
ACL 2025