Utility-guided clustering-based transaction data anonymization
Abstract
Transaction data about individuals are increasingly collected to support a plethora of applications, spanning from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches, if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to their release have been proposed recently, but they may incur significant utility loss due to their inability to: (i) accommodate a range of different privacy requirements that data owners often have, and (ii) guarantee that the produced data will satisfy data owners' utility requirements. To address this issue, we propose a novel clustering-based framework to anonymizing transaction data, which provides the basis for designing algorithms that better preserve data utility. Based on this framework, we develop two anonymization algorithms which explore a larger solution space than existing methods and can satisfy a wide range of privacy requirements. Additionally, the second algorithm allows the specification and enforcement of utility requirements, thereby ensuring that the anonymized data remain useful in intended tasks. Experiments with both benchmark and real medical datasets verify that our algorithms significantly outperform the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.