Zhengxu Xia, Yitian Hao, et al.
Middleware 2023
Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.
Weichao Mao, Haoran Qiu, et al.
NeurIPS 2023
Anna Maria Nestorov, Josep Berral, et al.
Middleware 2022
Pol G. Recasens, Ferran Agullo, et al.
CLOUD 2025