Abstract
Differentially private synthetic data has been adopted as a common security measure for the public release of sensitive data. However, existing solutions either suffer from serious privacy budget splitting or fail to fully automate the generation procedure. In this study, we propose DPView, an automated system for synthesizing differentially private tabular data. Our key insight is that high-dimensional data synthesis can be accomplished by utilizing the domain sizes of attributes, which are public information, whereas identifying the correlation among attributes is necessary but leads to severe privacy budget splitting. In addition, we analytically optimize both the privacy budget allocation and the consistency procedures of the proposed method through mathematical programming. We further propose two novel methods, namely iterative non-negativity and consistency-aware normalization, to postprocess the synthetic data. An extensive set of experimental results demonstrates the superior utility of DPView.
Original language | English |
---|---|
Pages (from-to) | 15886-15900 |
Number of pages | 15 |
Journal | IEEE Internet of Things Journal |
Volume | 9 |
Issue number | 17 |
State | Published - 1 Sep 2022 |
Keywords
- Differential privacy (DP)
- synthetic data set