DPView: Differentially Private Data Synthesis Through Domain Size Information

Chih Hsun Lin, Chia Mu Yu*, Chun Ying Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Differentially private synthetic data has become a common safeguard for the public release of sensitive data. However, existing solutions either suffer from severe privacy budget splitting or fail to fully automate the generation procedure. In this study, we propose DPView, an automated system for synthesizing differentially private tabular data. Our key insight is that high-dimensional data synthesis can be accomplished by exploiting the domain sizes of attributes, which are public information. Identifying correlations among attributes, by contrast, is necessary but causes severe privacy budget splitting. In addition, we analytically optimize both the privacy budget allocation and the consistency procedure of the proposed method through mathematical programming. We further propose two novel methods for postprocessing the synthetic data, namely iterative non-negativity and consistency-aware normalization. Extensive experimental results demonstrate the superior utility of DPView.
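The abstract does not spell out DPView's algorithms, so the following is only a rough illustration of two ideas it names: privacy budget splitting and non-negativity postprocessing of noisy counts. It is a minimal sketch under stated assumptions, not the paper's method. The even budget split, the sensitivity-1 count marginals, and the function names `noisy_marginals` and `norm_sub` are all hypothetical. `norm_sub` is a generic clip-and-redistribute heuristic (sometimes called Norm-Sub in the frequency-estimation literature), not DPView's iterative non-negativity procedure.

```python
import numpy as np


def noisy_marginals(marginals, epsilon, rng):
    """Release k count marginals with the Laplace mechanism.

    Assumption: the budget is split evenly (sequential composition), so each
    marginal gets epsilon / k. Adding or removing one record changes one cell
    of each marginal by 1 (L1 sensitivity 1), so the Laplace scale is k / epsilon.
    """
    k = len(marginals)
    scale = k / epsilon  # noise scale grows linearly with the number of views
    return [np.asarray(m, dtype=float) + rng.laplace(0.0, scale, size=np.shape(m))
            for m in marginals]


def norm_sub(counts, total):
    """Hypothetical clip-and-redistribute postprocessing (Norm-Sub style).

    Returns non-negative counts summing to `total` (for total >= 0). This is a
    generic heuristic, not DPView's iterative non-negativity procedure.
    """
    x = np.asarray(counts, dtype=float).copy()
    # The set of positive cells shrinks whenever another round is needed,
    # so x.size + 1 rounds suffice.
    for _ in range(x.size + 1):
        x = np.maximum(x, 0.0)          # clip negative counts to zero
        pos = x > 0
        if not pos.any():               # nothing left to rescale: spread evenly
            x[:] = total / x.size
            break
        excess = x.sum() - total
        if np.isclose(excess, 0.0):     # sum already matches the target
            break
        x[pos] -= excess / pos.sum()    # shift the difference onto positive cells
    return np.maximum(x, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-way marginals over a 100-record table (illustrative data only).
    marginals = [np.array([50.0, 30.0, 20.0]), np.array([2.0, 0.0, 1.0, 97.0])]
    released = noisy_marginals(marginals, epsilon=1.0, rng=rng)
    # Reuse each noisy total as the target; postprocessing costs no budget.
    cleaned = [norm_sub(m, total=max(m.sum(), 0.0)) for m in released]
    for m in cleaned:
        print(np.round(m, 2), round(float(m.sum()), 2))
    print(norm_sub([5, -2, 3, -1], total=5))  # values 3.5, 0, 1.5, 0
```

Because each of the k views receives only ε/k, the Laplace noise scale grows linearly in k. This is the budget-splitting cost the abstract refers to. The cleanup step operates only on already-released values, so by the postprocessing property of differential privacy it consumes no additional budget.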

Original language: English
Pages (from-to): 15886-15900
Number of pages: 15
Journal: IEEE Internet of Things Journal
Volume: 9
Issue number: 17
State: Published - 1 Sep 2022

Keywords

  • Differential privacy (DP)
  • synthetic data set
