TY - GEN
T1 - Expectation-Maximization Estimation for Key-Value Data Randomized with Local Differential Privacy
AU - Horigome, Hikaru
AU - Kikuchi, Hiroaki
AU - Yu, Chia Mu
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - This paper studies the local differential privacy (LDP) algorithm for key-value data that are pervasive in big data analysis. One of the state-of-the-arts algorithms, PrivKV, randomizes key-value pairs with a sequence of LDP algorithms. However, most likelihood estimation fails to estimate the statistics accurately when the frequency of the data for particular rare keys is limited. To address the problem, we propose the expectation-maximization-based algorithm designed for PrivKV. Instead of estimating continuous values [ - 1, 1 ] in key-value pairs, we focus on estimating the intermediate variable that contains the encoded binary bit ∈ { 1, - 1 }. This makes the problem tractable to estimate because we have a small set of possible input values and a set of observed outputs. We conduct some experiments using some synthetic data with some known distributions, e.g., Gaussian and power-law and well-known open datasets, MoveLens and Clothing. Our experiment using synthetic data and open datasets shows the robustness of estimation with regards to the size of data and the privacy budgets. The improvement is significant and the MSE of the proposed algorithm is 602.83 × 10 - 4 (41% of PrivKVM).
AB - This paper studies the local differential privacy (LDP) algorithm for key-value data that are pervasive in big data analysis. One of the state-of-the-arts algorithms, PrivKV, randomizes key-value pairs with a sequence of LDP algorithms. However, most likelihood estimation fails to estimate the statistics accurately when the frequency of the data for particular rare keys is limited. To address the problem, we propose the expectation-maximization-based algorithm designed for PrivKV. Instead of estimating continuous values [ - 1, 1 ] in key-value pairs, we focus on estimating the intermediate variable that contains the encoded binary bit ∈ { 1, - 1 }. This makes the problem tractable to estimate because we have a small set of possible input values and a set of observed outputs. We conduct some experiments using some synthetic data with some known distributions, e.g., Gaussian and power-law and well-known open datasets, MoveLens and Clothing. Our experiment using synthetic data and open datasets shows the robustness of estimation with regards to the size of data and the privacy budgets. The improvement is significant and the MSE of the proposed algorithm is 602.83 × 10 - 4 (41% of PrivKVM).
UR - http://www.scopus.com/inward/record.url?scp=85151059674&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-28451-9_44
DO - 10.1007/978-3-031-28451-9_44
M3 - Conference contribution
AN - SCOPUS:85151059674
SN - 9783031284502
T3 - Lecture Notes in Networks and Systems
SP - 501
EP - 512
BT - Advanced Information Networking and Applications - Proceedings of the 37th International Conference on Advanced Information Networking and Applications AINA-2023
A2 - Barolli, Leonard
PB - Springer Science and Business Media Deutschland GmbH
T2 - 37th International Conference on Advanced Information Networking and Applications, AINA 2023
Y2 - 29 March 2023 through 31 March 2023
ER -