TY - JOUR
T1 - Distributed Dual Averaging Based Data Clustering
AU - Servetnyk, Mykola
AU - Fung, Carrson C.
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2023/2/1
Y1 - 2023/2/1
N2 - A multiagent distributed clustering scheme is proposed herein to process data collected by dispersed sensors that are not under centralized control. Two methods based on the distributed dual averaging (DDA) algorithm are proposed; they are able to incorporate network structure and do not require the exchange of centroid estimates, which makes them appealing for security-conscious applications. The first method provides the framework for distributed clustering using the DDA algorithm with a predefined regularization parameter. The second method, called Adaptive DDA (ADDA), relaxes the condition concerning a priori knowledge about the centroids, assumed in the first method, without losing clustering performance. This is achieved by properly regularizing the problem, where a data-driven approach is used to determine the regularization parameter. The proposed methods are further extended via the proposed Bin method to scenarios where processing agents store unbalanced amounts of data with non-IID class distributions. Experiments are conducted on both real-life and synthetic data. Numerical results show the efficacy of the proposed approaches compared to a state-of-the-art centralized algorithm and other distributed approaches.
AB - A multiagent distributed clustering scheme is proposed herein to process data collected by dispersed sensors that are not under centralized control. Two methods based on the distributed dual averaging (DDA) algorithm are proposed; they are able to incorporate network structure and do not require the exchange of centroid estimates, which makes them appealing for security-conscious applications. The first method provides the framework for distributed clustering using the DDA algorithm with a predefined regularization parameter. The second method, called Adaptive DDA (ADDA), relaxes the condition concerning a priori knowledge about the centroids, assumed in the first method, without losing clustering performance. This is achieved by properly regularizing the problem, where a data-driven approach is used to determine the regularization parameter. The proposed methods are further extended via the proposed Bin method to scenarios where processing agents store unbalanced amounts of data with non-IID class distributions. Experiments are conducted on both real-life and synthetic data. Numerical results show the efficacy of the proposed approaches compared to a state-of-the-art centralized algorithm and other distributed approaches.
KW - Clustering algorithms
KW - distributed algorithms
KW - security conscious algorithm
KW - subgradient methods
KW - unbalanced data
UR - http://www.scopus.com/inward/record.url?scp=85124095317&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2022.3146169
DO - 10.1109/TBDATA.2022.3146169
M3 - Article
AN - SCOPUS:85124095317
SN - 2332-7790
VL - 9
SP - 372
EP - 379
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 1
ER -