TY - JOUR
T1 - M22
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Liu, Yangyi
AU - Salehkalaibar, Sadaf
AU - Rini, Stefano
AU - Chen, Jun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In federated learning (FL), the communication constraint between the remote users and the Parameter Server (PS) is a crucial bottleneck. This paper proposes M22, a rate-distortion inspired approach to model update compression for distributed training of deep neural networks (DNNs). In particular, (i) we propose a family of distortion measures referred to as the "M-magnitude weighted L2" norm, and (ii) we assume that gradient updates follow an i.i.d. distribution with two degrees of freedom, namely the generalized normal and Weibull distributions. To measure the gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that a bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choice of gradient distributions and the distortion measure. We provide substantial insights into the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.
AB - In federated learning (FL), the communication constraint between the remote users and the Parameter Server (PS) is a crucial bottleneck. This paper proposes M22, a rate-distortion inspired approach to model update compression for distributed training of deep neural networks (DNNs). In particular, (i) we propose a family of distortion measures referred to as the "M-magnitude weighted L2" norm, and (ii) we assume that gradient updates follow an i.i.d. distribution with two degrees of freedom, namely the generalized normal and Weibull distributions. To measure the gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that a bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choice of gradient distributions and the distortion measure. We provide substantial insights into the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.
KW - DNN gradient modeling
KW - Federated learning
KW - Gradient compression
KW - Gradient sparsification
UR - http://www.scopus.com/inward/record.url?scp=85180404092&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097231
DO - 10.1109/ICASSP49357.2023.10097231
M3 - Conference article
AN - SCOPUS:85180404092
SN - 1520-6149
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Y2 - 4 June 2023 through 10 June 2023
ER -