TY - GEN

T1 - Sharp asymptotics on the compression of two-layer neural networks

AU - Amani, Mohammad Hossein

AU - Bombari, Simone

AU - Mondelli, Marco

AU - Pukdee, Rattana

AU - Rini, Stefano

N1 - Publisher Copyright:
© 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.

AB - In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.

UR - http://www.scopus.com/inward/record.url?scp=85144595167&partnerID=8YFLogxK

U2 - 10.1109/ITW54588.2022.9965870

DO - 10.1109/ITW54588.2022.9965870

M3 - Conference contribution

AN - SCOPUS:85144595167

T3 - 2022 IEEE Information Theory Workshop, ITW 2022

SP - 588

EP - 593

BT - 2022 IEEE Information Theory Workshop, ITW 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE Information Theory Workshop, ITW 2022

Y2 - 1 November 2022 through 9 November 2022

ER -