TY - JOUR
T1 - Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems
AU - Ji, Cheng
AU - Wu, Fan
AU - Zhu, Zongwei
AU - Chang, Li Pin
AU - Liu, Huanghe
AU - Zhai, Wenjie
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/9
Y1 - 2021/9
N2 - Pattern recognition applications such as face recognition and agricultural product detection have drawn rapidly growing interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) for image classification. However, traditional cloud-based DNN inference can suffer from fluctuating network delays and privacy leakage. Real-time CPSS applications are therefore preferably deployed on embedded edge devices. Because edge devices are constrained in computing power and memory, efficient memory management is key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, runtime memory space is optimized through data layout reorganization in the spatial dimension. The proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.
KW - Cyber–physical–social systems
KW - Edge computing
KW - Image classification
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85107073021&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2021.102183
DO - 10.1016/j.sysarc.2021.102183
M3 - Article
AN - SCOPUS:85107073021
VL - 118
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
SN - 1383-7621
M1 - 102183
ER -