Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems

Cheng Ji, Fan Wu, Zongwei Zhu*, Li Pin Chang, Huanghe Liu, Wenjie Zhai

*Corresponding author for this work

Research output: Article, peer-reviewed

15 citations (Scopus)

Abstract

Pattern recognition applications such as face recognition and agricultural product detection have attracted rapidly growing interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) for image classification. However, traditional DNN inference in the cloud can suffer from network delay fluctuations and privacy leakage. Real-time CPSS applications are therefore preferably deployed on embedded edge devices. Because edge devices have limited computing power and memory, efficient memory management is key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, runtime memory usage is optimized in the spatial dimension through data layout reorganization. The proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.

Original language: English
Article number: 102183
Journal: Journal of Systems Architecture
Volume: 118
Publication status: Published - September 2021
