Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems

Cheng Ji, Fan Wu, Zongwei Zhu*, Li Pin Chang, Huanghe Liu, Wenjie Zhai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Pattern recognition applications such as face recognition and agricultural product detection have attracted rapidly growing interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) to perform image classification. However, traditional DNN inference in the cloud can suffer from fluctuating network delays and privacy leakage. For this reason, real-time CPSS applications are preferably deployed on embedded edge devices. Because edge devices are constrained in computing power and memory, more efficient memory management is key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, runtime memory usage is optimized in the spatial dimension through data layout reorganization. Notably, the proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.

Original language: English
Article number: 102183
Journal: Journal of Systems Architecture
Volume: 118
State: Published - Sep 2021

Keywords

  • Cyber–physical–social systems
  • Edge computing
  • Image classification
  • Machine learning
