State-of-the-art (SOTA) super-resolution (SR) models can generate high-quality images, but they require a large external memory bandwidth, which makes them difficult to implement in hardware. Although prior work has proposed various forms of layer fusion to reduce memory traffic, these methods work only on simple model architectures and consider only the feature extraction part. To address these issues, this article proposes structure adaptive fusion (SAF) for the feature extraction part, which avoids intermediate feature map I/O. SAF selects a repetitive structure in the model as the fusion unit and fuses multiple such units to meet buffer size and memory bandwidth constraints, allowing it to handle different SR models. In addition, we propose channel-aware addressing for the upscale part to avoid off-chip data transfers. The proposed methods reduce memory traffic by over 90% in all tested SOTA models. Compared with the SOTA fusion method, our approach requires a 52% smaller buffer and up to 61% lower memory bandwidth for the same number of fused layers.
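The idea of fusing repeated units until buffer and bandwidth limits are met can be illustrated with a toy traffic model. This is a minimal sketch under stated assumptions, not the SAF algorithm from the article: it assumes a feature-extraction trunk of identical repeated blocks whose feature maps all have the same size, ignores weight traffic and tile-halo overlap, and charges a fixed, hypothetical on-chip buffer cost per fused block. The names `Trunk`, `traffic_bytes`, `largest_fusion`, `choose_fusion`, and `buffer_per_block` are all illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Trunk:
    num_blocks: int        # N repeated blocks in the feature-extraction trunk (assumed identical)
    fmap_bytes: int        # size of one intermediate feature map (assumed equal for all blocks)
    buffer_per_block: int  # hypothetical on-chip buffer cost per fused block


def traffic_bytes(trunk: Trunk, k: int) -> int:
    """Off-chip feature-map traffic per frame when blocks are fused in groups of k.

    Each fused group reads its input once and writes its output once; the
    intermediate maps inside a group stay on chip. Weight traffic is ignored.
    k = 1 is the unfused baseline.
    """
    groups = math.ceil(trunk.num_blocks / k)
    return 2 * groups * trunk.fmap_bytes


def largest_fusion(trunk: Trunk, buffer_budget: int) -> int:
    """Largest group size whose assumed buffer cost fits the budget (never below 1)."""
    k = buffer_budget // trunk.buffer_per_block
    return max(1, min(k, trunk.num_blocks))


def choose_fusion(trunk: Trunk, buffer_budget: int,
                  bw_limit_bytes_per_s: float, fps: float):
    """Smallest group size that meets the bandwidth limit within the buffer budget.

    Picking the smallest feasible k keeps the on-chip buffer as small as possible.
    Returns None if no group size satisfies both constraints.
    """
    k_max = largest_fusion(trunk, buffer_budget)
    for k in range(1, k_max + 1):
        if traffic_bytes(trunk, k) * fps <= bw_limit_bytes_per_s:
            return k
    return None


if __name__ == "__main__":
    trunk = Trunk(num_blocks=16, fmap_bytes=1_000_000, buffer_per_block=64_000)
    print(traffic_bytes(trunk, 1))                              # unfused traffic per frame
    print(choose_fusion(trunk, 512_000, 200_000_000, fps=30))   # chosen group size
```

With these made-up numbers, the unfused trunk moves 32 MB of feature maps per frame; a 512 KB buffer allows groups of up to 8 blocks, and the smallest group meeting a 200 MB/s budget at 30 frames/s is 6 blocks. The real trade-off in hardware also depends on tiling, halo recomputation, and weight reuse, which this model deliberately omits.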
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Published: 1 June 2023
- Convolutional neural networks (CNNs)
- deep-learning accelerators (DLAs)
- layer fusion
- super-resolution (SR)