TY - GEN
T1 - A Hybrid Layered Image Compressor with Deep-Learning Technique
AU - Lee, Wei-Cheng
AU - Chang, Chih-Peng
AU - Peng, Wen-Hsiao
AU - Hang, Hsueh-Ming
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9/21
Y1 - 2020/9/21
AB - This paper presents a detailed description of NCTU's proposal for learning-based image compression, in response to the JPEG AI Call for Evidence Challenge. The proposed compression system features a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter aims to refine the quality of the base layer by sending a latent residual signal. In particular, a base-layer-guided attention module is employed to focus the residual extraction on critical high-frequency areas. To reconstruct the image, this latent residual signal is combined with the base-layer output in a non-linear fashion by a neural-network-based synthesizer. The proposed method shows comparable rate-distortion performance to single-layer VVC intra in terms of common objective metrics, but offers better subjective quality in some cases, particularly at high compression ratios. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system comprises 18M network parameters in 16-bit floating-point format. On average, encoding an image on an Intel Xeon Gold 6154 CPU takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. In contrast, decoding is dominated by the residual decoder and the synthesizer, requiring 31 seconds per image.
KW - hybrid-based layered coding
KW - learned image compression
KW - residual coding
KW - variable rate
UR - http://www.scopus.com/inward/record.url?scp=85099173576&partnerID=8YFLogxK
U2 - 10.1109/MMSP48831.2020.9287130
DO - 10.1109/MMSP48831.2020.9287130
M3 - Conference contribution
AN - SCOPUS:85099173576
T3 - IEEE 22nd International Workshop on Multimedia Signal Processing, MMSP 2020
BT - IEEE 22nd International Workshop on Multimedia Signal Processing, MMSP 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Workshop on Multimedia Signal Processing, MMSP 2020
Y2 - 21 September 2020 through 24 September 2020
ER -