Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

Martin Benjak*, Yi Hsin Chen, Wen Hsiao Peng, Jorn Ostermann

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned on two fused prediction signals: a spatial inter-layer prediction signal, generated by upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and performs comparably to full-resolution VVC for high-resolution content, while still offering scalability.
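The two-layer data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: all function names are hypothetical, uniform quantization stands in for the VVC base layer, nearest-neighbour upsampling stands in for super-resolution, the previous reconstruction (zero motion) stands in for decoder-side motion compensation, a plain average stands in for the learned fusion, and simple residual coding is swapped in for the learned conditional coder. Frames are lists of rows with even height and width.

```python
# Illustrative sketch only: every function is a hypothetical stand-in,
# not the method from the paper.

def downsample2x(frame):
    """Average each 2x2 block (stand-in for the BL's spatial downsampling)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def base_layer_codec(frame, step=8):
    """Uniform quantization as a placeholder for VVC encode + decode."""
    return [[round(v / step) * step for v in row] for row in frame]

def upsample2x(frame):
    """Nearest-neighbour upsampling as a placeholder for super-resolution."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(spatial_pred, temporal_pred):
    """Plain average as a placeholder for the learned fusion."""
    return [[(s + t) / 2.0 for s, t in zip(rs, rt)]
            for rs, rt in zip(spatial_pred, temporal_pred)]

def code_frame(frame, prev_recon):
    """Return (BL reconstruction, EL reconstruction) for one frame."""
    # Base layer: code a downsampled version of the input.
    bl_recon = base_layer_codec(downsample2x(frame))
    # Spatial inter-layer prediction: upsample the BL output.
    spatial_pred = upsample2x(bl_recon)
    # Temporal prediction uses only data the decoder already holds, so no
    # motion vectors are sent. Assumption: zero motion, i.e. the previous
    # EL reconstruction is reused as-is; the first frame has no reference.
    temporal_pred = prev_recon if prev_recon is not None else spatial_pred
    condition = fuse(spatial_pred, temporal_pred)
    # Assumption: integer residual coding replaces the conditional coder.
    residual = [[round(f - c) for f, c in zip(rf, rc)]
                for rf, rc in zip(frame, condition)]
    el_recon = [[c + r for c, r in zip(rc, rr)]
                for rc, rr in zip(condition, residual)]
    return bl_recon, el_recon
```

A decoder that receives only the base layer can stop at `bl_recon`; one that also receives the enhancement layer rebuilds the same `condition` from already-decoded data and adds the transmitted residual, which is the sense in which the scheme stays scalable in this toy version.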

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359855
State: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, Korea, Republic of
Duration: 4 Dec 2023 to 7 Dec 2023

Publication series

Name: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Conference

Conference: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Country/Territory: Korea, Republic of
City: Jeju
Period: 4/12/23 to 7/12/23

Keywords

  • conditional coding
  • scalable coding
  • spatial scalability
  • video coding
  • VVC
