TY - GEN
T1 - Semantic segmentation on compressed video using block motion compensation and guided inpainting
AU - Tanujaya, Stefanie
AU - Chu, Tieh
AU - Liu, Jia Hao
AU - Peng, Wen Hsiao
N1 - Publisher Copyright:
© 2020 IEEE
PY - 2020/10/12
Y1 - 2020/10/12
N2 - This paper addresses the problem of fast semantic segmentation on compressed video. Unlike most prior works for video segmentation, which perform feature propagation based on optical flow estimates or sophisticated warping techniques, ours takes advantage of block motion vectors in the compressed bitstream to propagate the segmentation of a keyframe to subsequent non-keyframes. This approach, however, needs to respect the inter-frame prediction structure, which often suggests recursive, multi-step prediction with error propagation and accumulation in the temporal dimension. To tackle the issue, we refine the motion-compensated segmentation using inpainting. Our inpainting network incorporates guided non-local attention for long-range reference and pixel-adaptive convolution for ensuring the local coherence of the segmentation. A fusion step then follows to combine both the motion-compensated and inpainted segmentations. Experimental results show that our method outperforms the state-of-the-art baselines in terms of segmentation accuracy. Moreover, it introduces the least amount of network parameters and multiply-add operations for non-keyframe segmentation.
AB - This paper addresses the problem of fast semantic segmentation on compressed video. Unlike most prior works for video segmentation, which perform feature propagation based on optical flow estimates or sophisticated warping techniques, ours takes advantage of block motion vectors in the compressed bitstream to propagate the segmentation of a keyframe to subsequent non-keyframes. This approach, however, needs to respect the inter-frame prediction structure, which often suggests recursive, multi-step prediction with error propagation and accumulation in the temporal dimension. To tackle the issue, we refine the motion-compensated segmentation using inpainting. Our inpainting network incorporates guided non-local attention for long-range reference and pixel-adaptive convolution for ensuring the local coherence of the segmentation. A fusion step then follows to combine both the motion-compensated and inpainted segmentations. Experimental results show that our method outperforms the state-of-the-art baselines in terms of segmentation accuracy. Moreover, it introduces the least amount of network parameters and multiply-add operations for non-keyframe segmentation.
UR - http://www.scopus.com/inward/record.url?scp=85109347635&partnerID=8YFLogxK
U2 - 10.1109/ISCAS45731.2020.9181054
DO - 10.1109/ISCAS45731.2020.9181054
M3 - Conference contribution
AN - SCOPUS:85109347635
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2020 IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 52nd IEEE International Symposium on Circuits and Systems, ISCAS 2020
Y2 - 10 October 2020 through 21 October 2020
ER -