TY - GEN
T1 - Exploiting parallelism in the H.264 deblocking filter by operation reordering
AU - Weng, Tsung Hsi
AU - Wang, Yi Ting
AU - Chung, Chung-Ping
PY - 2011
Y1 - 2011
N2 - In the H.264 video compression standard, the deblocking filter accounts for about one-third of all computation in the decoder. With multiprocessor architectures becoming the future trend in system design, computation time can be reduced if the deblocking filter distributes its operations well across multiple processing elements. In this paper, we use the 16-pixel-long boundary, the basic unit of deblocking in the H.264 standard, as the basis for analyzing and exploiting parallelism in deblocking filtering. Compared with existing approaches that use the macroblock as the basic unit of analysis, the 16-pixel-long boundary offers finer granularity and therefore more opportunities to increase the degree of parallelism. This paper also proposes a compromise for fully utilizing limited hardware resources, as well as the hardware architectural requirements for deblocking. Compared with the 2D wave-front ordering for deblocking 1920*1080 and 1080*1920 pixel frames, the proposed design achieves speedups of 1.57 and 2.15 times, respectively, given an unlimited number of processing elements. With this approach, the execution time of the deblocking filter grows in proportion to the square root of the growth in frame size (keeping the same width/height ratio), pushing the boundary of practical real-time deblocking for increasingly large video sizes.
AB - In the H.264 video compression standard, the deblocking filter accounts for about one-third of all computation in the decoder. With multiprocessor architectures becoming the future trend in system design, computation time can be reduced if the deblocking filter distributes its operations well across multiple processing elements. In this paper, we use the 16-pixel-long boundary, the basic unit of deblocking in the H.264 standard, as the basis for analyzing and exploiting parallelism in deblocking filtering. Compared with existing approaches that use the macroblock as the basic unit of analysis, the 16-pixel-long boundary offers finer granularity and therefore more opportunities to increase the degree of parallelism. This paper also proposes a compromise for fully utilizing limited hardware resources, as well as the hardware architectural requirements for deblocking. Compared with the 2D wave-front ordering for deblocking 1920*1080 and 1080*1920 pixel frames, the proposed design achieves speedups of 1.57 and 2.15 times, respectively, given an unlimited number of processing elements. With this approach, the execution time of the deblocking filter grows in proportion to the square root of the growth in frame size (keeping the same width/height ratio), pushing the boundary of practical real-time deblocking for increasingly large video sizes.
KW - deblocking
KW - multi-core
KW - parallelization
UR - http://www.scopus.com/inward/record.url?scp=80455144602&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-24650-0_8
DO - 10.1007/978-3-642-24650-0_8
M3 - Conference contribution
AN - SCOPUS:80455144602
SN - 9783642246494
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 80
EP - 92
BT - Algorithms and Architectures for Parallel Processing - 11th International Conference, ICA3PP 2011, Proceedings
T2 - 11th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2011
Y2 - 24 October 2011 through 26 October 2011
ER -