TY - GEN
T1 - Platform-based design of all binary motion estimation (ABME) with bus interleaved architecture
AU - Wang, Shih Hao
AU - Tao, Wei Lun
AU - Wang, Chung Neng
AU - Peng, Wen-Hsiao
AU - Chiang, Tihao
PY - 2005/12/1
Y1 - 2005/12/1
N2 - This paper presents an efficient hardware-software implementation of all binary motion estimation (ABME), which has been proven to be simple and low cost for hardware design, using macroblock-based pipelining and a bus-interleaved architecture. The bus-interleaved preprocessing module of the ABME architecture generates downsampled and binarized data in the same flow without additional dedicated hardware. With the 3-layer binary bitplanes of ABME, we use a two-dimensional (2-D) mapping unit and a binary adder tree instead of a systolic array to compute the block matching metric, the sum of difference (SoD), in one cycle. In addition, a new bus bandwidth reduction scheme is proposed that reuses the binarized image and achieves up to 67% bus bandwidth saving. The experiment shows that, for each macroblock, our design finishes ABME within 283 cycles with a gate count of 65k when synthesized with the UMC 0.18 μm cell library.
AB - This paper presents an efficient hardware-software implementation of all binary motion estimation (ABME), which has been proven to be simple and low cost for hardware design, using macroblock-based pipelining and a bus-interleaved architecture. The bus-interleaved preprocessing module of the ABME architecture generates downsampled and binarized data in the same flow without additional dedicated hardware. With the 3-layer binary bitplanes of ABME, we use a two-dimensional (2-D) mapping unit and a binary adder tree instead of a systolic array to compute the block matching metric, the sum of difference (SoD), in one cycle. In addition, a new bus bandwidth reduction scheme is proposed that reuses the binarized image and achieves up to 67% bus bandwidth saving. The experiment shows that, for each macroblock, our design finishes ABME within 283 cycles with a gate count of 65k when synthesized with the UMC 0.18 μm cell library.
UR - http://www.scopus.com/inward/record.url?scp=33745436437&partnerID=8YFLogxK
U2 - 10.1109/VDAT.2005.1500065
DO - 10.1109/VDAT.2005.1500065
M3 - Conference contribution
AN - SCOPUS:33745436437
SN - 0780390601
SN - 9780780390607
T3 - 2005 IEEE VLSI-TSA International Symposium on VLSI Design, Automation and Test (VLSI-TSA-DAT)
SP - 241
EP - 244
BT - 2005 IEEE VLSI-TSA International Symposium on VLSI Design, Automation and Test (VLSI-TSA-DAT)
T2 - 2005 IEEE VLSI-TSA International Symposium on VLSI Design, Automation and Test (VLSI-TSA-DAT)
Y2 - 27 April 2005 through 29 April 2005
ER -