In this paper, we perform the design space exploration for an H.264/AVC video embedding transcoder. Specifically, the design space is pruned for the sub-modules including inverse transform, inter and intra prediction, and deblocking filter with various microarchitecture designs, processing order, memory hierarchy, and granularity of synchronization. In addition, we propose an efficient deblocking filter suitable for 8×8 block pipeline. Compared to the traditional designs, our proposed deblocking filter reduces memory requirement, processing latency, and access frequency to the local memory. The synthesized logic gate count is only 8K using the 0.18 um technology with the maximum frequency of 162 MHz. For rapid exploration, all the design alternatives are simulated with higher level of abstraction using the transaction level modeling to explore 160 design combinations. Our simulation results provide an extensive tradeoff analysis among processing speed, cost, and utilization. Besides, the cost-normalized hardware utilization where the cost of each sub-module weights its associated utilization assists the system designers to keep a balance across different modules.