This paper presents a hardware-efficient fast algorithm and its architecture for large-search-range motion estimation (ME) in HDTV-sized H.264 video coding. To reduce the high cost and latency of large search ranges, the proposed algorithm performs ME at multiple resolution levels in parallel, rather than serially as in the previous approach. This enables high data reuse, which lowers both bandwidth and memory cost. Combined with our previously proposed mode filtering and bit truncation techniques, the algorithm keeps the bit-rate change within -0.58% and 3.06% and the PSNR degradation within 0.04 dB and 0.07 dB for 720p and 1080p sequences, respectively. For a search range of [-128, 127], the hardware implementation saves up to 49.5% of the area cost and 65% of the memory cost compared with the previous approach.
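As a rough software illustration of the idea, the sketch below searches several resolution levels independently of one another (no level uses another level's result, unlike serial coarse-to-fine refinement) and keeps the lowest-cost candidate. It is a minimal sketch under stated assumptions, not the proposed architecture: the function names, the level set `(factor, radius)`, the plain subsampling, the SAD cost scaled by `factor * factor`, and the 2-bit truncation are all hypothetical choices made for illustration, and mode filtering is not modeled. The Python loop over levels stands in for hardware units that would run concurrently.

```python
def downsample(frame, factor):
    """Subsample a 2-D list of pixels by keeping every `factor`-th sample."""
    return [row[::factor] for row in frame[::factor]]


def sad(cur, ref, cx, cy, rx, ry, size, shift):
    """Sum of absolute differences over a size x size block.

    Each pixel is right-shifted by `shift` bits first (bit truncation).
    """
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs((cur[cy + j][cx + i] >> shift)
                         - (ref[ry + j][rx + i] >> shift))
    return total


def search_level(cur, ref, bx, by, block, factor, radius, shift):
    """Full search at one resolution level.

    Returns (cost, mvx, mvy) with the motion vector in full-resolution
    pixels, or None if no candidate lies inside the reference frame.
    """
    c, r = downsample(cur, factor), downsample(ref, factor)
    size = block // factor
    cx, cy = bx // factor, by // factor
    h, w = len(r), len(r[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                # Assumption: scale by factor^2 so costs are comparable across levels.
                cost = sad(c, r, cx, cy, rx, ry, size, shift) * factor * factor
                cand = (cost, dx * factor, dy * factor)
                if best is None or cand < best:
                    best = cand
    return best


def parallel_multires_me(cur, ref, bx, by, block=8,
                         levels=((1, 2), (2, 4), (4, 4)), shift=2):
    """Search every (factor, radius) level independently and keep the best.

    With the default levels the search covers +/-2, +/-8 and +/-16
    full-resolution pixels at decreasing precision.
    """
    results = [search_level(cur, ref, bx, by, block, f, r, shift)
               for f, r in levels]
    return min(res for res in results if res is not None)


def make_frames(width, height, mvx, mvy):
    """Synthetic ramp frames: the current frame is the reference shifted by (mvx, mvy)."""
    def pixel(x, y):
        return 4 * (y * 256 + x)
    ref = [[pixel(x + 16, y + 16) for x in range(width)] for y in range(height)]
    cur = [[pixel(x + 16 + mvx, y + 16 + mvy) for x in range(width)]
           for y in range(height)]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_frames(64, 64, 8, -4)
    print(parallel_multires_me(cur, ref, 24, 24))  # (0, 8, -4)
```

In this toy setup the true displacement of (8, -4) lies outside the full-resolution window, so only the coarser levels can reach it. This is the sense in which the low-resolution levels extend the search range without a proportional increase in candidates.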