With the increasing demand for video-signal processing and transmission, high-speed programmable FIR filters are required for real-time processing. This paper presents a hardware-efficient pipelined FIR architecture with programmable coefficients. FIR operations are first reformulated into multi-bit DA form at an algorithm level. Then, at the architecture level, the (p, q) compressor, instead of Booth encoding or RAM implementation, is used for high-speed operation. Due to the simple architecture, we can easily pipeline the proposed FIR filter to the adder level and save up to half of the cost of previous designs without sacrificing performance. The presented design is useful for bit-parallel input design, which can save 36.7% of the area cost compared with previous approaches.