In this work, a high performance LDPC decoder architecture is presented. It is a partially-parallel architecture for low-complexity consideration. In order to eliminate the idling time and hardware complexity in conventional partially-parallel decoders, the decoding process, decoder architecture and memory structure are optimized. Particularly, the parity-check matrix is optimally partitioned into four unequal submatrices that lead to high efficiency in hardware sharing. As a result, it can handle two different codewords simultaneously with 100% hardware utilization. Furthermore, for minimizing the performance loss due to round-off errors in fixed-point implementations, the well-known modified min-sum decoding algorithm is enhanced by our recently proposed high-performance CMVP decoding algorithm. Overall, the proposed decoder has high throughput, low complexity, and good BER performances. In the circuit implementation example of the (576,288) parity check matrix for IEEE 802.16e standard, the decoder achieves a data rate of 5.5Gbps assuming 10 decoding iterations and 7 quantization bits, with a small area of 653K gates, based on UMC 90 nm process technology.