This manuscript presents a VLSI architecture and its design rule, called embedded instruction code (EIC), for realizing a discrete wavelet transform (DWT) codec on a single chip. Because the essential computation of the DWT is convolution, we define a multiplication instruction, MUL, and an addition instruction, ADD, to carry it out. We segment the computation paths of the DWT into multiplications and additions and apply the instruction codes to execute these operations. We also propose a parallel arithmetic logic unit (PALU) organization composed of two multipliers and four adders (2M4A), so that the instruction codes programmed by EIC control the PALU to compute efficiently. The PALU contains a small number of registers; their number depends on the length of the wavelet filters and on the decomposition level. The numbers of multipliers and adders, however, do not increase when the DWT or the inverse DWT (IDWT) is executed with multilevel decomposition. Furthermore, we derive the similarity between the DWT and the IDWT, so both functions can be integrated in the same architecture. We also schedule the instructions so that the multilevel processes can be executed on a single chip without additional PALUs. Moreover, we solve the boundary problem of the DWT by using symmetric extension, so that the perfect reconstruction (PR) condition is satisfied. Through EIC, instruction codes can be generated systematically and flexibly when different filters are adopted. Our chip supports up to six levels of decomposition and a variety of image specifications, e.g., VGA, MPEG-1, MPEG-2, and 1024 × 1024 images. The processing speed is 7.78 Mpixel/s at a normal-case operating frequency of 100 MHz.
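As a rough software illustration of the idea that DWT convolution reduces to a sequence of MUL and ADD operations with symmetric extension at the boundaries, the following minimal sketch may help. It is not the EIC instruction set or the PALU schedule. The LeGall 5/3 filter taps, the whole-sample form of symmetric extension, and all function names (`reflect`, `mac`, `dwt_level`, `dwt_multilevel`) are assumptions chosen for illustration only.

```python
# Assumed filter choice: LeGall 5/3 analysis taps (not taken from the manuscript).
LOW_53 = [-0.125, 0.25, 0.75, 0.25, -0.125]
HIGH_53 = [-0.5, 1.0, -0.5]


def reflect(i, n):
    """Map an out-of-range index into [0, n) by whole-sample symmetric extension."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def mac(taps, samples):
    """Multiply-accumulate written as explicit MUL and ADD steps."""
    acc = 0.0
    for c, s in zip(taps, samples):
        product = c * s        # one MUL
        acc = acc + product    # one ADD
    return acc


def dwt_level(x, low, high):
    """One decomposition level: low-pass at even samples, high-pass at odd samples."""
    n = len(x)
    half_l, half_h = len(low) // 2, len(high) // 2
    approx = [
        mac(low, [x[reflect(k + j - half_l, n)] for j in range(len(low))])
        for k in range(0, n, 2)
    ]
    detail = [
        mac(high, [x[reflect(k + j - half_h, n)] for j in range(len(high))])
        for k in range(1, n, 2)
    ]
    return approx, detail


def dwt_multilevel(x, low, high, levels):
    """Reuse the same MAC routine at every level; only the stored data changes."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt_level(approx, low, high)
        details.append(d)
    return approx, details


if __name__ == "__main__":
    signal = [float(v) for v in range(8)]
    a, d = dwt_level(signal, LOW_53, HIGH_53)
    print("approximation:", a)
    print("detail:", d)
```

In this sketch every level calls the same `mac` routine, which loosely mirrors the statement that the arithmetic resources stay fixed across decomposition levels while only the stored intermediate data grow.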