1. Field of the Invention
The present invention relates to a method and system for discrete cosine transforms/inverse discrete cosine transforms (DCT/IDCT), and more particularly to a DCT/IDCT method and system based on a pipeline architecture.
2. Description of the Prior Art
Because the discrete cosine transform (DCT) is well suited to de-correlating real-valued signals and concentrating signal energy in the low-frequency components, it has been widely used in image compression systems and software. For example, the DCT and the IDCT (inverse discrete cosine transform) are applied in the H.261 standard for video conferencing, and in the JPEG standard for still images and the MPEG standard for moving pictures, both established by ISO (the International Organization for Standardization).
In the foregoing applications, the DCT is used for data compression and the IDCT for data decompression. One of the best-known fast DCT/IDCT techniques is Lee's algorithm, an FFT-like (Fast Fourier Transform-like) decomposition. FIG. 1A shows Lee's algorithm implemented in a shuffle-exchange circuit architecture. The DCT procedure is divided into four computing phases (the first, second, third, and fourth computing phases): eight parallel numerical data, X[0], X[1], . . . , X[7], are input, and the discrete cosine transform is performed to output eight parallel numerical data, Y[0], Y[1], . . . , Y[7]. The DCT circuit illustrated in FIG. 1A can be divided into two blocks: DCT processor 1 and DCT post-processor 2. DCT processor 1 is composed of twelve similar process elements 3 arranged in a butterfly circuit architecture, and it is followed by DCT post-processor 2, which is composed of five addition units 4 and one fixed-coefficient multiplication unit 5. Each process element 3 includes an addition unit 31, a subtraction unit 32, and a fixed-coefficient multiplication unit 5. Of the fixed-coefficient multiplication units in the process elements 3, four are labeled A, two are labeled B, two are labeled C, and one each is labeled D, E, F, and G. The values of the fixed-coefficient multipliers labeled A, B, C, D, E, F, and G are, respectively,

$$A=\frac{1}{2\cos(\pi/4)},\quad B=\frac{1}{2\cos(\pi/8)},\quad C=\frac{1}{2\cos(3\pi/8)},\quad D=\frac{1}{2\cos(\pi/16)},\quad E=\frac{1}{2\cos(3\pi/16)},\quad F=\frac{1}{2\cos(7\pi/16)},\quad G=\frac{1}{2\cos(5\pi/16)}.$$

Apart from the individual addition, subtraction, and multiplication units themselves, the DCT procedure illustrated in FIG. 1A needs no control devices to manage the overall computation. Because its data-flow dependencies require no control devices, the design can be mapped directly onto a data-flow architecture.
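The butterfly structure described above can be illustrated in software. The following Python sketch is a minimal illustration, assuming an unnormalized DCT-II, y[k] = Σ x[n]·cos(π(2n+1)k/(2N)), with no output scaling; the names `dct_lee` and `dct_direct` are hypothetical, and the order of operations is not claimed to match the wiring of FIG. 1A exactly. Here `x` plays the role of X[0..7] and `y` the role of Y[0..7]. Each recursion level performs N/2 butterflies (a sum, a difference, and a multiplication by 1/(2cos((2i+1)π/(2N)))), and the neighbouring additions that merge the odd-indexed outputs correspond in role to the post-processor. For N = 8 the sketch performs exactly twelve fixed multiplications drawn from the seven constants A to G, plus five merge additions.

```python
import math


def dct_direct(x):
    """Reference unnormalized DCT-II: y[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]


def dct_lee(x, coeffs_used=None):
    """Hypothetical Lee-style recursive fast DCT-II (unnormalized).

    len(x) must be a power of two. If coeffs_used is a list, every fixed
    multiplier 1 / (2 * cos(theta)) applied by a butterfly is appended to it.
    """
    n_pts = len(x)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("length must be a power of two")
    if n_pts == 1:
        return list(x)
    half = n_pts // 2
    alpha, beta = [], []
    for i in range(half):
        # One butterfly process element: add, subtract, fixed-coefficient multiply.
        c = 1.0 / (2.0 * math.cos((2 * i + 1) * math.pi / (2 * n_pts)))
        if coeffs_used is not None:
            coeffs_used.append(c)
        alpha.append(x[i] + x[n_pts - 1 - i])
        beta.append((x[i] - x[n_pts - 1 - i]) * c)
    even = dct_lee(alpha, coeffs_used)  # yields y[0], y[2], y[4], ...
    odd = dct_lee(beta, coeffs_used)    # partial results for y[1], y[3], ...
    y = [0.0] * n_pts
    for k in range(half):
        y[2 * k] = even[k]
        # Post-processing: each odd output except the last sums two neighbouring partials.
        y[2 * k + 1] = odd[k] + odd[k + 1] if k + 1 < half else odd[k]
    return y


if __name__ == "__main__":
    samples = [3.0, -1.0, 2.5, 0.0, 4.0, 1.5, -2.0, 0.5]
    used = []
    fast = dct_lee(samples, used)
    ref = dct_direct(samples)
    print(len(used))  # 12
    print(max(abs(a - b) for a, b in zip(fast, ref)) < 1e-9)  # True
```

The recursion makes the phase structure visible: the 8-point level uses the four distinct constants D, E, G, F, the two 4-point sub-transforms each use B and C, and the four 2-point sub-transforms each use A.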
Corresponding to FIG. 1A, FIG. 1B is a simplified diagram of an IDCT circuit implemented with Lee's algorithm. As with the DCT, the IDCT procedure is divided into four computing phases (the first, second, third, and fourth computing phases): eight parallel numerical data, Y[0], Y[1], . . . , Y[7], are input, and the inverse discrete cosine transform is performed to output eight parallel numerical data, X[0], X[1], . . . , X[7]. The IDCT circuit in FIG. 1B is likewise divided into two blocks: IDCT processor 7 and IDCT prior-processor 6. IDCT processor 7 is composed of twelve similar process elements 8 arranged in a butterfly circuit architecture, and it is preceded by IDCT prior-processor 6, which is composed of five addition units 9 and one fixed-coefficient multiplication unit 10. Each process element 8 includes an addition unit 81, a subtraction unit 82, and a fixed-coefficient multiplication unit 10. Of the fixed-coefficient multiplication units in the process elements 8, four are labeled A, two are labeled B, two are labeled C, and one each is labeled D, E, F, and G; their values are the same as those of the correspondingly labeled multipliers in FIG. 1A.
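The IDCT flow can be sketched the same way. In this hypothetical Python sketch (`idct_lee`, `_idct_core`, and `idct_direct` are illustrative names, not part of the original), the inverse of the unnormalized DCT-II is computed by halving Y[0], running a Lee-style recursion, and scaling by 2/N; this scaling convention is an assumption. The additions Z[0] = Y[1], Z[j] = Y[2j-1] + Y[2j+1] that form each odd-indexed sub-problem depend only on the inputs, so they can all precede any multiplication (five of them for N = 8), and the twelve butterflies (multiply by 1/(2cos θ), then add and subtract) follow.

```python
import math


def idct_direct(y):
    """Reference inverse of the unnormalized DCT-II (a scaled DCT-III)."""
    n_pts = len(y)
    return [
        (2.0 / n_pts) * (y[0] / 2.0 + sum(
            y[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for k in range(1, n_pts)))
        for n in range(n_pts)
    ]


def _idct_core(big_y, coeffs_used):
    """s[n] = sum_k big_y[k] * cos(pi * (2n + 1) * k / (2N)), via a Lee-style recursion."""
    n_pts = len(big_y)
    if n_pts == 1:
        return list(big_y)
    half = n_pts // 2
    even_in = big_y[0::2]
    # Pre-additions depend only on inputs: Z[0] = Y[1], Z[j] = Y[2j-1] + Y[2j+1].
    odd_in = [big_y[1]] + [big_y[2 * j - 1] + big_y[2 * j + 1] for j in range(1, half)]
    g = _idct_core(even_in, coeffs_used)
    h = _idct_core(odd_in, coeffs_used)
    s = [0.0] * n_pts
    for i in range(half):
        # One butterfly process element: fixed-coefficient multiply, then add/subtract.
        c = 1.0 / (2.0 * math.cos((2 * i + 1) * math.pi / (2 * n_pts)))
        coeffs_used.append(c)
        t = h[i] * c
        s[i] = g[i] + t
        s[n_pts - 1 - i] = g[i] - t
    return s


def idct_lee(y, coeffs_used=None):
    """Hypothetical Lee-style IDCT; len(y) must be a power of two."""
    n_pts = len(y)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("length must be a power of two")
    if coeffs_used is None:
        coeffs_used = []
    big_y = list(y)
    big_y[0] /= 2.0  # the DC term carries half weight in the inverse
    return [(2.0 / n_pts) * v for v in _idct_core(big_y, coeffs_used)]
```

For example, `idct_lee([8.0] + [0.0] * 7)` returns eight values of 1.0, which is the inverse of the unnormalized DCT of a constant signal of ones.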
Because a DCT/IDCT processor built on the above Lee's-algorithm architecture can be constructed from similar processing elements, it offers the advantages of modular design and reuse. DCT/IDCT is now used in many applications, such as JPEG, MPEG, and HDTV, and different applications demand different computational throughput. If the number of process elements could be adjusted to meet different performance requirements, savings in circuit area and cost could be achieved.