1. Field of the Invention
The present invention relates to image/document processing. More specifically, the present invention relates to performing the inverse discrete cosine transform.
2. Description of Related Art
Motion video and still image compression algorithms such as those based on JPEG (Joint Photographic Experts Group), DV (Digital Video) and MPEG (Moving Picture Experts Group) perform, as part of the coding of images, what is commonly known as an FDCT (Forward Discrete Cosine Transform). JPEG, for instance, compresses an image frame by considering independent sub-frames of the component channels in the image (for instance, Red, Green and Blue channels, each in its own sub-frame). Most such image compression schemes use “block-based” coding, wherein the image frame or sub-frame is subdivided into blocks of 8 by 8 (or, sometimes, 16 by 16) pixels. An FDCT is performed on each block, generating a new 8 by 8 block of values representing the energy of the block at various spatial frequencies. The resulting 8 by 8 block is then “quantized” by mapping the range of values that are possible within the transformed image to a smaller range of values. For instance, simple linear quantization would divide each FDCT value by a scaling factor. The result of this quantization process is a set of values that is likely to contain a large number of zeroes. After quantization, the data is encoded (for instance, using entropy encoding techniques) and stored in its final compressed form.
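As an illustration of why quantization tends to produce zeroes, the following is a minimal sketch of simple linear quantization. The function names, the sample values and the scaling factor of 16 are assumptions chosen for illustration only, and rounding to the nearest integer is likewise an assumed detail.

```python
def quantize(block, scale):
    """Simple linear quantization: divide each transformed value by a
    scaling factor and round to the nearest integer (rounding is an
    assumed detail, not taken from the text)."""
    return [[int(round(value / scale)) for value in row] for row in block]


def count_zeroes(block):
    """Count how many values in the block are exactly zero."""
    return sum(1 for row in block for value in row if value == 0)


# Hypothetical row of transformed values: one large low-frequency term
# followed by progressively smaller high-frequency terms.
sample = [[620, -35, 7, 3]]
quantized = quantize(sample, 16)
print(quantized)                # [[39, -2, 0, 0]]
print(count_zeroes(quantized))  # 2
```

Small high-frequency values fall below half the scaling factor and therefore collapse to zero, which is what makes the subsequent entropy encoding effective.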
When decompressing the compressed image, the process described above is reversed. Thus, after the data is unpacked and decoded into its quantized form, the set of quantized spatial frequencies is inverse quantized by multiplying the values by the same scaling factor(s) used during quantization. The resulting recovered values, which closely approximate the original values, are then subjected to an Inverse Discrete Cosine Transform (IDCT) to convert from spatial frequencies back to pixel component values. The IDCT is usually performed on blocks of recovered values of the same size (8 by 8). The recovered values ordinarily contain a large number of zeroes and, when processed in blocks, the IDCT must be performed in a two-dimensional manner (rows and columns considered together). The two-dimensional IDCT on an 8 by 8 block of values takes the form:
$$f(x,y) = \sum_{u=0}^{7} \sum_{v=0}^{7} F(u,v)\, C_u\, C_v \cos\left[\frac{(2x+1)\,u\,\pi}{16}\right] \cos\left[\frac{(2y+1)\,v\,\pi}{16}\right],$$
where f(x,y) is a resultant pixel value at position (x,y), F(u,v) is a recovered value at position (u,v) in the 8 by 8 block, and C_u and C_v are constants whose values depend on whether u and v, respectively, are zero or non-zero. As is evident from the formulation, the IDCT involves a large number of multiply and add operations.
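To make the operation count concrete, the following is a minimal sketch that evaluates the double sum literally. The text does not give numeric values for the constants, so the sketch assumes the common orthonormal scaling, C_0 = sqrt(1/8) and C_k = sqrt(2/8) for k > 0; the function names are hypothetical.

```python
import math

N = 8  # block size


def c(k):
    # ASSUMPTION: orthonormal scaling. The text only says the constant
    # differs between the zero and non-zero cases.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def idct_2d_direct(F):
    """Evaluate f(x, y) for every position by summing all 64 terms."""
    f = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (F[u][v] * c(u) * c(v)
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            f[x][y] = total
    return f


# A block whose only non-zero value is at (0, 0) yields a flat output block.
block = [[0.0] * N for _ in range(N)]
block[0][0] = 8.0
pixels = idct_2d_direct(block)
```

Each of the 64 output values is a sum of 64 terms, so the literal evaluation touches 4096 terms per block even when, as here, all but one input value is zero.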
The typical manner of performing a two-dimensional IDCT is to first perform a scalar (one-dimensional) IDCT on the rows and then perform a second one-dimensional IDCT on the columns of the block resulting from the one-dimensional row IDCT. Even with a one-dimensional IDCT, assuming that all the cosine terms and constants are pre-computed together, calculating each resultant value potentially involves at least eight multiplies and seven adds. Most improvements to implementing the IDCT on a given computing platform target the relative speed of adds and multiplies on that platform, since platforms vary in this regard. Further, many improvements to the IDCT depend upon the precision (integer, floating point, etc.) to be used for the terms, the intermediate results and the final values.
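The row-then-column approach can be sketched as follows. This is an illustrative sketch only: the table layout, the function names and the orthonormal values assumed for the constants (C_0 = sqrt(1/8), C_k = sqrt(2/8) otherwise) are assumptions, not details taken from the text.

```python
import math

N = 8  # block size


def _c(k):
    # ASSUMPTION: orthonormal scaling for the constants.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# Cosine terms and constants pre-computed together:
# TABLE[x][u] = C_u * cos((2x + 1) * u * pi / 16)
TABLE = [[_c(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for u in range(N)] for x in range(N)]


def idct_1d(coeffs):
    """One-dimensional IDCT: eight multiplies and seven adds per output."""
    return [sum(TABLE[x][u] * coeffs[u] for u in range(N)) for x in range(N)]


def idct_2d_separable(F):
    """Two-dimensional IDCT as a row pass followed by a column pass."""
    # Row pass: rows[u][y] transforms the v index of F[u][v] into y.
    rows = [idct_1d(row) for row in F]
    # Column pass: cols[y][x] transforms the u index into x.
    cols = [idct_1d([rows[u][y] for u in range(N)]) for y in range(N)]
    # Transpose so the result is indexed as f[x][y].
    return [[cols[y][x] for y in range(N)] for x in range(N)]
```

Sixteen one-dimensional transforms of 64 multiplies each give 1024 multiplies per block, compared with 4096 terms for a literal evaluation of the double sum.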
While most improvements to implementing the IDCT concentrate on platform-specific efficiencies and inefficiencies, other proposed improvements take advantage of the nature of the inverse quantized values. The values recovered from inverse quantization (upon which the IDCT is to be performed) exhibit a large number of zero values. Some improvements therefore focus upon performing multiplies and adds only when necessary. Such implementations often involve compare instructions (determining whether a value is zero or non-zero) and branch instructions that redirect program flow (for instance, when certain multiplies or adds can be avoided). Each of the eight values that are to be fed into the one-dimensional IDCT can be tested for zero, and an appropriate case can then be selected depending upon the result of the tests. The branching performed is thus multi-way in nature. This scheme is unsatisfactory because too many cases may occur and because branching itself can be an expensive operation in most modern computing architectures. For instance, MMX and other architectures that use SIMD (Single Instruction, Multiple Data) instruction sets do not perform multi-way branching efficiently at all.
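The zero-testing idea described above can be sketched as follows. This is a hypothetical illustration of the general scheme, not a reconstruction of any particular prior implementation; the names, the multiply counter and the orthonormal constants are all assumptions.

```python
import math

N = 8  # number of inputs to the one-dimensional IDCT


def _c(k):
    # ASSUMPTION: orthonormal scaling for the constants.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# Pre-computed products of constant and cosine term.
TABLE = [[_c(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for u in range(N)] for x in range(N)]


def idct_1d_skip_zeroes(coeffs):
    """One-dimensional IDCT that tests every input and skips zero inputs.

    Returns the outputs and the number of multiplies actually performed.
    """
    out = [0.0] * N
    multiplies = 0
    for u, value in enumerate(coeffs):
        if value == 0:  # one compare and one branch per input value
            continue
        for x in range(N):
            out[x] += TABLE[x][u] * value
            multiplies += 1
    return out, multiplies


def zero_pattern(coeffs):
    """Bit mask of non-zero inputs; selecting a dedicated code path per
    pattern would require 2**8 = 256 separate cases."""
    return sum(1 << u for u, value in enumerate(coeffs) if value != 0)
```

With a single non-zero input the sketch performs 8 multiplies instead of 64, but it pays for that saving with eight data-dependent branches per transform, which is the cost the passage above identifies.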
There is thus a need for an enhanced IDCT method and apparatus that takes advantage of the number of zeroes in the input blocks.