1. Field of the Invention
The invention relates to a method and apparatus for performing a discrete cosine transform (DCT) and its inverse (IDCT), and more particularly to a DCT/IDCT apparatus which is capable of real-time processing and which has a relatively simple and small hardware construction.
2. Description of the Related Art
Forward and inverse discrete cosine transforms (DCT/IDCT) are performed during the compression and decompression of digital image data. In a conventional digital image compression operation, an original image signal is usually divided into a number of 8×8 pixel blocks, each of which undergoes a DCT operation so as to generate DCT transform data. In a conventional digital image decompression operation, an IDCT is performed on the DCT transform data of each pixel block in order to retrieve the original image signal.
If a two-dimensional DCT/IDCT operation is to be executed, each row (or column) of a data block undergoes a first one-dimensional DCT/IDCT. Each column (or row) of the resulting DCT/IDCT transform data then undergoes a second one-dimensional DCT/IDCT, thus completing the two-dimensional DCT/IDCT operation. The one-dimensional DCT of each eight-sample row (or column) of an 8×8 pixel block can be obtained from the following equation: ##EQU1## wherein:
C(k) is equal to 2^(-1/2) when k=0 and is equal to 1 when k=1, 2, ..., 7;
S(m) is the pixel data in spatial domain; and
F(k) is the resulting DCT transform data.
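The equation itself is not reproduced in this text. As a minimal sketch, assuming it has the common eight-point form F(k) = (C(k)/2) · Σ S(m) · cos((2m+1)kπ/16), summed over m = 0 to 7, the one-dimensional DCT may be evaluated directly as follows. The factor 1/2 and the names `c` and `dct_1d` are assumptions made for illustration and are not taken from the original.

```python
import math

N = 8  # number of samples in one row (or column) of a block


def c(k):
    """Scale factor C(k): 2**-0.5 when k is 0, otherwise 1."""
    return 2 ** -0.5 if k == 0 else 1.0


def dct_1d(s):
    """Direct 8-point DCT of the samples S(m); assumes a 1/2 normalization."""
    if len(s) != N:
        raise ValueError("expected exactly 8 samples")
    return [
        0.5 * c(k) * sum(s[m] * math.cos((2 * m + 1) * k * math.pi / 16)
                         for m in range(N))
        for k in range(N)
    ]
```

Evaluated in this direct form, each of the eight outputs requires eight cosine products, which is the cost that a fast algorithm is intended to reduce.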
A DCT fast algorithm which can be derived from the above equation involves thirteen multiplication operations and twenty-nine addition/subtraction operations. FIG. 1 is a flow graph illustrating the DCT fast algorithm. The DCT fast algorithm uses three kinds of arithmetic operations: butterfly, intrinsic multiplication, and post-addition multiplication, as shown in FIGS. 2A to 2C. Referring to FIG. 2D, a fourth kind of arithmetic operation, the post-multiplication subtraction, is used in a corresponding IDCT fast algorithm.
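Since FIGS. 2A to 2D are not reproduced in this text, the four arithmetic operations can only be sketched from their names. The following is one plausible reading, offered as an assumption rather than as the definitions shown in the figures; the function names and the parameter `coeff` are hypothetical.

```python
def butterfly(a, b):
    """Return the sum and the difference of two inputs."""
    return a + b, a - b


def intrinsic_multiplication(a, coeff):
    """Multiply a single input by a fixed coefficient."""
    return a * coeff


def post_addition_multiplication(a, b, coeff):
    """Add two inputs first, then multiply the sum by a coefficient."""
    return (a + b) * coeff


def post_multiplication_subtraction(a, b, coeff):
    """Multiply the first input by a coefficient, then subtract the second."""
    return a * coeff - b
```

Under this reading, the post-multiplication subtraction undoes a post-addition multiplication when it is given the reciprocal coefficient, which is consistent with the fourth operation appearing only in the IDCT fast algorithm.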
Referring once more to FIG. 1, the DCT fast algorithm uses twelve butterfly operations, five post-addition multiplication operations and eight intrinsic multiplication operations. A conventional apparatus that is capable of performing the DCT flow graph of FIG. 1 can be divided into six operating units: a first unit capable of performing four butterfly operations; a second unit capable of performing two post-addition multiplication operations; a third unit capable of performing four more butterfly operations; a fourth unit capable of performing three post-addition multiplication operations; a fifth unit capable of performing another four butterfly operations; and a sixth unit capable of performing eight intrinsic multiplication operations.
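The operation counts stated for the DCT fast algorithm can be cross-checked against the six operating units. The sketch below assumes that a butterfly costs two additions/subtractions, that a post-addition multiplication costs one addition and one multiplication, and that an intrinsic multiplication costs one multiplication; these per-operation costs, and the names `DCT_UNITS`, `COST` and `count_operations`, are assumptions made for illustration.

```python
# Operations per unit, in the order the six units are listed in the text.
DCT_UNITS = [
    ("butterfly", 4),
    ("post_addition_multiplication", 2),
    ("butterfly", 4),
    ("post_addition_multiplication", 3),
    ("butterfly", 4),
    ("intrinsic_multiplication", 8),
]

# Assumed cost of one operation: (multiplications, additions/subtractions).
COST = {
    "butterfly": (0, 2),
    "post_addition_multiplication": (1, 1),
    "intrinsic_multiplication": (1, 0),
}


def count_operations(units):
    """Total (multiplications, additions/subtractions) over all units."""
    mults = sum(COST[name][0] * n for name, n in units)
    adds = sum(COST[name][1] * n for name, n in units)
    return mults, adds
```

With these assumed costs, `count_operations(DCT_UNITS)` gives thirteen multiplications and twenty-nine additions/subtractions, in agreement with the totals stated for the DCT fast algorithm.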
The IDCT fast algorithm can be obtained by performing the DCT fast algorithm in a reverse sequence. FIG. 3 illustrates the flow graph of the IDCT fast algorithm. Note that a conventional apparatus which is capable of performing the IDCT flow graph can also be divided into six operating units: a first unit capable of performing eight intrinsic multiplication operations; a second unit capable of performing four butterfly operations; a third unit capable of performing three post-multiplication subtraction operations; a fourth unit capable of performing four more butterfly operations; a fifth unit capable of performing two more post-multiplication subtraction operations; and a sixth unit capable of performing another four butterfly operations.
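The relationship between the forward and inverse transforms can be illustrated with direct (non-fast) evaluation. The sketch below is not the flow graph of FIG. 3; it assumes the common eight-point normalization with a factor of 1/2, under which the inverse uses the same cosine terms with the roles of k and m exchanged. The names `c`, `dct_1d` and `idct_1d` are hypothetical.

```python
import math

N = 8  # number of samples in one row (or column) of a block


def c(k):
    """Scale factor C(k): 2**-0.5 when k is 0, otherwise 1."""
    return 2 ** -0.5 if k == 0 else 1.0


def dct_1d(s):
    """Direct forward DCT: spatial samples S(m) to transform data F(k)."""
    return [
        0.5 * c(k) * sum(s[m] * math.cos((2 * m + 1) * k * math.pi / 16)
                         for m in range(N))
        for k in range(N)
    ]


def idct_1d(f):
    """Direct inverse DCT: transform data F(k) back to samples S(m)."""
    return [
        sum(0.5 * c(k) * f[k] * math.cos((2 * m + 1) * k * math.pi / 16)
            for k in range(N))
        for m in range(N)
    ]
```

Applying `idct_1d` to the output of `dct_1d` recovers the original eight samples to within floating-point rounding.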
If an 8×8 data block is to be processed with the use of two-dimensional DCT/IDCT, a first apparatus that is capable of performing the above-described DCT/IDCT fast algorithms is provided so as to execute a first one-dimensional DCT/IDCT operation. The transform data resulting from the first apparatus are then provided to a second apparatus, which is similar to the first apparatus, in order to perform a second one-dimensional DCT/IDCT operation.
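The two-stage arrangement described above corresponds to a row pass followed by a column pass. The following is a minimal software sketch of that arrangement, assuming the common eight-point normalization with a factor of 1/2 and using direct evaluation in place of the fast algorithm; the names `dct_1d` and `dct_2d` are hypothetical and do not describe any particular hardware.

```python
import math

N = 8  # block size


def dct_1d(s):
    """Direct 8-point DCT of one row or column (assumed 1/2 normalization)."""
    return [
        0.5 * (2 ** -0.5 if k == 0 else 1.0)
        * sum(s[m] * math.cos((2 * m + 1) * k * math.pi / 16)
              for m in range(N))
        for k in range(N)
    ]


def dct_2d(block):
    """Two-dimensional DCT of an 8x8 block by two one-dimensional passes."""
    # First pass: transform every row (the role of the first apparatus).
    rows = [dct_1d(row) for row in block]
    # Second pass: transform every column of the intermediate data
    # (the role of the second apparatus).
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[v][u] holds the coefficient for row u, column v; transpose back.
    return [list(r) for r in zip(*cols)]
```

For a block in which every pixel equals 1, only the first (DC) coefficient is non-zero under this normalization.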
Conventional DCT/IDCT apparatuses are therefore relatively expensive, since they involve the use of large and relatively complicated hardwired logic circuits which are designed to achieve precise pipeline processing at a very high processing speed. In actual practice, however, most applications do not require such a high processing speed in order to achieve real-time transformation.