1. Field of the Invention
The discrete cosine transform (DCT), a special kind of orthonormal transform, has been widely accepted as the preferred method for compressing and decompressing gray-scale images. A DCT compressor comprises two main parts: the first transforms highly correlated image data into weakly correlated coefficients using a DCT, and the second performs adaptive quantization on the coefficients to reduce the bit rate for transmission or storage. However, the computational burden of performing a DCT is heavy. For example, currently known fast algorithms require 11 multiplications and 29 additions to compute a one-dimensional DCT of length 8. In practice, the image is divided into square blocks of 8 by 8, 16 by 16, or 32 by 32 pixels. Each block is often processed by the one-dimensional DCT row by row, followed by column by column. Moreover, different block sizes are selected for compression depending on the type of input image and the quality required of the decompressed image. A radix-2 DCT algorithm was described by H. S. Hou in the article "A Fast Recursive Algorithm for Computing the Discrete Cosine Transform," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-35, No. 10, pp. 1455-1461, October 1987. Its purpose is to reduce the number of multiplications and to offer the design flexibility of processing DCT blocks of different sizes. The references in that article list the prior art in DCT algorithms.
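To make the row-by-row, column-by-column processing concrete, here is a minimal Python sketch. It assumes the orthonormal DCT-II definition and computes each one-dimensional transform directly from that definition (O(N^2) operations), so it is a reference for what the fast algorithms compute, not a fast algorithm itself. The function names `dct_1d` and `dct_2d` are illustrative only.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II computed directly from its definition (O(N^2))."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        # C(0) = 1/sqrt(2), C(k) = 1 otherwise; sqrt(2/N) makes the transform orthonormal
        c_k = math.sqrt(0.5) if k == 0 else 1.0
        acc = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                  for n in range(n_pts))
        out.append(math.sqrt(2.0 / n_pts) * c_k * acc)
    return out

def dct_2d(block):
    """Separable 2-D DCT: a 1-D DCT on every row, then on every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # transpose back so result[u][v]: u = vertical frequency, v = horizontal frequency
    return [list(r) for r in zip(*cols)]

# A flat 8 by 8 block puts all of its energy in the DC coefficient.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

An 8 by 8 block costs sixteen 1-D transforms this way; the same functions handle 16 by 16 or 32 by 32 blocks unchanged, which is the flexibility the recursive algorithms aim to keep while cutting the arithmetic.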
In the article "A Fast DCT-SQ Scheme for Images," Trans. IEICE, Vol. E-71, No. 11, pp. 1095-1097, Nov. 1988, Y. Arai, T. Agui, and M. Nakajima proposed that many of the DCT multiplications can be formulated as scaling multipliers on the DCT coefficients. The DCT that remains after these multipliers are factored out is called the scaled DCT. The scaled DCT is still orthogonal but no longer normalized; the scaling factors can be restored in the subsequent quantization process. Arai et al. demonstrated that only 5 multiplications and 29 additions are required to process an 8-point scaled DCT. E. Feig then described the scaled DCT mathematically, in particular the 8 by 8 scaled DCT, in U.S. Pat. No. 5,293,434, issued on Mar. 8, 1994, and in the article "A Fast Scaled-DCT Algorithm," presented at the 1990 SPIE/SPSE Symposium of Electronic Imaging Science and Technology, Feb. 12, 1990, Santa Clara, Calif. However, these earlier publications do not mention the recursive properties of the scaled DCT. H. S. Hou subsequently described the recursive properties of the scaled DCT in radix-2 formulations in the article "Recursive Scaled-DCT," presented at the 1991 SPIE International Symposium, conference 1567, Jul. 22, 1991, San Diego, Calif.
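The scaled-DCT idea can be sketched in Python using the commonly published floating-point form of the Arai-Agui-Nakajima flowgraph. The sketch assumes the orthonormal DCT-II as the reference; output k of the flowgraph equals 2*sqrt(2)*S[k] times the orthonormal coefficient, with S[0] = 1 and S[k] = sqrt(2)*cos(k*pi/16). The names `S`, `scaled_dct8`, `descale`, and `quantize` are illustrative, not taken from the cited articles.

```python
import math

# Output k of scaled_dct8 = 2*sqrt(2) * S[k] * (orthonormal DCT coefficient k)
S = [1.0] + [math.sqrt(2) * math.cos(k * math.pi / 16) for k in range(1, 8)]

C4 = math.cos(4 * math.pi / 16)                   # 0.7071...
C6 = math.cos(6 * math.pi / 16)                   # 0.3826...
R2C6 = math.sqrt(2) * math.cos(6 * math.pi / 16)  # 0.5411...
R2C2 = math.sqrt(2) * math.cos(2 * math.pi / 16)  # 1.3065...

def scaled_dct8(d):
    """8-point scaled DCT flowgraph (AAN style): 5 multiplications, 29 additions."""
    t0, t7 = d[0] + d[7], d[0] - d[7]
    t1, t6 = d[1] + d[6], d[1] - d[6]
    t2, t5 = d[2] + d[5], d[2] - d[5]
    t3, t4 = d[3] + d[4], d[3] - d[4]
    out = [0.0] * 8
    # even part
    t10, t13 = t0 + t3, t0 - t3
    t11, t12 = t1 + t2, t1 - t2
    out[0], out[4] = t10 + t11, t10 - t11
    z1 = (t12 + t13) * C4                 # multiplication 1
    out[2], out[6] = t13 + z1, t13 - z1
    # odd part
    t10, t11, t12 = t4 + t5, t5 + t6, t6 + t7
    z5 = (t10 - t12) * C6                 # multiplication 2
    z2 = R2C6 * t10 + z5                  # multiplication 3
    z4 = R2C2 * t12 + z5                  # multiplication 4
    z3 = t11 * C4                         # multiplication 5
    z11, z13 = t7 + z3, t7 - z3
    out[5], out[3] = z13 + z2, z13 - z2
    out[1], out[7] = z11 + z4, z11 - z4
    return out

def descale(scaled):
    """Recover orthonormal DCT coefficients from the scaled outputs."""
    return [v / (2 * math.sqrt(2) * S[k]) for k, v in enumerate(scaled)]

def quantize(scaled, q):
    """Quantize the scaled outputs; the descaling factor is folded into each step size."""
    return [round(v / (q[k] * 2 * math.sqrt(2) * S[k])) for k, v in enumerate(scaled)]

row = [52, 55, 61, 66, 70, 61, 64, 73]
print(quantize(scaled_dct8(row), [16] * 8))
```

Because the per-coefficient factor 2*sqrt(2)*S[k] is a constant, an encoder can precompute q[k]*2*sqrt(2)*S[k] once per quantization table, so the descaling costs no extra multiplications at run time.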
Previous DCT algorithms, including the scaled DCT, aim to reduce the number of multiplications in the processor. But the fastest processors today are based on fused multiply and add operations in pipelined architectures. A fused multiply and add operation performs a multiplication and an addition of the form a+bc in one instruction cycle. For example, according to the current specification of Intel's i860 microprocessor, a 32-bit fused multiply and add takes 20 nsec, the same time as a single 32-bit multiply or a single 32-bit add. Hence, there is a net gain in processing speed if these architectures can be exploited to implement the scaled DCT. E. Feig and E. Linzer described the result of using the fused multiply and add architecture to perform an 8-point scaled DCT in their article "Scaled DCT Algorithms for JPEG and MPEG Implementations on Fused Multiply/Add Architectures," presented at SPIE conferences, 1991. Again, the recursive nature of the scaled DCT was not considered for selecting different image block sizes under program control.
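A toy cost model illustrates why a+bc instructions help. It assumes, as in the i860 figures quoted above, that a multiply, an add, and a fused a+bc each cost one instruction. The example uses a 4-point scaled DCT (the even half of an 8-point AAN-style flowgraph) written two ways; it is only an illustration of the idea, not the algorithm of Feig and Linzer. `OpCounter` and both function names are hypothetical.

```python
import math

C4 = math.cos(math.pi / 4)

class OpCounter:
    """Counts instructions; mul, add/sub, and fused a + b*c each cost one."""
    def __init__(self):
        self.count = 0
    def add(self, a, b):
        self.count += 1
        return a + b
    def sub(self, a, b):
        self.count += 1
        return a - b
    def mul(self, a, b):
        self.count += 1
        return a * b
    def fma(self, a, b, c):
        # models one fused a + b*c instruction (plain Python rounds twice here)
        self.count += 1
        return a + b * c

def scaled_dct4_separate(a, ops):
    """Separate multiply and adds: 9 additions + 1 multiplication."""
    s03, d03 = ops.add(a[0], a[3]), ops.sub(a[0], a[3])
    s12, d12 = ops.add(a[1], a[2]), ops.sub(a[1], a[2])
    z1 = ops.mul(ops.add(d12, d03), C4)
    return [ops.add(s03, s12), ops.add(d03, z1), ops.sub(s03, s12), ops.sub(d03, z1)]

def scaled_dct4_fused(a, ops):
    """The multiply by C4 is absorbed into the two output additions."""
    s03, d03 = ops.add(a[0], a[3]), ops.sub(a[0], a[3])
    s12, d12 = ops.add(a[1], a[2]), ops.sub(a[1], a[2])
    t = ops.add(d12, d03)
    return [ops.add(s03, s12), ops.fma(d03, t, C4), ops.sub(s03, s12), ops.fma(d03, t, -C4)]

sep, fus = OpCounter(), OpCounter()
scaled_dct4_separate([3.0, -1.0, 4.0, 1.5], sep)
scaled_dct4_fused([3.0, -1.0, 4.0, 1.5], fus)
print(sep.count, fus.count)
```

Here output k equals the unnormalized DCT-II coefficient times 2*cos(k*pi/8) for k >= 1 (and times 1 for k = 0), so, as with the 8-point case, the scale factors can be left for the quantizer. The saving comes only where a multiplication feeds directly into an addition.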
All recursive DCT algorithms published to date are in radix-2 form, i.e., they split an Nth order DCT into two (N/2)th order DCTs. Yet it is known from fast Fourier transforms and fast Hartley transforms that split-radix algorithms give the fastest operation. However, no corresponding split-radix DCT is known in the prior art. This invention discloses the split-radix algorithm and the implementation schemes for processing the regular DCT and the scaled DCT.
Because the number of arithmetic operations in performing a DCT grows faster than linearly with the number of inputs, it is desirable, from both the speed and the design flexibility viewpoints, to use recursive algorithms for both the DCT and the scaled DCT. In so doing, one can process a combination of lower-order DCTs instead of a single higher-order DCT. In the radix-2 recursive DCT algorithm, an Nth order DCT contains two (N/2)th order DCTs, whereas in the radix-2 recursive scaled DCT algorithm, an Nth order scaled DCT contains one (N/2)th order scaled DCT and one (N/2)th order scaled IDCT. The disclosed split-radix DCT and split-radix scaled DCT algorithms further improve on the radix-2 algorithms: in the split-radix DCT algorithm, an Nth order DCT contains an (N/2)th order DCT and two (N/4)th order IDCTs, whereas in the split-radix scaled DCT algorithm, an Nth order scaled DCT contains an (N/2)th order scaled DCT and two (N/4)th order modified scaled IDCTs.
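The statement that an Nth order DCT contains two (N/2)th order DCTs can be made concrete with a minimal Python sketch, assuming the unnormalized DCT-II X[k] = sum x[n] cos(pi(2n+1)k/(2N)) and N a power of two. It uses one standard decomposition: the even outputs are a half-length DCT of the folded sums x[n] + x[N-1-n], and the odd outputs come from a half-length DCT of the differences weighted by 2cos(pi(2n+1)/(2N)), followed by the running subtraction X[2k+1] = Y[k] - X[2k-1]. It is not necessarily the exact formulation in Hou's article, and it is not the split-radix algorithm of this invention; `dct_direct` and `dct_radix2` are illustrative names.

```python
import math

def dct_direct(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
            for k in range(n_pts)]

def dct_radix2(x):
    """Unnormalized DCT-II of length N = 2^m built from two length-N/2 DCT-IIs."""
    n_pts = len(x)
    if n_pts == 1:
        return [x[0]]
    half = n_pts // 2
    u = [x[n] + x[n_pts - 1 - n] for n in range(half)]   # folded sums
    v = [x[n] - x[n_pts - 1 - n] for n in range(half)]   # folded differences
    # weight the differences by 2cos(pi(2n+1)/(2N)) so the odd part becomes a DCT-II
    w = [2 * math.cos(math.pi * (2 * n + 1) / (2 * n_pts)) * v[n] for n in range(half)]
    even = dct_radix2(u)   # even[k] = X[2k]
    y = dct_radix2(w)      # y[k] = X[2k+1] + X[2k-1]
    out = [0.0] * n_pts
    for k in range(half):
        out[2 * k] = even[k]
    out[1] = y[0] / 2      # X[-1] = X[1], hence y[0] = 2 X[1]
    for k in range(1, half):
        out[2 * k + 1] = y[k] - out[2 * k - 1]
    return out

sig = [math.sin(0.7 * i) for i in range(16)]
print(max(abs(a - b) for a, b in zip(dct_radix2(sig), dct_direct(sig))))
```

Each stage spends N/2 multiplications on the weights and about 3N/2 additions before recursing, so the same routine serves 8-, 16-, or 32-point blocks without a separate implementation per size.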