1. Technical Field
The present invention relates to digital signal processing. More particularly, the present invention relates to digital signal decompression.
2. Description of the Prior Art
With the emergence of image and video compression standards, such as those promulgated by the Joint Photographic Experts Group ("JPEG") and the Moving Picture Experts Group ("MPEG"), and the Px64 standard, there has been considerable research toward developing fast algorithms to perform the data coding functions outlined in the standards.
The JPEG, MPEG1, MPEG2, and Px64 standards employ essentially the same decompression framework. The main decompression pipeline for these standards is shown in FIG. 1. During decompression, a compressed bit stream 10 is provided to a Huffman decoder 12. The Huffman decoded signal is inverse quantized 14 and then a two-dimensional inverse discrete cosine transform ("IDCT") operation 16 is performed on the signal to complete the decompression process.
Image and video compression standards, such as JPEG, MPEG, and Px64, rely on a two-dimensional 8×8 IDCT as the key processing function during data decompression. The IDCT is inherently a compute-intensive task: direct calculation of an 8×8 IDCT requires 4096 multiply-accumulate operations, because each of the 64 output samples is a weighted sum of all 64 input coefficients.
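The 4096 figure can be checked with a minimal sketch of the direct calculation. The sketch assumes the standard JPEG-style normalization of the 8×8 IDCT; the names `c` and `idct2_direct` are illustrative and do not come from the text.

```python
import math

N = 8

def c(u):
    # Normalization factor, assuming the standard JPEG-style IDCT definition.
    return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

def idct2_direct(S):
    """Direct 8x8 IDCT. Returns (pixels, multiply-accumulate count)."""
    macs = 0
    pixels = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    # Kernel weight; a real decoder would precompute this,
                    # so only the accumulate step below is counted.
                    w = (c(u) * c(v) / 4.0
                         * math.cos((2 * x + 1) * u * math.pi / 16.0)
                         * math.cos((2 * y + 1) * v * math.pi / 16.0))
                    acc += w * S[u][v]
                    macs += 1
            pixels[x][y] = acc
    return pixels, macs

# A block with only a DC coefficient decodes to a flat block.
S = [[0.0] * N for _ in range(N)]
S[0][0] = 8.0
pixels, macs = idct2_direct(S)
print(macs)
```

Each of the 64 outputs visits all 64 inputs, which is where the 64 × 64 count comes from.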
In the prior art, an 8×8 IDCT is performed as eight 8-point row IDCTs, followed by eight 8-point column IDCTs. This approach is commonly referred to as the row-column approach. A single 8-point IDCT is specified by the following equation: ##EQU1## In matrix form, this equation can be written as s=A S, where A is referred to as the IDCT basis and is: ##EQU2##
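Because the equations themselves are not reproduced here, the following sketch assumes the standard 8-point IDCT normalization used by JPEG and MPEG, A[x][u] = (C(u)/2) cos((2x+1)uπ/16) with C(0) = 1/√2 and C(u) = 1 otherwise. The function names are illustrative only.

```python
import math

N = 8

def idct_basis():
    """Build the assumed 8x8 IDCT basis A, indexed A[x][u]."""
    A = []
    for x in range(N):
        row = []
        for u in range(N):
            cu = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
            row.append(0.5 * cu * math.cos((2 * x + 1) * u * math.pi / 16.0))
        A.append(row)
    return A

def idct8(A, S):
    """8-point IDCT written as the matrix-vector product s = A S."""
    return [sum(A[x][u] * S[u] for u in range(N)) for x in range(N)]

A = idct_basis()

# A lone DC coefficient of sqrt(8) should produce eight samples equal to 1.
s = idct8(A, [math.sqrt(8.0), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# With this normalization A is orthogonal, so A times its transpose is the identity.
gram = [[sum(A[i][k] * A[j][k] for k in range(N)) for j in range(N)]
        for i in range(N)]
```

Under this assumption the forward DCT matrix is simply the transpose of A.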
If the row-column approach is used to calculate s, then each 8-point IDCT calculation requires sixty-four multiply operations and sixty-four addition operations. Over the sixteen 8-point IDCTs (eight rows and eight columns), this amounts to 1024 multiply operations and 1024 addition operations for an 8×8 IDCT calculation. Such operations still require considerable time, compute power, and memory.
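The row-column tally can be illustrated with a short sketch. It assumes the standard JPEG-style basis for A and counts one addition per multiply (a multiply-accumulate), which matches the way the figures above are tallied. All names are illustrative.

```python
import math

N = 8

# Assumed standard 8-point IDCT basis, indexed A[x][u].
A = [[0.5 * (1.0 / math.sqrt(2.0) if u == 0 else 1.0)
      * math.cos((2 * x + 1) * u * math.pi / 16.0)
      for u in range(N)] for x in range(N)]

def idct8(vec, counts):
    """One 8-point IDCT as a dense matrix-vector product, with tallies."""
    out = []
    for x in range(N):
        acc = 0.0
        for u in range(N):
            acc += A[x][u] * vec[u]
            counts["mul"] += 1
            counts["add"] += 1  # accumulate step counted as one addition
        out.append(acc)
    return out

def idct2_row_column(S):
    """8x8 IDCT as eight row transforms followed by eight column transforms."""
    counts = {"mul": 0, "add": 0}
    rows = [idct8(row, counts) for row in S]
    cols = [idct8([rows[r][c] for r in range(N)], counts) for c in range(N)]
    # cols[c] holds output column c; transpose back to pixel[row][col].
    pixels = [[cols[c][r] for c in range(N)] for r in range(N)]
    return pixels, counts

S = [[0.0] * N for _ in range(N)]
S[0][0] = 8.0
pixels, counts = idct2_row_column(S)
print(counts)
```

Only one row or column of eight values is live inside `idct8` at a time, which is the property that makes the row-column structure attractive when local storage is limited.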
It is possible to factor A[i,j] as a product of several sparse matrices. This is the basic approach behind many known fast algorithms for IDCT calculations. Different approaches towards this factorization are discussed in W. H. Chen, C. H. Smith, S. C. Fralick, "A Fast Computational Algorithm for the Discrete Cosine Transform," IEEE Trans. Communications, Vol. COM-25, pp. 1004-1009, September 1977; and B. G. Lee, "A New Algorithm to Compute the Discrete Cosine Transform," IEEE Trans. on Acoust., Speech and Signal Processing, Vol. ASSP-32, No. 6, pp. 1243-45, December 1984. Both of these known schemes reduce the operation counts to 192-256 multiply operations and 416-464 addition operations for an 8×8 IDCT.
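The ranges quoted above follow from applying a fast 8-point transform sixteen times. The per-transform figures in this sketch (16 multiplies and 26 additions for Chen, 12 multiplies and 29 additions for Lee) are the counts commonly quoted for these algorithms; they are not stated in the text and are included here as an assumption that reproduces its totals.

```python
# Eight row transforms plus eight column transforms per 8x8 block.
TRANSFORMS_PER_BLOCK = 8 + 8

# (multiplies, additions) per 8-point transform; commonly quoted figures,
# assumed here rather than taken from the text.
PER_TRANSFORM = {
    "Chen": (16, 26),
    "Lee": (12, 29),
}

totals = {
    name: (mul * TRANSFORMS_PER_BLOCK, add * TRANSFORMS_PER_BLOCK)
    for name, (mul, add) in PER_TRANSFORM.items()
}

for name, (mul, add) in totals.items():
    print(name, mul, "multiplies,", add, "additions")
```

The two algorithms sit at opposite ends of each range: Lee uses fewer multiplies, Chen fewer additions.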
In a decompression context, the IDCT is preceded by an inverse quantization step which essentially takes the Huffman decoder output matrix entries h[i,j] and multiplies h[i,j] by q[i,j] to generate the IDCT input matrix. Since the inverse quantization step has to be performed, it is possible to write the IDCT matrix A as the product of two matrices:

A = DF, (4)

where D is a diagonal matrix and F is another 8×8 matrix. Since D is a diagonal matrix, q[i,j] can first be scaled by the entries in D, and the IDCT input matrix can then be generated from the scaled values at no additional cost per block.
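The idea of folding the diagonal factor into the quantization table can be sketched as follows. Several choices here are assumptions made only for illustration: the standard JPEG-style basis for A, the trivial diagonal D = diag(C(u)/2) (which is not the Feig-Winograd factorization), and the placement of D on the coefficient side of F so that it can merge with q. How that placement corresponds to the written order in equation (4) depends on the row or column convention of the basis matrix. The sample values in `h` and `q` are made up.

```python
import math

N = 8

# Illustrative diagonal factor and remaining matrix, so that A[x][u] = F[x][u] * D[u].
D = [0.5 * (1.0 / math.sqrt(2.0) if u == 0 else 1.0) for u in range(N)]
F = [[math.cos((2 * x + 1) * u * math.pi / 16.0) for u in range(N)]
     for x in range(N)]
A = [[F[x][u] * D[u] for u in range(N)] for x in range(N)]

def transform2(M, S):
    """Two-dimensional transform out[x][y] = sum over u,v of M[x][u] S[u][v] M[y][v]."""
    tmp = [[sum(M[x][u] * S[u][v] for u in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(tmp[x][v] * M[y][v] for v in range(N)) for y in range(N)]
            for x in range(N)]

# Made-up Huffman decoder output h and quantization table q.
h = [[(3 * i + 5 * j) % 7 - 3 for j in range(N)] for i in range(N)]
q = [[16 + i + j for j in range(N)] for i in range(N)]

# Conventional path: inverse quantize, then apply the full basis A.
S = [[h[i][j] * q[i][j] for j in range(N)] for i in range(N)]
reference = transform2(A, S)

# Scaled path: fold D into the table once, then apply only F per block.
q_scaled = [[D[i] * D[j] * q[i][j] for j in range(N)] for i in range(N)]
S_scaled = [[h[i][j] * q_scaled[i][j] for j in range(N)] for i in range(N)]
scaled = transform2(F, S_scaled)

max_diff = max(abs(reference[x][y] - scaled[x][y])
               for x in range(N) for y in range(N))
```

The table scaling is done once per quantization table, so the per-block work is the same inverse quantization multiply as before plus a transform by F alone.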
Thus, the development of a fast algorithm for the IDCT operation in the various decoding standards requires development of a sparse factorization on F[i,j] and not on A[i,j], as was the case in the Chen or Lee DCT algorithms. This approach is referred to as a scaled IDCT, and was recently described in E. Feig, S. Winograd, Fast Algorithms for the Discrete Cosine Transform, preprint of paper submitted to IEEE Trans. on Acoust., Speech and Signal Processing.
The scaled IDCT exploits the scaling feature of the algorithm to reduce the number of IDCT operations to 54 multiply operations, 462 addition operations, and 6 shift-right-by-one operations. Unfortunately, Feig and Winograd's implementation requires access to two-dimensional data within some of its computation stages; i.e. it is not a true row-column approach. Thus, all of the 64 entries in the 8×8 IDCT input have to be available in the registers (local storage) of the CPU. In the row-column approach, by contrast, only eight entries in the 8×8 IDCT input need to be available in the local storage of the CPU at any given time. The row-column approach would therefore be preferred because it makes efficient use of the finite local storage of the CPU.
Continual progress should be made in implementing the various coding standards to improve real time encoding and decoding of digital information, while simplifying hardware designs and reducing processor speed, complexity, and memory requirements, if the full potential of the emerging multi-media technologies is to be realized.