In several communications systems, the data to be transmitted is compressed so that the available bandwidth is used more efficiently. For example, the Moving Picture Experts Group (MPEG) has promulgated several standards relating to digital data delivery systems. The first, known as MPEG-1, refers to ISO/IEC standards 11172 and is incorporated herein by reference. The second, known as MPEG-2, refers to ISO/IEC standards 13818 and is incorporated herein by reference. A compressed digital video system is described in the Advanced Television Systems Committee (ATSC) digital television standard document A/53, which is incorporated herein by reference.
The above-referenced standards describe data processing and manipulation techniques that are well suited to the compression and delivery of video, audio and other information using fixed or variable length digital communications systems. In particular, the above-referenced standards, and other "MPEG-like" standards and techniques, compress, illustratively, video information using intra-frame coding techniques (such as run-length coding, Huffman coding and the like) and inter-frame coding techniques (such as forward and backward predictive coding, motion compensation and the like). Specifically, in the case of video processing systems, MPEG and MPEG-like video processing systems are characterized by prediction-based compression encoding of video frames with or without intra- and/or inter-frame motion compensation encoding.
To achieve significant image compression, several of the above standards employ the discrete cosine transform (DCT) to convert pixel domain information into frequency domain information at an encoder. The frequency domain information is then compressed, and the compressed, or encoded, digital video information is transmitted to one or more decoders. The decoder(s) employ various decompression schemes, including the inverse discrete cosine transform (IDCT), to recover the video information from the compressed, or encoded, data. Thus, the DCT is applied in the compression of images, and the IDCT is applied to the compressed images to recover the original images.
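The DCT/IDCT relationship described above can be illustrated with a minimal sketch, assuming the textbook orthonormal definition of the one-dimensional transform pair. The function names dct_1d and idct_1d and the sample pixel values are hypothetical; this direct O(N^2) form is neither the Fast IDCT algorithm nor a hardware mapping, and serves only to show that the inverse transform returns the original samples.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of sequence x, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        # The k = 0 (DC) term uses a different scale so the transform is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)
        )
        out.append(scale * acc)
    return out


def idct_1d(X):
    """Inverse of dct_1d (orthonormal DCT-III), computed directly from the definition."""
    n = len(X)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * X[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        out.append(acc)
    return out


# Hypothetical row of eight pixel values (illustrative data only).
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(pixels)      # pixel domain -> frequency domain (encoder side)
recovered = idct_1d(coeffs)  # frequency domain -> pixel domain (decoder side)
print([round(v, 6) for v in recovered])
```

In an actual codec the frequency domain coefficients are quantized before transmission, so the recovered values only approximate the originals; the sketch omits that step.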
Many software-based algorithms for computing the IDCT have been devised. In digital video playback applications such as HDTV and DVD, however, it is essential that the decoding of the compressed video be performed very rapidly. In such applications hardware decoders are required, and therefore a hardware implementation of IDCT is needed as a component of these decoders. Two (conflicting) design objectives of a hardware IDCT implementation are to maximize throughput (i.e., the number of IDCT coefficients computed per clock cycle) while minimizing the total number of gates required for the computations. A hardware implementation that provides both high throughput and a low gate count is said to be efficient.
Although many good algorithms have been formulated for computing the IDCT in software, such as the Fast IDCT algorithm ("Fast Algorithms for the Discrete W Transform and for the Discrete Fourier Transform," Zhongde Wang, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-32, No. 4, pp. 803-816, August, 1984), such is not the case for IDCT hardware implementations. Unfortunately, a straightforward mapping of even a good software IDCT algorithm to hardware does not yield an efficient hardware implementation. The problem of intelligently mapping IDCT software algorithms to hardware has received little attention, and the few such mappings that have been proposed still do not result in particularly efficient hardware implementations. There is therefore a need in the art for an efficient hardware implementation for performing an IDCT; that is, an implementation that combines high throughput with low gate count.
The present invention is a method and apparatus for performing an Inverse Discrete Cosine Transform (IDCT). The method is based on an existing software IDCT algorithm called the Fast IDCT algorithm, which performs a series of 11 multiplications and 29 additions sequentially (i.e., 40 processing cycles) to produce a one-dimensional, eight coefficient IDCT. The method of the present invention, by contrast, operates in a computationally efficient manner to provide increased IDCT throughput with fewer processing steps. Specifically, the method and apparatus of the present invention produce a one-dimensional IDCT using eight processing cycles to perform the 11 multiplications and 29 additions.
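The cycle counts stated above translate into throughput figures as follows. This is an illustrative calculation only, assuming throughput is measured as IDCT coefficients produced per processing cycle and that the sequential baseline spends one cycle per operation, as the 40-cycle figure implies. The function and variable names are hypothetical.

```python
def throughput(coefficients, cycles):
    """IDCT coefficients produced per processing cycle."""
    return coefficients / cycles


N = 8                 # coefficients in the one-dimensional IDCT
MULTIPLICATIONS = 11  # operation counts stated for the Fast IDCT algorithm
ADDITIONS = 29

# Sequential baseline: one operation per cycle.
sequential_cycles = MULTIPLICATIONS + ADDITIONS
# Figure stated for the present method: the same operations in eight cycles.
parallel_cycles = 8

sequential_tp = throughput(N, sequential_cycles)
parallel_tp = throughput(N, parallel_cycles)
speedup = sequential_cycles / parallel_cycles

print(f"sequential: {sequential_cycles} cycles, {sequential_tp:.2f} coefficients/cycle")
print(f"parallel:   {parallel_cycles} cycles, {parallel_tp:.2f} coefficients/cycle")
print(f"cycle-count reduction: {speedup:.1f}x")
```

Under these assumptions the eight-cycle schedule yields one coefficient per cycle on average, a fivefold reduction in cycle count relative to the sequential baseline.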
Specifically, an apparatus for performing a one dimensional N-coefficient inverse discrete cosine transform (IDCT) on a set of DCT coefficients {X0, X1, . . . XN} to produce a set of IDCT coefficients {x0, x1, . . . xN}, where N is an integer, comprising: N adders, where each of the N adders produces a sum in response to two respective addends; M multipliers, where each of the M multipliers produces a product in response to two respective multiplicands, where M is an integer value less than N/2; a memory; and routing logic, coupled to the memory, the adders and the multipliers, for receiving the N DCT coefficients and for routing data between the memory and the adders and multipliers; the routing logic routing the data according to N processing cycles; the routed data including representations of the received DCT coefficients, intermediate operands produced by one or more of the adders and multipliers, and the IDCT coefficients; a first IDCT coefficient and an Nth IDCT coefficient being produced during an (N−1)th processing cycle; and a remaining plurality of IDCT coefficients being produced during an Nth processing cycle.
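The resource constraints recited above can be checked against the stated operation counts with a simple counting argument. The sketch below assumes N = 8 and the 11 multiplications and 29 additions of the Fast IDCT algorithm, and assumes each adder or multiplier performs at most one operation per processing cycle. It is a necessary-condition check only: it ignores data dependencies and the requirement that the first and Nth coefficients appear in the (N-1)th cycle, and it does not indicate which value of M a given embodiment uses. The function name resource_budget is hypothetical.

```python
import math


def resource_budget(n, multiplications, additions):
    """Counting bound: do n adders and M < n/2 multipliers offer enough
    operation slots over n processing cycles? Data dependencies are ignored."""
    cycles = n
    adder_slots = n * cycles  # n adders, one addition each per cycle
    # Multiplier counts permitted by the constraint M < n/2.
    allowed_m = [m for m in range(1, n) if m < n / 2]
    # Of those, the counts that provide at least `multiplications` slots.
    feasible_m = [m for m in allowed_m if m * cycles >= multiplications]
    return {
        "adder_slots": adder_slots,
        "adders_sufficient": adder_slots >= additions,
        "min_multipliers": math.ceil(multiplications / cycles),
        "allowed_m": allowed_m,
        "feasible_m": feasible_m,
    }


budget = resource_budget(n=8, multiplications=11, additions=29)
for name, value in budget.items():
    print(f"{name}: {value}")
```

For N = 8 the count shows that eight adders leave ample slack for the 29 additions, while at least two multipliers are needed to fit 11 multiplications into eight cycles, which is consistent with the bound M < N/2.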