The present invention relates to MPEG video data decoders and specifically to a method and apparatus for improving the performance of MPEG video data decoders.
MPEG, which stands for Moving Picture Experts Group, is a standard for compression of video and audio for broadcast video/audio and compact discs. MPEG (video and audio systems) is the exclusive syntax of the United States Grand Alliance HDTV specification, the European Digital Video Broadcasting Group, and high density compact discs. MPEG-1 and MPEG-2 are well known and documented and are referred to, respectively, as ISO/IEC 11172, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s, 1993, and ISO/IEC 13818, Information Technology - Generic Coding of Moving Pictures and Associated Audio, the disclosures of which are hereby incorporated by reference.
In accordance with the standard, MPEG video provides an efficient way to represent image sequences in the form of more compactly coded data. MPEG also describes a decoding (reconstruction) process whereby coded bits in a transmitted MPEG video bit stream are mapped from the compressed representation into the original raw video signal data format of the image sequence suitable for driving a video display. For example, a flag in the coded bit stream signals whether the following bits are to be decoded with purely a discrete cosine transform (DCT) algorithm or with a prediction algorithm. The header also contains information needed to apply the prediction algorithm followed by a DCT algorithm. The algorithms comprising the decoding process are regulated by MPEG. MPEG can be applied to exploit common video characteristics such as spatial redundancy, temporal redundancy, uniform motion, and spatial masking.
MPEG encodes a video sequence (possibly decimated from the original) of, e.g., 720 by 480 pixel frames at 30 frames/s. The images are in color, but are converted to the YUV space, and the two chrominance channels (U and V) are decimated further to 360 by 240 pixels. A coarser resolution in the chrominance channels is acceptable within the bounds of human perception in the decoded reconstructed raw video signal data, at least for "natural" (not computer generated) images. The basic scheme of MPEG is to predict motion from frame to frame in the temporal direction, and then to use discrete cosine transforms ("DCTs") to organize the redundancy in the spatial directions. The DCTs are done on 8×8 blocks, and the motion prediction is done in the luminance (Y) channel on 16×16 blocks, together forming a so-called macroblock.
The encoded data is organized into a video sequence, which consists of a series of Groups of Pictures, each of some finite length. Each picture is broken down into a series of slices. Each slice comprises a series of adjacent macroblocks. Each macroblock consists of four adjacent 8×8 blocks of data, each representing one of four picture element ("pel") values for the Y (luminance) portion of the video signal (each of the four related to a pixel in the television screen, for example). In addition there are two 8×8 blocks of data, one each for the chrominance values Cb and Cr. Each of the chrominance values is associated with each of the four Y luminance values, which respectively are associated with one of the four pixels forming the pel. The six 8×8 blocks of data, therefore, constitute a macroblock. The decoding process utilizes, respectively, frame and field Inverse Discrete Cosine Transforms ("IDCTs") to decode the respective frame and field Discrete Cosine Transforms ("DCTs") and convert the encoded video signal from the frequency domain to the spatial domain in order to produce the reconstructed raw video signal data.
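By way of illustration only, the macroblock arithmetic described above can be sketched as follows. The function name `macroblock_layout` and the returned field names are hypothetical, and the sketch assumes frame dimensions that are exact multiples of 16 and chrominance decimated by two in each direction, as in the 720 by 480 example.

```python
LUMA_MACROBLOCK = 16   # luminance macroblock edge, in pixels
BLOCK = 8              # DCT block edge, in pixels

def macroblock_layout(width, height):
    """Return basic macroblock counts for one frame (illustrative only)."""
    if width % LUMA_MACROBLOCK or height % LUMA_MACROBLOCK:
        raise ValueError("sketch assumes dimensions are multiples of 16")
    # A 16x16 luminance area holds four 8x8 Y blocks.
    y_blocks = (LUMA_MACROBLOCK // BLOCK) ** 2
    # One 8x8 block each for Cb and Cr (assumed 2:1 decimation both ways).
    chroma_blocks = 2
    return {
        "macroblocks_per_frame": (width // LUMA_MACROBLOCK) * (height // LUMA_MACROBLOCK),
        "blocks_per_macroblock": y_blocks + chroma_blocks,
        "chroma_size": (width // 2, height // 2),
    }

layout = macroblock_layout(720, 480)
print(layout)
```

For the 720 by 480 frame this gives 45 by 30 macroblocks, each carrying six 8×8 blocks.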
The DCT coefficients (of either the actual data, or the difference between the block being decoded/reconstructed and another closely matching block from another frame) are "quantized" to form variations around a much shorter average value. The quantization can change for every "macroblock," i.e., for each 16×16 block (four 8×8 blocks) of Y and the corresponding two 8×8 blocks for the U (Cb) and V (Cr) channels. For example, the quantized values can be 8, 9, 10 or 11 bits in length. The result of all of this, which includes DCT coefficients, motion vectors, and quantization parameters (among other elements), is modified-Huffman coded using fixed tables. The DCT coefficients have a special Huffman table that is "two-dimensional": one code specifies both a run-length of zeros and the non-zero value at the end of the run. Motion vectors and the DC DCT components are differential pulse code modulation ("DPCM") coded.
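The run-length and DPCM ideas described above can be sketched as follows, purely as an illustration. The helper names `expand_run_level` and `dpcm_decode` are hypothetical. The sketch assumes the Huffman codes have already been decoded into (run, level) pairs and into differences, and it ignores the coefficient scan order and the predictor reset rules of the actual standard.

```python
def expand_run_level(pairs, size=64):
    """Expand (run, level) pairs into a flat list of `size` coefficients.

    Each pair means: skip `run` zeros, then store the non-zero `level`.
    """
    coeffs = [0] * size
    pos = 0
    for run, level in pairs:
        pos += run                      # zeros preceding the value
        if pos >= size:
            raise ValueError("run-level data overflows the block")
        coeffs[pos] = level
        pos += 1
    return coeffs

def dpcm_decode(differences, predictor=0):
    """Rebuild values from successive differences (DPCM)."""
    values = []
    for diff in differences:
        predictor += diff               # each value = previous value + difference
        values.append(predictor)
    return values

short_block = expand_run_level([(0, 12), (2, -3), (1, 5)], size=8)
dc_values = dpcm_decode([10, 2, -3])
print(short_block, dc_values)
```

The shortened block size of 8 in the usage lines is only to keep the printed result readable; a real block would use the default of 64 coefficients.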
Video decoders/reconstructors are known in the art. It is known in the art for such video decoders/reconstructors to have a separate co-processor for doing variable length decoding of the MPEG video data input bit stream, along with a core processor, which does the reconstruction of the decoded MPEG video data into raw video signal data. The reconstructed video signal may be provided to an external device including another host computer or a video player for display.
The speed of processing a macroblock is determined by the amount of data in the macroblock, the speed of the variable length decoder, the speed of the processes performed by the core processor in doing the reconstruction (i.e., among other things, the algorithm used to do the Inverse Discrete Cosine Transforms ("IDCTs") on the DCTs), and the speed of data transfer between the two. For block by block decoding, a familiar algorithm involves a two-dimensional IDCT for the 8×8 block which is performed as eight one dimensional IDCTs ("row operations"), one for each of the eight rows, followed by eight one dimensional IDCTs ("column operations"), with the result being the IDCT values for each location in the 8×8 block of reconstructed video signal data. A typical algorithm for performing each of the one dimensional row operations or column operations requires 29 additions and 11 multiplications. In some MPEG decoders special purpose microprocessors, referred to as Digital Signal Processors ("DSPs"), are equipped with special purpose circuitry, e.g., for doing multiplies and/or divides in specially dedicated hardware (along with the usual arithmetic and logic units that a microprocessor normally possesses). However, for performing MPEG decoding/reconstruction on a general purpose microprocessor/microcontroller and/or using a general purpose microprocessor/microcontroller as a co-processor with, e.g., a VLD co-processor, there exists a need to streamline the processing of the MPEG reconstruction algorithm. One way of reducing the calculations takes advantage of the structure of MPEG decoded video data: for a row or column that is all zeros, the IDCT output is also all zeros, so the one dimensional IDCT for that row or column need not be computed. MPEG encoding is designed to induce as many zero values into the positions within a block of decoded video data as possible.
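The row and column procedure with the all-zero shortcut can be sketched as follows. This is a minimal illustration, not an optimized implementation: `idct_1d` is a direct floating point evaluation of the 8-point inverse DCT rather than the 29-addition, 11-multiplication algorithm mentioned above, and the names `idct_1d`, `idct_2d_skip_zeros` and the returned count of one dimensional transforms are assumptions introduced for the example.

```python
import math

N = 8

def idct_1d(v):
    """Direct (unoptimized) 8-point inverse DCT of one row or column."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            s += cu * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s / 2.0)
    return out

def idct_2d_skip_zeros(block):
    """Row-column 2-D IDCT that skips all-zero rows and columns.

    Returns (pixels, count of 1-D transforms actually performed).
    """
    performed = 0
    rows = []
    for row in block:
        if any(row):                       # non-zero row: full row operation
            rows.append(idct_1d(row))
            performed += 1
        else:                              # all-zero row: output is all zeros
            rows.append([0.0] * N)
    out = [[0.0] * N for _ in range(N)]
    for c in range(N):
        col = [rows[r][c] for r in range(N)]
        if any(col):                       # non-zero column: full column operation
            result = idct_1d(col)
            performed += 1
            for r in range(N):
                out[r][c] = result[r]
    return out, performed

dc_only = [[0] * N for _ in range(N)]
dc_only[0][0] = 8
pixels, performed = idct_2d_skip_zeros(dc_only)
print(performed, pixels[0][0])
```

In this example only one of the eight row operations is performed, because seven rows are all zeros.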
However, one aspect of MPEG encoding tends to eliminate the advantage of the tendency toward having zero DCTs in much of the decoded video data in each block.
For the purpose of so-called "mismatch control", MPEG encoding adjusts the DCT value in the last row and last column of the block, C(7,7), depending upon the overall "oddness" of the whole block. In this way, the C(7,7) position in each block is, on average, a one, or is converted from a zero value to a one. In this event, the row operation on the bottom row (row 7) will produce all non-zero values. Therefore, none of the subsequent column operations can take advantage of the "all-zeroes" phenomenon, and all will require the full application of the IDCT algorithm, e.g., the 40 computations (29 additions and 11 multiplications) noted above. There exists in the art, therefore, a need for a more effective and efficient reconstruction algorithm for transforming the decoded video data DCT components into reconstructed raw video signal data with a microprocessor/microcontroller acting as the core processor (or acting concurrently as the VLD co-processor and the core processor). The overall speed of decoding and reconstruction can thus be enhanced.
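The effect of such a parity adjustment can be illustrated with the following sketch. It assumes a rule in which the least significant bit of C(7,7) is toggled whenever the sum of all coefficients in the block is even; this is an assumption made for the example, and the exact rule should be taken from ISO/IEC 13818-2. The function name `apply_mismatch_control` is hypothetical.

```python
N = 8

def apply_mismatch_control(block):
    """Return a copy of `block` with C(7,7) adjusted so the coefficient sum is odd.

    Assumed rule (illustrative): if the sum of all coefficients is even,
    toggle the least significant bit of C(7,7).
    """
    adjusted = [row[:] for row in block]
    total = sum(sum(row) for row in block)
    if total % 2 == 0:
        if adjusted[7][7] % 2:          # odd value: make it even
            adjusted[7][7] -= 1
        else:                           # even value (including zero): make it odd
            adjusted[7][7] += 1
    return adjusted

sparse = [[0] * N for _ in range(N)]
sparse[0][0] = 8                        # only the DC coefficient is non-zero
adjusted = apply_mismatch_control(sparse)
print(adjusted[7])
```

Under this assumed rule a block whose only non-zero coefficient is an even DC value acquires a one at C(7,7), so its bottom row is no longer all zeros.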
The present invention utilizes the linear nature of the matrix processing involved in processing an 8×8 block of decoded video data to eliminate the detrimental effect of having non-zero data in the C(7,7) location in a large number of decoded blocks of MPEG decoded video data (DCTs). A dummy matrix is constructed according to whether the matrix contains a non-zero value in the C(7,7) position. The dummy matrix is populated in all block locations with zeroes except for a one in the C(7,7) position. The original matrix having the non-zero value in the C(7,7) position is modified to have a zero in the C(7,7) position. The IDCTs are performed on the modified original matrix, with the bottom row possibly being all zeroes. In the event that it is all zeroes, it does not require any IDCTs to be performed, and the transformation of the entire row is also all zeroes. Also, removing the one from the C(7,7) position eliminates a row transformation that would otherwise prevent the columns, or some of them, from being all zeroes, thus allowing some column operations to be in respect of all-zero one dimensional columns. For such columns, this eliminates the need to do IDCTs for that column. The IDCT of the dummy matrix, which always remains the same, can be precalculated and stored. The output of the performance of IDCTs on the modified original matrix and the stored result of the IDCTs being performed on the dummy matrix are summed. The sum is the resultant IDCTs for the original unmodified matrix forming the 8×8 block.
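The linearity argument above can be checked numerically with a short sketch. It is illustrative only: the IDCT is a direct floating point evaluation, the names `idct_2d_split` and `DUMMY_IDCT` are hypothetical, and scaling the stored dummy result by the actual C(7,7) value is an assumption added so that values other than one are also covered by the same linearity property.

```python
import math

N = 8

def idct_1d(v):
    """Direct (unoptimized) 8-point inverse DCT of one row or column."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            s += cu * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s / 2.0)
    return out

def idct_2d(block):
    """Plain row-column 2-D IDCT (no shortcuts), used as the reference."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    return [[cols[c][r] for c in range(N)] for r in range(N)]

# Dummy matrix: all zeroes except a one at C(7,7); its IDCT is computed once.
_dummy = [[0] * N for _ in range(N)]
_dummy[7][7] = 1
DUMMY_IDCT = idct_2d(_dummy)

def idct_2d_split(block):
    """IDCT of the block with C(7,7) zeroed, plus the stored dummy result."""
    c77 = block[7][7]
    modified = [row[:] for row in block]
    modified[7][7] = 0
    out = idct_2d(modified)
    if c77:
        for r in range(N):
            for c in range(N):
                out[r][c] += c77 * DUMMY_IDCT[r][c]   # assumed scaling by c77
    return out

block = [[0] * N for _ in range(N)]
block[0][0], block[0][1], block[7][7] = 40, -6, 1
direct = idct_2d(block)
split = idct_2d_split(block)
max_error = max(abs(direct[r][c] - split[r][c]) for r in range(N) for c in range(N))
print(max_error)
```

The two results agree to within floating point rounding, which is the property relied upon when the stored dummy result is summed with the output for the modified matrix. In practice the reference `idct_2d` inside `idct_2d_split` would be replaced by a version that skips all-zero rows and columns.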