The CCITT/ISO committee has standardized a set of compression and decompression algorithms for still and motion digital video. These standards include the JPEG, MPEG and H.261 compression schemes. These standards are commonly applied in video conferencing, CD-ROM based interactive videos for education and entertainment, video or informational kiosks, video on demand (VOD) applications and many other applications which require communication of motion digital video. These standards utilize transform code compressed domain formats, which include the Discrete Cosine transform (DCT), and the interframe predictive code format. Motion Compensation (MC) algorithms are used in conjunction with the DCT format and other hybrid compressed formats.
The MPEG standard was drafted by the Moving Picture Coding Experts Group (MPEG), which operates within the framework of the Joint ISO/IEC Technical Committee 1 (JTC 1) on Information Technology. The draft provided a standard for coded representation of moving pictures, audio and their combination. The MPEG standard is intended for equipment supporting a continuous transfer rate of up to 1.5 Mbits per second, such as compact disks, digital audio tapes, or magnetic hard disks.
FIG. 1 depicts the steps involved in the MPEG encoding process. As shown in FIG. 1, video data stream 102 is first subjected to motion compensation, represented by block 104, which removes the interframe redundancy from the color motion picture frames. Discrete cosine transformation (DCT), represented by block 106, is then performed on each of the frames to map the spatial luminance or chrominance values into the frequency domain. Next, quantization, represented by block 108, is performed on each 8-by-8 DCT coefficient block (explained below) in accordance with its chrominance or luminance type and its frequency content. This eliminates DCT coefficients below a set threshold. Finally, variable length coding (VLC), represented by block 110, is performed to compress the video data stream and output encoded video data stream 112.
The 1-dimensional DCT is similar to a 1-dimensional Fourier transform. It transforms a 1-dimensional signal from its original domain--typically the time or space domain--to a frequency domain for a time signal or a spatial-frequency domain for a space signal. The transform used in the MPEG-1 and MPEG-2 standards, represented by block 106, is a 2-dimensional DCT. A 2-dimensional picture element measuring 8 pixels by 8 pixels is called a block. Performing the DCT on an 8-by-8 picture block produces an 8-by-8 block of DCT coefficients. It has been shown that, due to typical picture statistics, better compression is achieved in the spatial frequency domain than in the spatial domain. The DCT step represented by block 106 does not itself perform any compression, as both the input and the output consist of sixty-four values. The compression is achieved by the subsequent quantization and variable length coding steps represented by blocks 108 and 110 respectively.
FIG. 2 depicts the steps involved in the MPEG decoding process. Essentially, for MPEG decoding the steps involved in MPEG video encoding are reversed. Thus, video decoding involves variable length entropy decoding (VLD), represented by block 114, followed by dequantization, represented by block 116, followed by inverse discrete cosine transformation (IDCT), represented by block 118, and finally motion compensation, represented by block 120. The variable length entropy decoding (VLD) step parses the encoded video stream into symbols using variable length decoding. Dequantization scales the quantized DCT coefficients. The dequantized video data stream is then subjected to inverse discrete cosine transformation. Finally, motion compensation is performed on the video data stream before it is forwarded to a rendering device for display. The output of the MPEG video decode process is decoded video data stream 122, which is then displayed using a video rendering device such as a TV or an RGB monitor.
FIG. 3 depicts a typical computer system 130 used to perform MPEG video decoding. As depicted in FIG. 3, computer system 130 comprises CPU 132, decoder 134 coupled to its local memory 136, graphics controller 138, system memory 144, various peripheral and storage devices 146 and bus interface 140. Bus interface 140 is generally a high speed bus interface such as a PCI or AGP bus interface and provides communication means between CPU 132, decoder 134, graphics controller 138, system memory 144 and storage devices 146. Peripheral and storage devices 146 include hard disks, CD drives or DVDs and other similar commercially available multimedia storage devices. Graphics controller 138 is coupled to tuner 142 and provides 2-D and 3-D graphics functionality as well as video scaling and color space conversion. The output from graphics controller 138 is decoded video data stream 148, which can then be relayed to a video rendering device such as a TV or an RGB monitor for display. The MPEG video decode tasks depicted in FIG. 2 are performed either by software executed by CPU 132 or by decoder 134 hardware. In sophisticated computer systems, video decode tasks can also be split between CPU 132 and decoder 134.
Appendix A recites an algorithm used to perform the dequantization step depicted by block 116 in FIG. 2. This algorithm performs DCT coefficient reconstruction as described in the ISO 11172-2 (MPEG-1 Video) Specification. The algorithm consists of operations which are performed on each coefficient of an 8-by-8 DCT block. Four 8-by-8 DCT blocks together form a 16-by-16 macroblock, which is the smallest encode/decode unit utilized in the MPEG standard. Thus, the DCT coefficients for each block present in the macroblock can be dequantized by steps equivalent to those recited in Appendix A. The algorithm stated in Appendix A can be broken down into five logical sequential steps, as depicted in the flow chart shown in FIG. 4. Assuming that the numerical values are represented as fixed-point, two's complement binary numbers, each step of the algorithm is described below.
Step 1: (Lines 6-7)
Line 6a: dct_recon[m][n] = (((2 * dct_zz[i]) + Sign(dct_zz[i]))
Line 7a: * quantizer_scale * quant[m][n]) / 16;
Line 6b: dct_recon[m][n] = ((2 * dct_zz[i])
Line 7b: * quantizer_scale * quant[m][n]) / 16;
As shown by block 152 of FIG. 4, in this step each pixel coefficient, which could be part of a non-intracoded block (lines 6a-7a) or an intra-coded block (lines 6b-7b), is scaled by a factor of two. For non-intracoded blocks, Sign(dct_zz[i]) is then added to the scaled value. The Sign() function is defined as follows:
Sign(x) =  1, for (x > 0)
        =  0, for (x = 0)
        = -1, for (x < 0)
(Appendix A: Line 6)
Thus, this step involves one multiplication and one addition/subtraction arithmetic operation.
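Step 1 can be illustrated with a minimal C sketch. The function names sign_of and prescale and the is_intra flag are illustrative assumptions and do not appear in Appendix A; only the arithmetic follows the description above.

```c
/* Sign() as defined in the text: 1 for positive, 0 for zero, -1 for negative. */
static int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Step 1 sketch: double the coefficient, then add its sign only when the
 * block is non-intracoded. "prescale" and "is_intra" are hypothetical names
 * chosen for this example.
 */
static int prescale(int dct_zz, int is_intra)
{
    int scaled = 2 * dct_zz;          /* one multiplication (a 1-bit left shift in hardware) */
    if (!is_intra)
        scaled += sign_of(dct_zz);    /* one addition or subtraction */
    return scaled;
}
```

For a non-intracoded coefficient of 3 this yields 7, and for -3 it yields -7; for an intra-coded coefficient of 3 it yields 6.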
Step 2: (Line 7)
As shown by block 154 in FIG. 4, in this step the pixel coefficients are multiplied by a global scaling factor "quantizer_scale" and by a quantization matrix entry "quant[m][n]". "quantizer_scale" is a global quantization scale factor for the entire 8-by-8 block. "quant[m][n]" is an entry of a 2-dimensional 8-by-8 quantization matrix (one for non-intracoded blocks and one for intracoded blocks) that holds an individual scaling factor for each coefficient. Thus, this step involves one multiplication arithmetic operation.
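The multiplication of Step 2 might be sketched in C as follows. The function name scale_coefficient and the parameter quant_mn (standing for the single entry quant[m][n]) are assumptions made for this example. The 32-bit type is also an assumption: it is wide enough if the operands stay within the MPEG-1 ranges, which are believed to be at most 511 in magnitude for the Step 1 output, 31 for quantizer_scale and 255 for a matrix entry.

```c
#include <stdint.h>

/*
 * Step 2 sketch: multiply the Step 1 result by the block-wide
 * quantizer_scale and by the per-coefficient matrix entry quant[m][n].
 * "scale_coefficient" and "quant_mn" are hypothetical names.
 */
static int32_t scale_coefficient(int32_t prescaled,
                                 int32_t quantizer_scale,
                                 int32_t quant_mn)
{
    return prescaled * quantizer_scale * quant_mn;
}
```

With a Step 1 result of 7, a quantizer_scale of 4 and a matrix entry of 16, the product is 448.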
Step 3: (Line 7)
This step involves dividing the pixel coefficients by 16 with truncation towards zero. This is represented by blocks 156, 158 and 160 in FIG. 4. First, as represented by block 156, a determination is made whether the input value of the pixel coefficient is negative with a non-zero fraction, that is, negative and not an exact multiple of 16. If so, the value is truncated towards zero by first adding the "truncation correction factor" (TCF), which has a value of 15, as represented by block 158, and then dividing the result by 16, as represented by block 160. If, on the other hand, the value is positive, or negative with no fractional part, a simple division by 16, represented by block 160, is performed. Thus, this step involves one comparison, one addition and one division arithmetic operation.
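The truncating division of Step 3 can be sketched in C as shown below. The function name is an assumption. The sketch also assumes that the right shift of a negative signed value is an arithmetic shift, which the C language leaves implementation-defined but which common two's complement compilers provide.

```c
#include <stdint.h>

#define TCF 15  /* truncation correction factor named in the text */

/*
 * Step 3 sketch: divide by 16 with truncation towards zero.
 * "divide_by_16_toward_zero" is a hypothetical name. Assumes two's
 * complement values and an arithmetic right shift for negative operands.
 */
static int32_t divide_by_16_toward_zero(int32_t value)
{
    /* Block 156: negative with a non-zero fraction means the value is
       below zero and its four low-order bits are not all zero. */
    if (value < 0 && (value & 15) != 0)
        value += TCF;                 /* block 158 */
    return value >> 4;                /* block 160 */
}
```

An arithmetic shift alone rounds towards negative infinity, so -17 would become -2; adding 15 first gives -2, which shifts to -1, the same result as truncating -17/16 towards zero.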
Step 4: (Lines 8-9)
Line 8: if ((dct_recon[m][n] & 1) == 0) //Oddification step
Line 9: dct_recon[m][n] -= Sign(dct_recon[m][n]);
This step involves performing oddification towards 0, as represented by blocks 162, 164, 166 and 168 in FIG. 4. First, as represented by block 162, it is determined whether the resultant pixel coefficient value after Step 3 is odd or zero. If the value is odd or zero, the algorithm proceeds to block 170. However, if the value is even and non-zero, the value is further tested, as represented by block 164, for being greater than 0. If the value is positive, then an "oddification correction factor" (OCF), which has a value of 1, is subtracted from the coefficient value, as represented by block 168. If the value is negative and non-zero, the OCF is added to the coefficient value, as represented by block 166. All odd and zero values remain unchanged. Thus, this step involves one addition or subtraction arithmetic operation.
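The oddification flow of blocks 162 through 168 can be sketched in C as follows. The function name is an assumption, and the sketch assumes two's complement values so that the low-order bit indicates oddness for negative numbers as well.

```c
#define OCF 1  /* oddification correction factor named in the text */

/*
 * Step 4 sketch: move even, non-zero values one step towards zero so that
 * the result is odd. "oddify_toward_zero" is a hypothetical name.
 */
static int oddify_toward_zero(int value)
{
    if ((value & 1) != 0 || value == 0)  /* block 162: odd or zero, unchanged */
        return value;
    if (value > 0)                       /* block 164 */
        return value - OCF;              /* block 168 */
    return value + OCF;                  /* block 166 */
}
```

Under these assumptions 28 becomes 27 and -28 becomes -27, while 27, -27 and 0 pass through unchanged.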
Step 5: (Lines 10-15)
Line 10: if (dct_recon[m][n] > 2047)
Line 11: dct_recon[m][n] = 2047;
Line 12: if (dct_recon[m][n] < -2048)
Line 13: dct_recon[m][n] = -2048;
Line 14: if (dct_zz[i] == 0)
Line 15: dct_recon[m][n] = 0;
The final step of the algorithm involves the process of saturation which imposes a limiter on the value of the coefficient. It does not allow the coefficient value to be larger than 2047 or smaller than -2048, thus effectively limiting the coefficient to be a 12-bit, two's-complement signed number. In lines 10-11, saturation is performed on the positive value limit, while in lines 12-13, saturation is performed on the negative value limit. Lines 14-15 ensure that if the input value is zero, the output value is also zero. Thus, this step involves three comparisons and no arithmetic operations.
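Step 5 can be written as a small C function that mirrors lines 10-15. The function name saturate_12bit and its parameter names are assumptions made for this example.

```c
/*
 * Step 5 sketch: clamp to the 12-bit two's complement range and force a
 * zero output for a zero input. "saturate_12bit" is a hypothetical name;
 * "recon" is the Step 4 result and "dct_zz" is the original input value.
 */
static int saturate_12bit(int recon, int dct_zz)
{
    if (recon > 2047)      /* lines 10-11: positive limit */
        recon = 2047;
    if (recon < -2048)     /* lines 12-13: negative limit */
        recon = -2048;
    if (dct_zz == 0)       /* lines 14-15: zero in, zero out */
        recon = 0;
    return recon;
}
```

The function performs the three comparisons counted above and only assignments otherwise.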
FIG. 5 depicts a prior art decoder apparatus 171 for implementing the steps described in Appendix A. As shown in FIG. 5, decoder apparatus 171 comprises pre-scaler apparatus 172, multiplier apparatus 174, divider/truncator apparatus 175, oddification apparatus 179 and saturation apparatus 182. Pre-scaler 172 performs the operations corresponding to Step 1, namely scaling by a factor of 2 and, for non-intracoded blocks, adding the sign. Next, multiplier apparatus 174 receives three inputs (the output from Step 1, "quantizer_scale" and "quant[m][n]") and multiplies them together (corresponding to Step 2). Divider/truncator apparatus 175 performs the operations corresponding to Step 3 and comprises adder 176, shifter 178 and truncation correction factor (TCF) generator 177. TCF generator 177 generates the appropriate truncation correction factor, which is fed to adder 176 and added to the result of Step 2. The truncation correction factor has a value of 15 if the coefficient is negative with a non-zero fraction, and a value of zero otherwise. Shifter 178 performs the division by 16 as a 4-bit arithmetic right shift (corresponding to block 160 in FIG. 4). The result from shifter 178 is then fed to oddification apparatus 179, which performs the operations corresponding to Step 4. Oddification apparatus 179 comprises oddification correction factor (OCF) generator 181 and adder 180. OCF generator 181 generates the appropriate oddification correction factor, which is fed to adder 180 and added to or subtracted from the result of Step 3. Finally, saturation apparatus 182 receives the resultant pixel coefficient from Step 4 and performs the saturation operations corresponding to Step 5. Saturation apparatus 182 imposes a limiter on the coefficient value and can be a simple hardware selector or demultiplexer. The block output is one of three values: +2047 if the input value is greater than 2047, -2048 if the input value is less than -2048, and the input value itself in all other cases.
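The datapath of decoder apparatus 171 can be modeled in software by chaining the five stages. The following C sketch is one possible model and not the apparatus itself; the function names, the is_intra flag and the parameter quant_mn (standing for quant[m][n]) are assumptions. It also assumes two's complement values and an arithmetic right shift for negative operands.

```c
#include <stdint.h>

/* Sign(): 1 for positive, 0 for zero, -1 for negative. */
static int32_t sign_of(int32_t x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical software model of the FIG. 5 datapath for one coefficient. */
static int32_t reconstruct_coefficient(int32_t dct_zz, int32_t quantizer_scale,
                                       int32_t quant_mn, int is_intra)
{
    /* Pre-scaler 172: scale by 2, add the sign for non-intracoded blocks. */
    int32_t v = 2 * dct_zz;
    if (!is_intra)
        v += sign_of(dct_zz);

    /* Multiplier 174: apply both scaling factors. */
    v *= quantizer_scale * quant_mn;

    /* Divider/truncator 175: TCF generator 177, adder 176, shifter 178. */
    int32_t tcf = (v < 0 && (v & 15) != 0) ? 15 : 0;
    v = (v + tcf) >> 4;

    /* Oddification 179: OCF generator 181 selects -1, 0 or +1, adder 180 applies it. */
    int32_t ocf = ((v & 1) == 0) ? -sign_of(v) : 0;
    v += ocf;

    /* Saturation 182: limit to the 12-bit range. A zero input reaches this
       point as zero, so no separate zero check is modeled here. */
    if (v > 2047)
        v = 2047;
    if (v < -2048)
        v = -2048;
    return v;
}
```

For a non-intracoded coefficient of 3 with quantizer_scale 4 and matrix entry 16, the stages produce 7, 448, 28 and finally 27. Each coefficient passes through two adders, a multiplier and a shifter in sequence, which is the per-coefficient cost that the discussion of efficiency below refers to.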
As mentioned earlier, DCT coefficient reconstruction is an integral part of the MPEG video decode process and is performed on every pixel of each block of the video data stream. Thus, the amount of time and compute resources required to decode the entire MPEG encoded video stream is directly proportional to the time and resources required for each step of the reconstruction algorithm. In order to increase the efficiency of the video decode process, it is desirable to accomplish the decode process using a reduced number of computations, translating to savings in time required for video decode. It is also desirable to reduce the complexity of the decoder so that it is cheaper and occupies less hardware real estate--thus reducing the cost of the video decoder and the overall video decode process.