1. Field of the Invention
The present invention relates generally to systems and methods for performing discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) operations. The invention also relates to digital video compression and decompression, and more particularly to a video encoder and decoder for performing the discrete cosine transform and/or inverse discrete cosine transform with improved efficiency and reduced computational requirements.
2. Description of the Related Art
Full-motion digital video requires a large amount of storage and data transfer bandwidth. Thus, video systems use various types of video compression algorithms to reduce the amount of necessary storage and transfer bandwidth. In general, different video compression methods exist for still graphic images and for full-motion video. Intraframe compression methods are used to compress data within a still image or single frame using spatial redundancies within the frame. Interframe compression methods are used to compress multiple frames, i.e., motion video, using the temporal redundancy between the frames. Interframe compression methods are used exclusively for motion video, either alone or in conjunction with intraframe compression methods.
Intraframe or still image compression techniques generally use frequency domain techniques, such as the discrete cosine transform (DCT). Intraframe compression typically uses the frequency characteristics of a picture frame to efficiently encode a frame and remove spatial redundancy. Examples of video data compression for still graphic images are JPEG (Joint Photographic Experts Group) compression and RLE (run-length encoding). JPEG compression is a group of related standards that use the discrete cosine transform (DCT) to provide either lossless (no image quality degradation) or lossy (imperceptible to severe degradation) compression. Although JPEG compression was originally designed for the compression of still images rather than video, JPEG compression is used in some motion video applications. The RLE compression method operates by testing for duplicated pixels in a single line of the bit map and storing the number of consecutive duplicate pixels rather than the data for the pixels themselves.
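The RLE scheme just described can be illustrated with a minimal sketch that encodes one scan line of a bitmap as (value, run length) pairs and expands it again. The function names are hypothetical. Real RLE bitmap formats define their own byte-level packing, which is not modeled here.

```python
def rle_encode_line(pixels):
    """Encode one scan line as (value, run_length) pairs: each run of
    consecutive duplicate pixels is stored once, together with its count."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]


def rle_decode_line(runs):
    """Expand (value, run_length) pairs back into the original scan line."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


line = [7, 7, 7, 7, 2, 2, 9, 7, 7]
encoded = rle_encode_line(line)
print(encoded)                          # [(7, 4), (2, 2), (9, 1), (7, 2)]
print(rle_decode_line(encoded) == line) # True
```

The round trip is exact, which is why RLE by itself is a lossless method; its gain depends entirely on how long the runs of identical pixels are.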
In contrast to compression algorithms for still images, most video compression algorithms are designed to compress full motion video. As mentioned above, video compression algorithms for motion video use a concept referred to as interframe compression to remove temporal redundancies between frames. Interframe compression involves storing only the differences between successive frames in the data file. Interframe compression stores the entire image of a key frame or reference frame, generally in a moderately compressed format. Successive frames are compared with the key frame, and only the differences between the key frame and the successive frames are stored. Periodically, such as when new scenes are displayed, new key frames are stored, and subsequent comparisons begin from this new reference point. The difference frames are further compressed by such techniques as the DCT. Examples of video compression which use an interframe compression technique are MPEG, DVI and Indeo, among others.
A compression standard referred to as MPEG (Moving Pictures Experts Group) compression is a set of methods for compression and decompression of full motion video images which uses the interframe and intraframe compression techniques described above. MPEG compression uses both motion compensation and discrete cosine transform (DCT) processes, among others, and can yield compression ratios of more than 200:1.
The two predominant MPEG standards are referred to as MPEG-1 and MPEG-2. The MPEG-1 standard generally concerns inter-frame data reduction using block-based motion compensation prediction (MCP), which typically uses temporal differential pulse code modulation (DPCM). The MPEG-2 standard is similar to the MPEG-1 standard, but includes extensions to cover a wider range of applications, including interlaced digital video such as high definition television (HDTV).
Interframe compression methods such as MPEG are based on the fact that, in most video sequences, the background remains relatively stable while action takes place in the foreground. The background may move, but large portions of successive frames in a video sequence are redundant. MPEG compression uses this inherent redundancy to encode or compress frames in the sequence.
An MPEG stream includes three types of pictures, referred to as the Intra (I) frame, the Predicted (P) frame, and the Bidirectional Interpolated (B) frame. The I or Intraframes contain the video data for the entire frame of video and are typically placed every 10 to 15 frames. Intraframes provide entry points into the file for random access, and are generally only moderately compressed. Predicted frames are encoded with reference to a past frame, i.e., a prior Intraframe or Predicted frame. Thus P frames only include changes relative to prior I or P frames. In general, Predicted frames receive a fairly high amount of compression and are used as references for future Predicted frames. Thus, both I and P frames are used as references for subsequent frames. Bi-directional pictures include the greatest amount of compression and require both a past and a future reference in order to be encoded. Bi-directional frames are never used as references for other frames.
In general, for the frame(s) following a reference frame, i.e., P and B frames that follow a reference I or P frame, only small portions of these frames are different from the corresponding portions of the respective reference frame. Thus, for these frames, only the differences are compressed and stored. The differences between these frames are typically generated using motion vector estimation logic, as discussed below.
When an MPEG encoder receives a video file or bitstream, the MPEG encoder generally first creates the I frames. The MPEG encoder may compress the I frame using an intraframe compression technique. After the I frames have been created, the MPEG encoder divides respective frames into a grid of 16×16 pixel squares called macroblocks. The respective frames are divided into macroblocks in order to perform motion estimation/compensation. Thus, for a respective target picture or frame, i.e., a frame being encoded, the encoder searches for an exact, or near exact, match between the target picture macroblock and a block in a neighboring picture referred to as a search frame. For a target P frame the encoder searches in a prior I or P frame. For a target B frame, the encoder searches in a prior or subsequent I or P frame. When a match is found, the encoder transmits a vector movement code or motion vector. The vector movement code or motion vector only includes information on the difference between the search frame and the respective target picture. The blocks in target pictures that have no change relative to the block in the reference picture or I frame are ignored. Thus the amount of data that is actually stored for these frames is significantly reduced.
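The macroblock search described above can be sketched as an exhaustive (full) search that picks the displacement minimizing the sum of absolute differences (SAD). MPEG standardizes the bitstream and the decoder, not the encoder's search, so the SAD criterion, the ±7 pixel window and the function names `sad` and `full_search` are assumptions for illustration only.

```python
def sad(target, ref, tx, ty, rx, ry, n):
    """Sum of absolute differences between the n x n block of `target` whose
    top-left corner is (tx, ty) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of luminance samples."""
    return sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(target, ref, tx, ty, n=16, search_range=7):
    """Exhaustive block matching: try every displacement within
    +/- search_range and return ((dx, dy), sad) for the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = tx + dx, ty + dy
            # Only consider candidate blocks that lie inside the search frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(target, ref, tx, ty, rx, ry, n)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best


# Example: the target frame is the reference shifted, so the 16x16 block at
# (16, 16) matches the reference block 3 pixels right and 2 pixels up.
W = H = 48
ref = [[(7 * x + 13 * y) % 256 for x in range(W)] for y in range(H)]
target = [[(7 * (x + 3) + 13 * (y - 2)) % 256 for x in range(W)] for y in range(H)]
print(full_search(target, ref, 16, 16))   # ((3, -2), 0)
```

A SAD of zero means the block is reproduced exactly by the motion vector alone; a nonzero best SAD leaves a residual that is then coded with the DCT.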
After motion vectors have been generated, the encoder then encodes the changes using spatial redundancy. Thus, after finding the changes in location of the macroblocks, the MPEG algorithm further calculates and encodes the difference between corresponding macroblocks. Each macroblock comprises four 8×8 subblocks for the brightness or luminance signal and a corresponding two, four or eight subblocks for the color or chrominance signal, depending on the color format. Encoding the difference is accomplished through a mathematical process referred to as the discrete cosine transform or DCT. This process operates on each 8×8 block.
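The macroblock layout can be illustrated with a short sketch that splits a 16×16 luminance macroblock into its four 8×8 subblocks and counts the blocks per macroblock for the chrominance formats defined by MPEG-2 (4:2:0, 4:2:2 and 4:4:4, giving two, four and eight chrominance blocks). The function names are hypothetical, and only frame-organized (not field-organized) luminance blocks are shown.

```python
# Chrominance 8x8 blocks per macroblock for the MPEG-2 chroma formats.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}


def split_luma_macroblock(mb):
    """Split a 16x16 luminance macroblock (16 rows of 16 samples) into its
    four 8x8 subblocks in raster order: top-left, top-right, bottom-left,
    bottom-right."""
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 macroblock")
    return [[row[c:c + 8] for row in mb[r:r + 8]] for r in (0, 8) for c in (0, 8)]


def blocks_per_macroblock(chroma_format):
    """Four luminance blocks plus the chrominance blocks of the given format."""
    return 4 + CHROMA_BLOCKS[chroma_format]


mb = [[16 * r + c for c in range(16)] for r in range(16)]
blocks = split_luma_macroblock(mb)
print(len(blocks), blocks[3][0][0])                        # 4 136
print([blocks_per_macroblock(f) for f in CHROMA_BLOCKS])   # [6, 8, 12]
```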
For frames which are used as references for other frames, the MPEG encoder is required to reverse the quantization and DCT transform on these blocks in order to recover the resultant pixel data. This resultant pixel data is used for motion estimation on subsequent frames, such as P and B frames. Thus MPEG encoders generally include inverse quantization logic as well as inverse DCT logic.
Therefore, MPEG compression is based on two types of redundancies in video sequences, these being spatial, which is the redundancy in an individual frame, and temporal, which is the redundancy between consecutive frames. Spatial compression is achieved by considering the frequency characteristics of a picture frame. Each frame is divided into non-overlapping blocks, and each block is transformed via the discrete cosine transform (DCT). After the transformed blocks are converted to the “DCT domain”, each entry in the transformed block is quantized with respect to a set of quantization tables. The quantization step for each entry can vary, taking into account the sensitivity of the human visual system (HVS) to the frequency. Since the HVS is more sensitive to low frequencies, most of the high frequency entries are quantized to zero. In this step where the entries are quantized, information is lost and errors are introduced to the reconstructed image. Run length encoding is used to transmit the quantized values. To further enhance compression, the blocks are scanned in a zig-zag ordering that scans the lower frequency entries first, and the non-zero quantized values, along with the zero run lengths, are entropy encoded.
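The quantization, zig-zag scan and run-length steps just described can be sketched as follows. This is a simplified illustration under stated assumptions: a hypothetical step table that grows with frequency, a plain round-to-nearest quantizer (Python's `round`, which breaks exact ties to even), and (zero run, level) pairs closed by an end-of-block marker. The actual MPEG quantizer arithmetic (quantizer scale, intra DC handling) and the variable-length code tables are not modeled.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order,
    starting at the DC term and moving toward higher frequencies."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col (anti-diagonal)
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()                          # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def quantize(coeffs, steps):
    """Simplified uniform quantizer: level = round(coefficient / step)."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, steps)]


def run_level_pairs(levels, order):
    """Scan quantized levels in `order`; emit (zero_run, level) for each
    non-zero value and a final end-of-block marker."""
    pairs, run = [], 0
    for r, c in order:
        if levels[r][c] == 0:
            run += 1
        else:
            pairs.append((run, levels[r][c]))
            run = 0
    pairs.append("EOB")
    return pairs


# Hypothetical step table: coarser steps for higher frequencies.
steps = [[8 + 2 * (u + v) for v in range(8)] for u in range(8)]
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[2][0], coeffs[1][1] = 80.0, -22.0, 9.0, 0.4
print(run_level_pairs(quantize(coeffs, steps), zigzag_order()))
# [(0, 10), (0, -2), (1, 1), 'EOB']
```

The small coefficient at (1, 1) quantizes to zero and simply lengthens a zero run, and everything after the last non-zero level collapses into the end-of-block marker; this is where the bulk of the bit savings comes from.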
As discussed above, temporal compression makes use of the fact that most of the objects remain the same between consecutive picture frames, and the difference between objects or blocks in successive frames is their position in the frame as a result of motion (either due to object motion, camera motion or both). This relative encoding is achieved by the process of motion estimation. The difference image as a result of motion compensation is further compressed by means of the DCT, quantization and RLE entropy coding.
When an MPEG decoder receives an encoded stream, the MPEG decoder reverses the above operations. Thus the MPEG decoder performs inverse scanning to remove the zig-zag ordering, inverse quantization to de-quantize the data, and the inverse DCT to convert the data from the frequency domain back to the pixel domain. The MPEG decoder also performs motion compensation using the transmitted motion vectors to re-create the temporally compressed frames.
Computation of the discrete cosine transform (DCT), as well as computation of the inverse discrete cosine transform (IDCT), in video systems generally requires a large amount of processing. For example, hundreds of multiplication (or division) operations as well as hundreds of addition (or subtraction) operations may be required to perform the DCT or IDCT upon a single 8×8 array. Such computational requirements can be extremely time-consuming and resource intensive.
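For reference, the 8×8 IDCT as conventionally defined for JPEG and MPEG is shown here, with the operation counts of a straightforward evaluation that tabulates the cosine weights in advance (so each term costs one multiplication):

```latex
f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\,
         \cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad C(0)=\tfrac{1}{\sqrt{2}},\quad C(k)=1 \ (k>0)

\begin{aligned}
\text{direct 2-D sum:}\quad & 64 \times 64 = 4096 \text{ multiplications}\\
\text{row-column, 16 one-dimensional passes:}\quad & 16 \times 8 \times 8 = 1024 \text{ multiplications},\quad 16 \times 8 \times 7 = 896 \text{ additions}
\end{aligned}
```

Fast factorizations reduce these counts by exploiting the symmetries of the cosine basis.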
A new system and method are desired for efficiently computing the forward and/or inverse discrete cosine transform. It is particularly desirable to provide a system for computing the forward and/or inverse discrete cosine transform which reduces computational requirements in a video system.
The problems outlined above are in large part solved by a system and method of a forward and/or inverse discrete cosine transform in accordance with the present invention. In one embodiment, an array of DCT coefficients is converted to a two-dimensional array of spatial data. The array of DCT coefficients is first operated upon by a pre-scale computation unit (implemented in either hardware or software), which multiplies each input coefficient by a corresponding predetermined pre-scale constant. Together, these pre-scale constants form a symmetric pre-scale factor array. After pre-scaling with the symmetric pre-scale factor array, intermediary calculations are performed upon each column vector of the pre-scaled array, and their outputs are composed to form an intermediary array. Subsequently, a set of calculations is performed upon each row vector of the intermediary array to form the output array of spatial data.
In one implementation, the array of pre-scale coefficients (i.e., the symmetric pre-scale factor array) may be represented as the result of a matrix multiplication M×U×M, where every coefficient of the array U equals 1, and where M is a diagonal array: the coefficients on its main diagonal are pre-scale constants and all other coefficients are equal to 0. The pre-scale constants include the set of constants cos(nπ/16), where n = 1, 2, 3 and 4.
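Written out element by element, the product M×U×M shows why the pre-scale array is symmetric: because M is diagonal and U is all ones, entry (u, v) is simply the product of the u-th and v-th diagonal constants. The assignment of constants to diagonal positions in the last line is an inference, not stated in the text: it is the assignment under which the column operator of the following embodiment reproduces the IDCT (for example, the pair b2, b6 sharing the factor tan(2π/16) requires m2 = m6 = cos(2π/16)).

```latex
M = \operatorname{diag}(m_0, m_1, \ldots, m_7), \qquad U_{ij} = 1 \ \text{for all } i, j

P = M\,U\,M, \qquad
P_{uv} = \sum_{i}\sum_{j} M_{ui}\,U_{ij}\,M_{jv} = m_u\,m_v = P_{vu}

(m_0, \ldots, m_7) = \left(\cos\tfrac{4\pi}{16},\ \cos\tfrac{\pi}{16},\ \cos\tfrac{2\pi}{16},\ \cos\tfrac{3\pi}{16},\
\cos\tfrac{4\pi}{16},\ \cos\tfrac{3\pi}{16},\ \cos\tfrac{2\pi}{16},\ \cos\tfrac{\pi}{16}\right)
```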
In one embodiment, the intermediary calculation performed upon each column vector of the pre-scaled array includes:
v0 = b0 + b4;
v4 = b0 - b4;
v2 = b6*tan(π*2/16) + b2;
v6 = b6 - b2*tan(π*2/16);
v7 = b1*tan(π*1/16) - b7;
v1 = b1 + b7*tan(π*1/16);
v5 = -b3*tan(π*3/16) + b5;
v3 = b3 + b5*tan(π*3/16);
b0 = v0 + v2;
b2 = v0 - v2;
b4 = v4 + v6;
b6 = v4 - v6;
b3 = v7 + v5;
b5 = (v7 - v5);
b1 = (v1 - v3);
b7 = v1 + v3;
v5 = (b1 + b5)*cos(π*4/16);
v1 = (b1 - b5)*cos(π*4/16);
output[0] = (b0 + b7);
output[7] = (b0 - b7);
output[1] = (b6 + v5);
output[6] = (b6 - v5);
output[2] = (b4 + v1);
output[5] = (b4 - v1);
output[3] = (b2 + b3);
output[4] = (b2 - b3);
wherein the input parameters b0-b7 represent the coefficients of each column vector of said pre-scaled array (the names b0-b7, v1 and v5 are subsequently reused to hold intermediate values). Upon composition of an intermediary array by applying the operator expressed above upon each column vector of the pre-scaled array, the output array is calculated by applying the operator row-wise upon each row vector of the intermediary array. In another embodiment, the operator expressed above is first applied row-wise upon each row vector of the pre-scaled array to form an intermediary array. Subsequently, the operator is applied column-wise upon each column vector of the intermediary array.
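The listed operator can be checked numerically with a minimal Python rendering, under these assumptions: the diagonal pre-scale constants are assigned as (cos 4π/16, cos π/16, cos 2π/16, cos 3π/16, cos 4π/16, cos 3π/16, cos 2π/16, cos π/16), an assignment inferred from the tangent factors rather than stated in the text; the 1/4 normalization of the orthonormal IDCT is folded into the symmetric pre-scale array; and the names `idct_1d`, `idct_8x8` and `PRESCALE` are hypothetical. The result is compared against a direct evaluation of the IDCT definition.

```python
import math
import random

_c = [math.cos(k * math.pi / 16) for k in range(8)]
# Diagonal of M (inferred assignment, see lead-in).
M_DIAG = [_c[4], _c[1], _c[2], _c[3], _c[4], _c[3], _c[2], _c[1]]
T1, T2, T3 = (math.tan(k * math.pi / 16) for k in (1, 2, 3))
C4 = _c[4]
# Symmetric pre-scale array m_u * m_v; the 1/4 normalization is folded in (assumption).
PRESCALE = [[M_DIAG[u] * M_DIAG[v] / 4 for v in range(8)] for u in range(8)]


def idct_1d(b):
    """Eight-point operator of the listing, applied to a pre-scaled vector.
    Intermediate values are renamed (e* = even part, o* = odd part)."""
    b0, b1, b2, b3, b4, b5, b6, b7 = b
    v0, v4 = b0 + b4, b0 - b4
    v2, v6 = b6 * T2 + b2, b6 - b2 * T2
    v7, v1 = b1 * T1 - b7, b1 + b7 * T1
    v5, v3 = -b3 * T3 + b5, b3 + b5 * T3
    e0, e3 = v0 + v2, v0 - v2              # listing: b0, b2
    e2, e1 = v4 + v6, v4 - v6              # listing: b4, b6
    o3, p = v7 + v5, v7 - v5               # listing: b3, b5
    q, o0 = v1 - v3, v1 + v3               # listing: b1, b7
    o1, o2 = (q + p) * C4, (q - p) * C4    # listing: v5, v1
    return [e0 + o0, e1 + o1, e2 + o2, e3 + o3,
            e3 - o3, e2 - o2, e1 - o1, e0 - o0]


def idct_8x8(F):
    """Pre-scale once (64 multiplications), then column pass, then row pass."""
    pre = [[F[u][v] * PRESCALE[u][v] for v in range(8)] for u in range(8)]
    cols = [idct_1d([pre[u][v] for u in range(8)]) for v in range(8)]  # cols[v][x]
    inter = [[cols[v][x] for v in range(8)] for x in range(8)]         # inter[x][v]
    return [idct_1d(row) for row in inter]                              # out[x][y]


def idct_8x8_reference(F):
    """Direct evaluation of the orthonormal 8x8 IDCT definition, for checking."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    return [[sum(c(u) * c(v) * F[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / 16)
                 * math.cos((2 * y + 1) * v * math.pi / 16)
                 for u in range(8) for v in range(8)) / 4
             for y in range(8)] for x in range(8)]


random.seed(1)
F = [[random.uniform(-100, 100) for _ in range(8)] for _ in range(8)]
fast, ref = idct_8x8(F), idct_8x8_reference(F)
# Maximum deviation from the direct definition (floating-point round-off only).
print(max(abs(a - b) for fr, rr in zip(fast, ref) for a, b in zip(fr, rr)))
```

Because the operator is linear, scaling the coefficient at (u, v) by m_u·m_v before the column pass has the same effect as scaling by m_u inside every column pass and by m_v inside every row pass, which is why a single symmetric pre-scale array suffices. Running the row pass first, as in the alternative embodiment, gives the same result up to round-off.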
Since pre-scaling of the DCT input coefficients is performed using a symmetric pre-scale factor array before performing the column-wise and row-wise calculations, fewer overall multiplications are required since separate column-wise and row-wise pre-scale calculations are not performed. Accordingly, when employed within a video compression or decompression system, the inverse discrete cosine transform may be performed more efficiently and faster.
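A rough count from the listed operator makes the saving concrete. The comparison case, in which each one-dimensional pass scales its own eight inputs, is an assumption about the alternative being contrasted; the counts cover multiplications only (each pass also uses 26 additions or subtractions).

```latex
\begin{aligned}
\text{multiplications per 8-point pass} &= 6\ (\text{by } \tan\tfrac{\pi}{16}, \tan\tfrac{2\pi}{16}, \tan\tfrac{3\pi}{16}) + 2\ (\text{by } \cos\tfrac{4\pi}{16}) = 8\\
\text{column and row passes} &= 16 \times 8 = 128\\
\text{one symmetric pre-scale} &= 64 \;\Rightarrow\; 128 + 64 = 192\\
\text{pre-scaling inside each pass} &= 16 \times 8 = 128 \;\Rightarrow\; 128 + 128 = 256
\end{aligned}
```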
In another embodiment, the calculations are reversed to perform a forward discrete cosine transform operation. In such an embodiment, post-scaling upon an array of coefficients calculated using column-wise and row-wise calculations is performed. Again, since the overall number of calculations for performing the discrete cosine transform may be reduced, faster and more efficient video compression and decompression systems may be attained.
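The text does not spell out the forward operator. One way to realize "reversing the calculations", sketched here under that assumption, is to run the inverse flow graph backwards: since the orthonormal DCT matrix is the transpose of the IDCT matrix, transposing each butterfly and tangent-rotation stage of the listed operator and then post-scaling by the same symmetric array m_u·m_v/4 yields the forward DCT. The function names, the inferred constant assignment and the placement of the 1/4 factor are hypothetical.

```python
import math
import random

_c = [math.cos(k * math.pi / 16) for k in range(8)]
M_DIAG = [_c[4], _c[1], _c[2], _c[3], _c[4], _c[3], _c[2], _c[1]]  # inferred assignment
T1, T2, T3 = (math.tan(k * math.pi / 16) for k in (1, 2, 3))
C4 = _c[4]
# Symmetric post-scale array, applied once after both passes.
POSTSCALE = [[M_DIAG[u] * M_DIAG[v] / 4 for v in range(8)] for u in range(8)]


def fdct_1d(x):
    """Transpose of the inverse operator: each stage of the inverse flow graph
    is run in reverse order with its 2x2 coefficient matrix transposed."""
    e = [x[n] + x[7 - n] for n in range(4)]        # undo the output butterflies
    o = [x[n] - x[7 - n] for n in range(4)]
    q, p = (o[1] + o[2]) * C4, (o[1] - o[2]) * C4  # transposed cos(4*pi/16) stage
    v0, v2 = o and e[0] + e[3], e[0] - e[3]
    v4, v6 = e[2] + e[1], e[2] - e[1]
    v7, v5 = o[3] + p, o[3] - p
    v1, v3 = q + o[0], o[0] - q
    return [v0 + v4,           # b0
            T1 * v7 + v1,      # b1
            v2 - T2 * v6,      # b2
            -T3 * v5 + v3,     # b3
            v0 - v4,           # b4
            v5 + T3 * v3,      # b5
            T2 * v2 + v6,      # b6
            -v7 + T1 * v1]     # b7


def fdct_8x8(f):
    """Column pass, row pass, then a single symmetric post-scale."""
    cols = [fdct_1d([f[x][y] for x in range(8)]) for y in range(8)]   # cols[y][u]
    inter = [[cols[y][u] for y in range(8)] for u in range(8)]        # inter[u][y]
    Z = [fdct_1d(row) for row in inter]                               # Z[u][v]
    return [[Z[u][v] * POSTSCALE[u][v] for v in range(8)] for u in range(8)]


def fdct_8x8_reference(f):
    """Direct evaluation of the orthonormal 8x8 DCT definition, for checking."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    return [[c(u) * c(v) / 4 * sum(f[x][y]
             * math.cos((2 * x + 1) * u * math.pi / 16)
             * math.cos((2 * y + 1) * v * math.pi / 16)
             for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


random.seed(2)
f = [[random.uniform(0, 255) for _ in range(8)] for _ in range(8)]
fast, ref = fdct_8x8(f), fdct_8x8_reference(f)
print(max(abs(a - b) for fr, rr in zip(fast, ref) for a, b in zip(fr, rr)))
```

The multiplication count mirrors the inverse case: eight per one-dimensional pass plus one symmetric scaling of the 64 outputs.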
The fast forward or inverse discrete cosine transform methodology may be employed within a computation unit of a video encoder or decoder system, either in hardware or software. The DCT coefficients and resulting spatial data may be stored within a memory of the video encoder or decoder system. A video encoder or decoder employing the fast forward or inverse discrete cosine transform methodology in accordance with the present invention may advantageously achieve high performance.