1. The Field of the Invention
The present invention relates to the field of video processing. In particular, the present invention relates to the field of reducing generational error caused by requantization of a predictive video stream using motion compensation.
2. Background and Relevant Art
Video constitutes a series of images that, when displayed above a certain rate, give the illusion to a human viewer that the image is moving. Video is now a widespread medium for communicating information, whether it be a television broadcast, a taped program, or the like. More recently, digital video has become popular.
An uncompressed digital video stream has high bandwidth and storage requirements. For example, the raw storage requirement for uncompressed CCIR-601 resolution 4:2:2 serial digital video is approximately 20 megabytes per second. In addition, associated audio and data channels also require bandwidth and storage. From a transmission bandwidth perspective, 20 megabytes per second far exceeds what conventional transmission techniques can practicably support. From a storage perspective, a two-hour movie would occupy approximately 144 gigabytes of memory, well beyond the capacity of a conventional Digital Versatile Disk (DVD). Therefore, systems and methods were desired for compressing (or coding) digital video in a way that maintains a relatively high degree of fidelity with the original video once uncompressed (or decoded).
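The figures above can be checked with simple arithmetic. The sketch below assumes 525-line CCIR-601 sampling (720×480 active pixels at 29.97 frames per second) with 8-bit 4:2:2 samples, which average two bytes per pixel; these parameters are assumptions for illustration and are not stated in the text.

```python
# Back-of-the-envelope storage estimate for uncompressed CCIR-601 4:2:2 video.
# Assumed parameters (not from the original text): 720x480 active pixels,
# 29.97 frames/s, 8-bit samples. With 4:2:2 chroma, each pixel carries one
# luma byte plus half of a Cb/Cr pair, i.e. 2 bytes per pixel on average.
WIDTH, HEIGHT = 720, 480
FRAME_RATE = 30000 / 1001          # 29.97 frames per second
BYTES_PER_PIXEL = 2

def bytes_per_second() -> float:
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * FRAME_RATE

def movie_bytes(rate_bytes_per_s: float, hours: float) -> float:
    return rate_bytes_per_s * hours * 3600

print(f"{bytes_per_second() / 1e6:.1f} MB/s")        # 20.7 MB/s
print(f"{movie_bytes(20e6, 2) / 1e9:.0f} GB")        # 144 GB at a flat 20 MB/s
```

The active-picture rate comes out near 20.7 MB/s, consistent with the approximate 20 MB/s figure, and a two-hour program at 20 MB/s needs 144 GB.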
One conventional high-quality compression standard is called MPEG-2, which is based on the principle that there is a large degree of visual redundancy in video streams. By removing much of the redundant information, the video storage and bandwidth requirements are significantly reduced.
FIG. 1A illustrates a display order 100A of a sequence of pictures. If the video stream represents progressive video, the pictures represent individual progressive frames. If the video stream represents interlaced video, the pictures represent individual interlaced frames containing two fields each.
Under the MPEG-2 standard, there are three classes of pictures: I-pictures, P-pictures and B-pictures. While MPEG-2 allows for a number of display orders, the display order illustrated in FIG. 1A is commonly used. In this common display order, there is a series of I-pictures. For clarity, only I-pictures I1 and I16 are shown in FIG. 1A. Each consecutive I-picture pair has four P-pictures interspersed therebetween. For example, P-pictures P4, P7, P10 and P13 are interspersed between consecutive I-pictures I1 and I16. In addition, two B-pictures are interspersed between each I-picture and each of its neighboring P-pictures. Two B-pictures are also interspersed between each consecutive P-picture pair. For example, B-pictures B2 and B3 are interspersed between I-picture I1 and P-picture P4, B-pictures B5 and B6 are interspersed between P-pictures P4 and P7, B-pictures B8 and B9 are interspersed between P-pictures P7 and P10, B-pictures B11 and B12 are interspersed between P-pictures P10 and P13, and B-pictures B14 and B15 are interspersed between P-picture P13 and I-picture I16.
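The display order of FIG. 1A follows a fixed pattern: an I-picture every 15 pictures and an I- or P-picture every 3 pictures. The sketch below generates the picture labels from that pattern; the parameter names n and m and the function display_order are hypothetical, chosen only for illustration.

```python
# Generate the FIG. 1A display order: an I-picture every n pictures and an
# anchor (I or P) picture every m pictures, with B-pictures in between.
# n = 15 and m = 3 reproduce the pattern described in the text.
def display_order(num_pictures: int, n: int = 15, m: int = 3) -> list[str]:
    labels = []
    for i in range(num_pictures):       # i = 0 corresponds to picture 1
        if i % n == 0:
            kind = "I"                  # intra-coded picture
        elif i % m == 0:
            kind = "P"                  # forward-predicted anchor picture
        else:
            kind = "B"                  # bidirectionally predicted picture
        labels.append(f"{kind}{i + 1}")
    return labels

print(" ".join(display_order(16)))
# I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14 B15 I16
```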
The I-pictures are “intra-coded,” meaning that they can be reconstructed without reference to any other picture in the video stream.
The P-pictures are “inter-coded,” meaning that they can be reconstructed only with reference to another picture, called a reference picture. Typically, a P-picture includes motion vectors that represent estimated motion with respect to the reference picture. A P-picture may be reconstructed using the immediately preceding I-picture or P-picture as a reference. In FIG. 1A, arrows illustrate the predictive relationship between pictures: the picture at the head of an arrow is the predictive picture, and the picture at the tail of the arrow is the reference picture used to reconstruct it. For example, the reconstruction of P-picture P7 uses P-picture P4 as a reference.
B-pictures are also inter-coded. The B-picture is typically reconstructed using the immediately preceding I-picture or P-picture as a reference, and the immediately subsequent I-picture or P-picture as a reference. For example, the reconstruction of B-picture B14 uses P-picture P13 and I-picture I16 as references.
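The reference relationships just described can be derived mechanically from the display order. In the sketch below (the function name references is hypothetical), an I-picture has no references, a P-picture refers to the nearest preceding I- or P-picture, and a B-picture refers to the nearest preceding and nearest following I- or P-picture. It assumes every B-picture has an anchor picture on both sides, as in FIG. 1A.

```python
# Derive each picture's reference pictures from its display-order position.
#   I: none; P: nearest preceding I or P;
#   B: nearest preceding and nearest following I or P.
def references(order: list[str]) -> dict[str, list[str]]:
    anchors = [i for i, label in enumerate(order) if label[0] in "IP"]
    refs = {}
    for i, label in enumerate(order):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if label[0] == "I":
            refs[label] = []
        elif label[0] == "P":
            refs[label] = [order[before[-1]]]
        else:  # B-picture: assumes anchors exist on both sides
            refs[label] = [order[before[-1]], order[after[0]]]
    return refs

ORDER = "I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14 B15 I16".split()
refs = references(ORDER)
print(refs["P7"], refs["B14"])   # ['P4'] ['P13', 'I16']
```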
FIG. 1B illustrates the decode order 100B of the pictures. The decode order is similar to the display order except that each reference picture is decoded before any predictive picture that relies on it, even if the reference picture is displayed after the predictive picture. Because every reference picture therefore precedes the pictures predicted from it, the arrows in FIG. 1B all point rightward.
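One way to obtain a decode order consistent with this description is to hold each B-picture back until the next I- or P-picture has been emitted. The sketch below (hypothetical function decode_order) applies that rule to the FIG. 1A display order; the exact sequence drawn in FIG. 1B is assumed, not quoted.

```python
# Reorder display order into decode order: B-pictures wait until the
# following reference (I or P) picture has been emitted, so each reference
# picture precedes every picture that is predicted from it.
def decode_order(display: list[str]) -> list[str]:
    out, pending_b = [], []
    for label in display:
        if label[0] == "B":
            pending_b.append(label)     # needs a reference not yet decoded
        else:
            out.append(label)           # emit the reference picture first
            out.extend(pending_b)       # then the B-pictures that needed it
            pending_b.clear()
    out.extend(pending_b)               # trailing B-pictures, if any
    return out

DISPLAY = "I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14 B15 I16".split()
print(" ".join(decode_order(DISPLAY)))
# I1 P4 B2 B3 P7 B5 B6 P10 B8 B9 P13 B11 B12 I16 B14 B15
```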
FIG. 2A illustrates the general process involved with encoding a digital picture 201 using an encoder 200A that is compatible with the MPEG-2 standard. If the digital picture is to be an I-picture, the digital picture bypasses the motion estimator 202 and is provided to the discrete cosine transformation unit (DCT) 203, which transforms the digital picture, on a block-by-block basis, from a spatial representation of an image to a frequency representation of the image. The frequency representation is then passed to a quantization unit 204, which quantizes each frequency coefficient, on a macroblock-by-macroblock basis, into definable ranges. A “macroblock” is a 16-pixel by 16-pixel array within the picture. The quantized image is then passed to a variable length coder 205, which performs, for example, variable length Huffman coding on the resulting quantized image. The reduced-size I-picture is then stored or transmitted for subsequent decoding.
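The transform and quantization steps can be sketched for a single 8×8 block. The code below uses an orthonormal 2-D DCT-II and a single uniform step size; the step size, the rounding rule, and the function names are assumptions for illustration and stand in for MPEG-2's weighting matrices, per-macroblock quantizer scale, and special intra-DC handling.

```python
import math

N = 8  # DCT block size used by MPEG-2

def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """Orthonormal 2-D DCT-II of an 8x8 block (spatial -> frequency)."""
    def c(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs: list[list[float]], step: float) -> list[list[int]]:
    """Map each coefficient to an integer level (round half away from zero).
    A single uniform step is a simplifying assumption."""
    return [[int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in row]
            for row in coeffs]

flat = [[128] * N for _ in range(N)]      # a uniform gray block
levels = quantize(dct_2d(flat), step=16)
print(levels[0][0])                       # 64: all energy lands in the DC term
```

A flat block produces a single nonzero level, which is why smooth image regions compress well after variable length coding.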
If the digital picture 201 is to be a P-picture, the encoding process is similar to that for I-pictures, with several notable exceptions. For a P-picture, the digital picture is passed first to the motion estimator 202. For each macroblock (i.e., 16×16 pixel array) in the P-picture, the motion estimator 202 finds a close visual match to the macroblock in the reference picture. The motion estimator 202 then represents the macroblock in the P-picture as a motion vector representing the motion between the macroblock in the P-picture and the closely matching 16×16 pixel array in the reference picture. In addition to the motion vector, a difference macroblock is calculated representing the difference between the macroblock in the P-picture and the closely matching 16×16 pixel array in the reference picture. A macroblock represented as a difference with corresponding motion vectors is typically smaller than a macroblock represented without motion vectors. Discrete cosine transformation and quantization are then performed on just the difference representation of the P-picture. Then, the difference information is combined with the motion vectors before variable length coding is performed.
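The text does not prescribe how the motion estimator 202 finds a close match. One common approach, shown here as an assumption rather than as the method of this document, is exhaustive block matching that minimizes the sum of absolute differences (SAD). The function names, the block size, and the search range below are hypothetical; the returned residual corresponds to the difference macroblock described above.

```python
# Full-search block matching with a sum-of-absolute-differences (SAD) cost.
# Pictures are lists of rows of integer samples; (bx, by) is the block's
# top-left corner in the current picture.
def sad(cur, ref, bx, by, dx, dy, size):
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def motion_search(cur, ref, bx, by, size=16, search=7):
    """Return ((dx, dy), residual) for the best-matching reference block."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    # Difference block: current samples minus the matched reference samples.
    residual = [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
                 for i in range(size)] for j in range(size)]
    return (dx, dy), residual

# Synthetic test: the current picture is the reference shifted right by 2
# and down by 1, so the best match lies at (dx, dy) = (-2, -1).
H = W = 32
ref = [[(x * 7 + y * 13) % 256 for x in range(W)] for y in range(H)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(W)]
       for y in range(H)]
mv, residual = motion_search(cur, ref, 8, 8, size=8, search=4)
print(mv)   # (-2, -1)
```

For a pure translation the residual is all zeros, so only the motion vector carries information; real content leaves a small nonzero residual that is then transformed and quantized.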
B-pictures are encoded in a manner similar to P-pictures, except that motion may be estimated with reference to a prior reference picture and a subsequent reference picture.
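With two reference pictures available, a B-picture block can be predicted from the prior reference, the subsequent reference, or an average of both. The sketch below picks whichever of the three yields the lowest SAD, given forward and backward predictions that have already been motion-compensated. The function names, the flat-list block representation, and the round-half-up averaging (f + b + 1) // 2 are assumptions for illustration.

```python
# Choose a B-picture prediction mode for one block from already
# motion-compensated forward and backward predictions.
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_b_prediction(block, fwd, bwd):
    """Return (mode, prediction, residual) minimizing SAD."""
    interp = [(f + b + 1) // 2 for f, b in zip(fwd, bwd)]   # rounded average
    candidates = {"forward": fwd, "backward": bwd, "interpolated": interp}
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    pred = candidates[mode]
    residual = [x - p for x, p in zip(block, pred)]
    return mode, pred, residual

# A block lying midway between its two references (e.g. during a fade).
mode, pred, residual = choose_b_prediction(
    [100, 110, 120, 130], [90, 100, 110, 120], [110, 120, 130, 140])
print(mode, residual)   # interpolated [0, 0, 0, 0]
```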
FIG. 2B illustrates a conventional decoder 200B in conformance with the MPEG-2 standard. First, a variable length decoder 215 performs variable length decoding on the picture. The picture (or the difference data of the picture, if a P-picture or a B-picture) is passed to the inverse quantizer 214 for inverse quantization on a macroblock-by-macroblock basis. Next, an inverse discrete cosine transformer 213 performs inverse discrete cosine transformation on the frequency representation of the picture, on a block-by-block basis, to reconstruct the spatial representation of the picture. The spatial representation of the picture is passed to the motion compensator 212, where, for a P-picture or B-picture, it is combined with the reference picture data identified by the motion vectors to reconstruct the digital picture 201′. The reconstructed digital picture 201′ is labeled differently than the original picture 201 to indicate that there may be some loss in the encoding process.
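The motion compensation step of the decoder can be sketched for one block: fetch the reference block displaced by the motion vector, add the decoded residual produced by inverse quantization and the inverse DCT, and clip to the 8-bit sample range. The function name is hypothetical, and only whole-pixel motion vectors are handled here, whereas MPEG-2 also permits half-pixel vectors.

```python
# Reconstruct one block in the motion compensator: reference samples
# displaced by the motion vector, plus the decoded residual, clipped to 0..255.
def motion_compensate(ref, bx, by, mv, residual):
    dx, dy = mv
    size = len(residual)
    return [[max(0, min(255, ref[by + dy + j][bx + dx + i] + residual[j][i]))
             for i in range(size)] for j in range(size)]

ref = [[10 * y + x for x in range(4)] for y in range(4)]   # toy reference picture
print(motion_compensate(ref, 2, 2, (-1, -1), [[1, 2], [3, 4]]))
# [[12, 14], [24, 26]]
```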
In this manner, MPEG-2 combines the functionality of motion compensation, discrete cosine transformation, quantization, and variable length coding to significantly reduce the size of a video stream with some generally acceptable reduction in video quality. Despite conventional standards such as MPEG-2 that provide significant compression to a video stream, it is desirable to reduce the bandwidth requirements of the video stream even more to maximize network and storage performance.
One known method for reducing the bandwidth requirements even further is to perform variable length decoding on the video stream, perform inverse quantization, then perform requantization at a coarser scale, and then perform variable length encoding. The requantized values require fewer bits to represent than the originally quantized values. Thus, the video stream bandwidth requirements are reduced.
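The requantization step can be illustrated on a handful of quantized levels. The sketch below assumes uniform scalar quantization with steps q_in and q_out (hypothetical parameter names) and rounds half away from zero; MPEG-2's actual reconstruction rules are more involved. The coarser levels are smaller in magnitude and more often zero, which is what lets variable length coding spend fewer bits on them.

```python
import math

def round_half_away(x: float) -> int:
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def requantize(levels, q_in, q_out):
    """Inverse-quantize at step q_in, then quantize again at coarser q_out."""
    return [round_half_away(level * q_in / q_out) for level in levels]

levels = [10, -7, 3, 1, 0]
new = requantize(levels, q_in=4, q_out=10)
print(new)                                            # [4, -3, 1, 0, 0]
# Requantization error in reconstructed coefficient units:
print([n * 10 - l * 4 for n, l in zip(new, levels)])  # [0, -2, -2, -4, 0]
```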
While requantization results in decreased bandwidth requirements, requantization also results in some lost data thereby decreasing the quality of the corresponding picture to some extent. This loss in quality is compounded in predictive video streams since a predictive picture may rely on a chain of reference pictures, each subject to requantization error. For example, P-picture P13 is predicted from P-picture P10. If P-picture P10 has inaccuracies, these inaccuracies will be propagated to P-picture P13. In addition, P-picture P10 is predicted from P-picture P7, P-picture P7 is predicted from P-picture P4, and P-picture P4 is predicted from I-picture I1. Error introduced to any of these pictures will propagate down the entire predictive chain. Thus, P-picture P13 may include error propagated from I-picture I1 and P-pictures P4, P7 and P10. This propagated error is called “generational” error.
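The accumulation of generational error can be seen in a deliberately simplified model that tracks a single pixel through I1, P4, P7, P10 and P13, ignoring motion vectors and the DCT. Each P-picture carries only its difference from the previous picture, each difference is requantized independently, and the requantized stream is decoded against requantized references. The function names and the sample values are hypothetical.

```python
import math

def requant_value(v: float, q: float) -> float:
    """Quantize v at step q (round half away from zero), then reconstruct."""
    return math.copysign(math.floor(abs(v) / q + 0.5), v) * q

def simulate_chain(pixels, q):
    """pixels: one pixel's originally decoded value in each picture of the
    chain. Returns the per-picture error after requantization at step q."""
    decoded = [requant_value(pixels[0], q)]          # requantized I-picture
    for prev, cur in zip(pixels, pixels[1:]):
        residual = cur - prev                        # difference in the original stream
        decoded.append(decoded[-1] + requant_value(residual, q))
    return [d - p for d, p in zip(decoded, pixels)]

print(simulate_chain([100, 104, 108, 112, 116], q=10))
# [0.0, -4.0, -8.0, -12.0, -16.0]
```

Each picture contributes an error of only 4, yet by P13 the errors of all its reference pictures have been inherited, giving a total of 16.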
While the incremental error introduced by a single requantization of a single picture may be imperceptible to a human viewer, requantization error compounded through these generational effects can cause a significant loss in quality. Therefore, systems and methods are desired for reducing generational error caused by requantization of a predictive video stream.