1. The Field of the Invention
The present invention relates to the field of video processing. In particular, the present invention relates to the field of reducing generational error caused by requantization of a predictive video stream using motion compensation.
2. Background and Relevant Art
Video constitutes a series of images that, when displayed above a certain rate, gives the illusion to a human viewer that the image is moving. Video is now a widespread medium for communicating information whether it be a television broadcast, a taped program, or the like. More recently, digital video has become popular.
An uncompressed digital video stream has high bandwidth and storage requirements. For example, the raw storage requirement for uncompressed CCIR-601 resolution 4:2:2 serial digital video is approximately 20 megabytes per second. In addition, associated audio and data channels also require bandwidth and storage. From a transmission bandwidth perspective, 20 megabytes per second is much faster than conventional transmission techniques can practicably support. In addition, from a storage perspective, a two-hour movie would occupy approximately 144 gigabytes of memory, well above the capabilities of a conventional Digital Versatile Disk (DVD). Therefore, systems and methods were desired for compressing (or coding) digital video in a way that maintains a relatively high degree of fidelity with the original video once uncompressed (or decoded).
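The storage figure above follows from simple arithmetic. The short Python sketch below reproduces it, assuming decimal units (1 gigabyte = 1000 megabytes); the function name is illustrative only.

```python
# Figures taken from the text: about 20 megabytes per second, two-hour movie.
RATE_MB_PER_S = 20
DURATION_S = 2 * 60 * 60  # two hours in seconds


def raw_storage_gb(rate_mb_per_s, duration_s):
    """Return uncompressed storage in gigabytes (assumes 1 GB = 1000 MB)."""
    return rate_mb_per_s * duration_s / 1000


total_gb = raw_storage_gb(RATE_MB_PER_S, DURATION_S)
print(total_gb)  # 144.0
```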
One conventional high-quality compression standard is called MPEG-2, which is based on the principle that there is a large degree of visual redundancy in video streams. By removing much of the redundant information, the video storage and bandwidth requirements are significantly reduced.
FIG. 1A illustrates a display order 100A of a sequence of pictures. If the video stream represents progressive video, the pictures represent individual progressive frames. If the video stream represents interlaced video, the pictures represent individual interlaced frames containing two fields each.
Under the MPEG-2 standard, there are three classes of pictures: I-pictures, P-pictures and B-pictures. While MPEG-2 allows for a number of display orders, the display order illustrated in FIG. 1A is commonly used. In this common display order, there is a series of I-pictures. For clarity, only I-pictures I1 and I16 are shown in FIG. 1A. Each consecutive I-picture pair has four P-pictures interspersed therebetween. For example, P-pictures P4, P7, P10 and P13 are interspersed between consecutive I-pictures I1 and I16. In addition, two B-pictures are interspersed between each I-picture and each of its neighboring P-pictures. Two B-pictures are also interspersed between each consecutive P-picture pair. For example, B-pictures B2 and B3 are interspersed between I-picture I1 and P-picture P4, B-pictures B5 and B6 are interspersed between P-pictures P4 and P7, B-pictures B8 and B9 are interspersed between P-pictures P7 and P10, B-pictures B11 and B12 are interspersed between P-pictures P10 and P13, and B-pictures B14 and B15 are interspersed between P-picture P13 and I-picture I16.
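The display order just described can be generated mechanically. The following Python sketch is illustrative only; the function name and the period parameters are assumptions chosen to reproduce the I1 through I16 pattern described for FIG. 1A, not values mandated by MPEG-2.

```python
def display_order(num_pictures=16, i_period=15, p_period=3):
    """Label pictures 1..num_pictures as I, P or B in display order."""
    labels = []
    for n in range(1, num_pictures + 1):
        offset = n - 1
        if offset % i_period == 0:
            kind = "I"      # every 15th picture starts a new intra-coded anchor
        elif offset % p_period == 0:
            kind = "P"      # every 3rd picture in between is predicted forward
        else:
            kind = "B"      # remaining pictures are bidirectional
        labels.append(f"{kind}{n}")
    return labels


print(" ".join(display_order()))
# I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14 B15 I16
```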
The I-pictures are "intra-coded," meaning that they can be reconstructed without reference to any other picture in the video stream.
The P-pictures are "inter-coded," meaning that they may only be reconstructed with reference to another reference picture. Typically, the P-picture may include motion vectors that represent estimated motion with respect to the reference picture. The P-picture may be reconstructed using the immediately preceding I-picture or P-picture as a reference. In FIG. 1A, arrows illustrate the predictive relationship between pictures wherein the picture at the head of the arrow indicates the predictive picture, and the picture at the tail of the arrow indicates the reference picture used to reconstruct the predictive picture. For example, the reconstruction of P-picture P7 uses P-picture P4 as a reference.
B-pictures are also inter-coded. The B-picture is typically reconstructed using the immediately preceding I-picture or P-picture as a reference, and the immediately subsequent I-picture or P-picture as a reference. For example, the reconstruction of B-picture B14 uses P-picture P13 and I-picture I16 as references.
FIG. 1B illustrates the decode order 100B of the pictures. The decode order is similar to the display order except that each reference picture is decoded prior to any predictive picture that relies on it, even if the reference picture is displayed after the predictive picture. Thus, the arrows in FIG. 1B are all rightward facing.
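One way to derive such a decode order from the display order is to hold each B-picture until the reference picture that follows it has been emitted. The Python sketch below illustrates this under that assumption; the function name is hypothetical, and the exact ordering shown in FIG. 1B is not reproduced here.

```python
def decode_order(display):
    """Reorder display-order labels so references precede dependent B-pictures."""
    out, pending_b = [], []
    for label in display:
        if label.startswith("B"):
            pending_b.append(label)   # wait for the following reference
        else:
            out.append(label)         # I- or P-picture is decoded first
            out.extend(pending_b)     # then the B-pictures that needed it
            pending_b = []
    out.extend(pending_b)             # trailing B-pictures, if any
    return out


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(decode_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```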
FIG. 2A illustrates the general process involved with encoding a digital picture 201 using an encoder 200A that is compatible with the MPEG-2 standard. If the digital picture is to be an I-picture, the digital picture bypasses the motion estimator 202 and is provided to the discrete cosine transformation unit (DCT) 203, which transforms the digital picture, on a block-by-block basis, from a spatial representation of an image to a frequency representation of the image. The frequency representation is then passed to a quantization unit 204, which quantizes each frequency, on a macroblock-by-macroblock basis, into definable ranges. A "macroblock" is a 16-pixel by 16-pixel array within the picture. The quantized image is then passed to a variable length coder 205 which performs, for example, variable length Huffman coding on the resulting quantized image. The reduced sized I-picture is then stored or transmitted for subsequent decoding.
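The transform and quantization steps can be illustrated with a deliberately simplified Python sketch. It applies a one-dimensional orthonormal DCT to a single row of eight samples and then a uniform quantizer with one step size; an actual MPEG-2 encoder applies a two-dimensional transform to blocks and uses quantization matrices, so the function names, sample values and step size here are assumptions for illustration only.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (spatial -> frequency)."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        out.append(scale * total)
    return out


def quantize(coefficients, step):
    """Map each coefficient to an integer level (uniform quantizer)."""
    return [round(c / step) for c in coefficients]


flat_row = [100] * 8                      # a row with no spatial detail
levels = quantize(dct_1d(flat_row), step=16)
print(levels)  # [18, 0, 0, 0, 0, 0, 0, 0]
```

A flat row carries all of its energy in the first (DC) coefficient, which is why the remaining levels quantize to zero and cost very few bits after variable length coding.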
If the digital picture 201 is to be a P-picture, the encoding process is similar to that for I-pictures, with several notable exceptions. A P-picture is passed first to the motion estimator 202. For each macroblock (i.e., 16×16 pixel array) in the P-picture, the motion estimator 202 finds a close visual match to the macroblock in the reference picture. The motion estimator 202 then represents the macroblock in the P-picture as a motion vector representing the motion between the macroblock in the P-picture and the close visual match 16×16 pixel array in the reference picture. In addition to the motion vector, a difference macroblock is calculated representing the difference between the macroblock in the P-picture and the close match 16×16 pixel array in the reference picture. A macroblock represented as a difference with corresponding motion vectors is typically smaller than a macroblock represented without motion vectors. Discrete cosine transformation and quantization are then performed on just the difference representation of the P-picture. Then, the difference information is combined with the motion vectors before variable length coding is performed.
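The motion estimation step can be sketched as an exhaustive block-matching search. The Python example below is a minimal illustration, not the method of any particular encoder: it assumes a sum-of-absolute-differences cost, uses a 2×2 block in place of a 16×16 macroblock to keep the data small, and all function names and pixel values are hypothetical.

```python
def block_at(picture, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def estimate_motion(current, reference, top, left, size, search):
    """Find the best match within +/- search pixels; return (vector, residual)."""
    target = block_at(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate would fall outside the reference picture
            candidate = block_at(reference, r, c, size)
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, vector, match = best
    residual = [[t - m for t, m in zip(tr, mr)] for tr, mr in zip(target, match)]
    return vector, residual


reference = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
current = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 8, 0], [0, 0, 0, 0]]
vector, residual = estimate_motion(current, reference, top=1, left=1, size=2, search=1)
print(vector, residual)  # (-1, -1) [[0, 0], [0, -1]]
```

The residual is almost entirely zero, which illustrates why a difference macroblock plus a motion vector is typically cheaper to code than the macroblock itself.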
B-pictures are encoded in much the same way as P-pictures, except that motion may be estimated with reference to a prior reference picture and a subsequent reference picture.
FIG. 2B illustrates a conventional decoder 200B in conformance with the MPEG-2 standard. First, a variable length decoder 215 performs, for example, variable length decoding on the picture. The picture (or the difference data of the picture if a P-picture or a B-picture) is passed to the inverse quantizer 214 for inverse quantization on a macroblock-by-macroblock basis. Next, an inverse discrete cosine transformer 213 performs inverse discrete cosine transformation on the frequency representation of the picture, on a block-by-block basis, to reconstruct the spatial representation of the picture. The spatial representation of the picture is passed to the motion compensator 212 where the spatial representation is combined with the motion vectors (if a P-picture or B-picture) to thereby reconstruct the digital picture 201′. The reconstructed digital picture 201′ is labeled differently than the original picture 201 to represent that there may be some loss in the encoding process.
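The final motion compensation step of the decoder can be sketched as follows. This Python example is a simplified pixel-domain illustration that assumes a single forward motion vector per block; the function name and the sample values are hypothetical.

```python
def reconstruct_block(reference, top, left, vector, residual):
    """Rebuild a block: fetch the prediction the vector points at, add the residual."""
    dy, dx = vector
    size = len(residual)
    rows = reference[top + dy: top + dy + size]
    prediction = [row[left + dx: left + dx + size] for row in rows]
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(prediction, residual)]


reference = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
block = reconstruct_block(reference, top=1, left=1, vector=(-1, -1),
                          residual=[[0, 0], [0, -1]])
print(block)  # [[9, 9], [9, 8]]
```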
In this manner, MPEG-2 combines the functionality of motion compensation, discrete cosine transformation, quantization, and variable length coding to significantly reduce the size of a video stream with some generally acceptable reduction in video quality. Despite conventional standards such as MPEG-2 that provide significant compression to a video stream, it is desirable to reduce the bandwidth requirements of the video stream even more to maximize network and storage performance.
One known method for reducing the bandwidth requirements even further is to perform variable length decoding on the video stream, perform inverse quantization, then perform requantization at a coarser scale, and then perform variable length encoding. The requantized values require fewer bits to represent than the originally quantized values. Thus, the video stream bandwidth requirements are reduced.
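The inverse quantization and requantization steps of this known method can be sketched in Python as shown below. The uniform quantizer, the scale values and the function names are assumptions made for illustration; variable length decoding and encoding are omitted.

```python
def inverse_quantize(levels, scale):
    """Recover approximate coefficient values from quantized levels."""
    return [level * scale for level in levels]


def quantize(coefficients, scale):
    """Map coefficients to integer levels. Python's round() rounds halves
    to even; a real quantizer defines its own rounding rule."""
    return [round(c / scale) for c in coefficients]


def requantize(levels, original_scale, coarser_scale):
    """Inverse quantize at the original scale, then quantize more coarsely."""
    return quantize(inverse_quantize(levels, original_scale), coarser_scale)


levels = [18, 5, -3, 1]                       # hypothetical quantized levels
coarse = requantize(levels, original_scale=4, coarser_scale=10)
print(coarse)                                 # [7, 2, -1, 0]

before = inverse_quantize(levels, 4)          # [72, 20, -12, 4]
after = inverse_quantize(coarse, 10)          # [70, 20, -10, 0]
print([b - a for b, a in zip(before, after)])  # [2, 0, -2, 4]
```

The requantized levels are smaller in magnitude and so need fewer bits, while the final line shows the requantization error that the coarser scale introduces.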
While requantization results in decreased bandwidth requirements, requantization also results in some lost data thereby decreasing the quality of the corresponding picture to some extent. This loss in quality is compounded in predictive video streams since a predictive picture may rely on a chain of reference pictures, each subject to requantization error. For example, P-picture P13 is predicted from P-picture P10. If P-picture P10 has inaccuracies, these inaccuracies will be propagated to P-picture P13. In addition, P-picture P10 is predicted from P-picture P7, P-picture P7 is predicted from P-picture P4, and P-picture P4 is predicted from I-picture I1. Error introduced to any of these pictures will propagate down the entire predictive chain. Thus, P-picture P13 may include error propagated from I-picture I1 and P-pictures P4, P7 and P10. This propagated error is called "generational" error.
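The accumulation of generational error can be illustrated with a toy one-pixel model in Python. The model assumes each predictive picture equals its reference plus a residual, and all numeric values are hypothetical; it is meant only to show how per-picture error compounds along the chain I1, P4, P7, P10, P13.

```python
def decode_chain(intra_value, residuals):
    """Decode a one-pixel predictive chain: each picture = previous + residual."""
    pictures = [intra_value]
    for residual in residuals:
        pictures.append(pictures[-1] + residual)
    return pictures


# Hypothetical values for I1, P4, P7, P10, P13 before and after requantization.
original = decode_chain(100, [4, 4, 4, 4])
requantized = decode_chain(98, [3, 3, 3, 3])   # every picture loses a little

drift = [o - r for o, r in zip(original, requantized)]
print(drift)  # [2, 3, 4, 5, 6]
```

Each picture on its own loses only one or two units, yet the last picture in the chain is off by six because it inherits the error of every picture before it.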
While the incremental error introduced by a single requantization to a single picture may be imperceptible to a human viewer, when the requantization error is compounded due to these generational effects, there can be a significant loss in quality. Therefore, what are desired are systems and methods for reducing generational error caused by requantization of a predictive video stream.
The present invention extends to both methods and systems for at least partially avoiding generational error due to requantization of video streams that use predictive inter-picture motion compensation. For example, as described above, the MPEG-2 standard developed by the Moving Picture Experts Group defines three classes of pictures: I-pictures, P-pictures, and B-pictures. The I-pictures may be decoded without reference to any other pictures. P-pictures use a previous I-picture or P-picture as a reference frame during decoding. B-pictures may use a previous and possibly a subsequent I-picture or P-picture during decoding. The requantization may be adaptively performed based on a variety of factors such as the type of picture (e.g., I, P or B), the current network traffic of the network over which the potentially requantized picture is to traverse, and the storage availability for storing error pictures used to compensate for requantization error.
First, a reference picture that has been previously quantized is accessed and decoded. The reference picture is requantized using, for any given macroblock, a coarser quantization scale than what was used during the original encode process. Then, the requantized reference picture is decoded. Next, an error picture is calculated and stored. The error picture represents the difference between the reference picture as decoded without requantization and the reference picture as decoded with requantization.
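The error picture described above is a per-pixel difference of two decodes of the same reference picture. A minimal Python sketch, with a hypothetical function name and hypothetical 2×2 sample pictures, might look like this:

```python
def error_picture(decoded_without_requant, decoded_with_requant):
    """Per-pixel difference: decode without requantization minus decode with it."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(decoded_without_requant, decoded_with_requant)]


reference = [[72, 20], [12, 4]]               # decoded at the original scale
requantized_reference = [[70, 20], [10, 0]]   # decoded after coarser requantization
errors = error_picture(reference, requantized_reference)
print(errors)  # [[2, 0], [2, 4]]
```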
Next, a predictive picture is accessed that is predicted from the reference picture. The predictive picture includes a number of components such as 16-pixel by 16-pixel macroblocks. Each macroblock includes one or more motion vectors representing estimated motion between the macroblock and a visually similar component piece in the reference picture. Each macroblock also includes a motion compensated residual component representing the difference between the macroblock in the predictive picture and the visually similar component piece in the reference picture.
For each motion compensated residual component, the motion vectors are used to extract the portion of the stored error picture that corresponds to the visually similar component in the reference picture. Then, the motion compensated residual component is added to the extracted portion of the stored error picture to thereby form an altered predictive picture that represents a closer approximation of the predictive picture as it would appear after decoding if the reference picture had not been requantized. Thus, the altered predictive picture has a reduced level of requantization-induced generational error.
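The compensation step can be sketched in Python as follows. This is a simplified pixel-domain illustration with one motion vector per block; the function names, the 2×2 block size and all sample values are assumptions, and any subsequent requantization of the altered residual is not modeled.

```python
def extract_block(picture, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def add_blocks(a, b):
    """Element-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def compensate_residual(residual, error_pic, top, left, vector):
    """Add the error-picture patch selected by the motion vector to the residual."""
    dy, dx = vector
    patch = extract_block(error_pic, top + dy, left + dx, len(residual))
    return add_blocks(residual, patch)


# Hypothetical reference picture decoded without and with requantization.
reference = [[50, 52, 54], [56, 58, 60], [62, 64, 66]]
requantized = [[48, 52, 55], [56, 60, 60], [60, 64, 64]]
error_pic = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(reference, requantized)]

residual = [[1, -1], [0, 2]]   # block at (1, 1) in the predictive picture
vector = (-1, -1)              # its match sits at (0, 0) in the reference

altered = compensate_residual(residual, error_pic, 1, 1, vector)
intended = add_blocks(extract_block(reference, 0, 0, 2), residual)
decoded = add_blocks(extract_block(requantized, 0, 0, 2), altered)
print(altered)               # [[3, -1], [0, 0]]
print(decoded == intended)   # True
```

In this sketch, decoding the altered residual against the requantized reference yields the same block that the original residual would have produced against the unrequantized reference, which is the effect the paragraph above describes.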
This compensation for error may be adaptively performed. If it is determined that the storage available is less than that required for allocating a buffer that can store an error picture, then the error picture is compressed and stored, or not stored at all. In addition, whether requantization is performed, and at what level, may also be adaptively determined. Thus, the principles of the present invention allow for adaptive compensation for requantization error to thereby maximize system performance.
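The storage-dependent choice described above could be expressed as a small decision function. The Python sketch below is only one possible policy: the text does not say how to choose between compressing the error picture and not storing it, so the rule used here (compress if the compressed size fits, otherwise skip), the function name and the byte counts are all assumptions.

```python
def choose_error_storage(available_bytes, raw_bytes, compressed_bytes):
    """Pick how to keep an error picture given the storage currently available."""
    if available_bytes >= raw_bytes:
        return "store"               # enough room for an uncompressed buffer
    if available_bytes >= compressed_bytes:
        return "compress_and_store"  # assumed fallback when only less room exists
    return "skip"                    # no compensation for this reference picture


print(choose_error_storage(1_000_000, 700_000, 200_000))  # store
print(choose_error_storage(500_000, 700_000, 200_000))    # compress_and_store
print(choose_error_storage(100_000, 700_000, 200_000))    # skip
```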
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.