1. Field of the Invention
The present invention relates generally to video encoding and decoding and, in particular, to methods and apparatus for error concealment in video encoding and decoding.
2. Description of the Related Art
Advances in audio and video compression and decompression techniques, together with very large scale integration technology, have enabled the creation of new capabilities and markets. These include the storage of digital audio and video in computers and on small optical discs as well as the transmission of digital audio and video signals from direct broadcast satellites.
Such advances were made possible, in part, by international standards which provide compatibility between different approaches to compression and decompression. One such standard is known as “JPEG,” for Joint Photographic Experts Group. A later developed standard is known as “MPEG1.” This was the first set of standards agreed to by the Moving Picture Experts Group. Yet another standard is known as “ITU-T H.261,” which is a video compression standard particularly useful for video teleconferencing. Although each standard is designed for a specific application, all of the standards have much in common.
MPEG1 was designed for storing and distributing audio and motion video, with emphasis on video quality. Its features include random access, fast forward and reverse playback. MPEG1 serves as the basis for video compact disks and for many video games. The original channel bandwidth and image resolution for MPEG1 were established based upon the recording media then available. The goal of MPEG1 was the reproduction of recorded digital audio and video using a 12 centimeter diameter optical disc with a bit rate of 1.416 Mbps, 1.15 Mbps of which are allocated to video.
The compressed bit streams generated under the MPEG1 standard implicitly define the decompression algorithms to be used for such bit streams. The compression algorithms, however, can vary within the specifications of the MPEG1 standard, thereby allowing the possibility of a proprietary advantage in regard to the generation of compressed bit streams.
A later developed standard known as “MPEG2” extends the basic concepts of MPEG1 to cover a wider range of applications. Although the primary application of the MPEG2 standard is the all-digital transmission of broadcast-quality video at bit rates of 4 Mbps to 9 Mbps, it appears that the MPEG2 standard may also be useful for other applications, such as the storage of full length motion pictures on Digital Video Disk (“DVD”) optical discs, with resolution at least as good as that presently provided by 12 inch diameter laser discs.
The MPEG2 standard relies upon three types of coded pictures. I (“intra”) pictures are fields or frames coded as a stand-alone still image. Such I pictures allow random access points within a video stream. As such, I pictures should occur about two times per second. I pictures should also be used where scene cuts (such as in a motion picture) occur.
P (“predicted”) pictures are fields or frames coded relative to the nearest previous I or P picture, resulting in forward prediction processing. P pictures allow more compression than I pictures through the use of motion compensation, and also serve as a reference for B pictures and future P pictures.
B (“bidirectional”) pictures are fields or frames that use the closest (with respect to display order) past and future I or P pictures as references, resulting in bidirectional prediction. B pictures provide the most compression and increase the signal-to-noise ratio by averaging two pictures.
Such I, P and B pictures are more thoroughly described in U.S. Pat. Nos. 5,386,234 and 5,481,553 assigned to Sony Corporation and said U.S. Patents are incorporated herein by reference.
A group of pictures (“GOP”) is a series of one or more coded pictures which assists in random accessing and editing. A GOP value is configurable during the encoding process. The smaller the GOP value, the closer together the I pictures are and the better the response to movement. The level of compression is, however, lower.
In a coded bitstream, a GOP must start with an I picture and may be followed by any number of I, P or B pictures in any order. In display order, a GOP must start with an I or B picture and end with an I or P picture. Thus, the smallest GOP size is a single I picture, with the largest size being unlimited.
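The two ordering rules above can be expressed as simple checks. The following is only an illustrative sketch: the function names and the representation of a GOP as a list of the letters "I", "P" and "B" are assumptions made for this example and are not taken from any standard.

```python
# Illustrative sketch only: a GOP is modeled as a list of picture-type letters.
PICTURE_TYPES = {"I", "P", "B"}


def is_valid_gop_coded_order(types):
    """In coded (bitstream) order, a GOP must start with an I picture."""
    return (
        len(types) >= 1
        and all(t in PICTURE_TYPES for t in types)
        and types[0] == "I"
    )


def is_valid_gop_display_order(types):
    """In display order, a GOP must start with I or B and end with I or P."""
    return (
        len(types) >= 1
        and all(t in PICTURE_TYPES for t in types)
        and types[0] in {"I", "B"}
        and types[-1] in {"I", "P"}
    )
```

Under these checks a GOP consisting of a single I picture is valid in both orders, which matches the smallest GOP size stated above.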
In further detail, FIG. 1 illustrates a simplified block diagram of an MPEG2 encoder 100. A video stream consisting of macroblock information and motion compensation information is provided to both a discrete cosine transformer 102 and a motion vector generator 104. Each 8×8 block (of pixels or error terms) is processed by the discrete cosine transformer 102 to generate an 8×8 block of horizontal and vertical frequency coefficients. The quantizer 106 quantizes the 8×8 block of frequency-domain error coefficients, thereby limiting the number of allowed values.
Higher frequencies are usually quantized more coarsely than low frequencies, taking advantage of the human perception of quantization error. This results in many frequency-domain error coefficients being zero, especially at higher frequencies.
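The effect of coarser quantization at higher frequencies can be illustrated with a short sketch. The step-size rule used here (a step that grows linearly with the sum of the horizontal and vertical frequency indices) and the names `quantize_block`, `base_step` and `slope` are assumptions chosen for illustration; they are not the quantization matrices defined by the MPEG2 standard.

```python
def quantize_block(coeffs, base_step=8, slope=4):
    """Quantize a square block of frequency coefficients.

    The step size grows with the frequency index sum u + v, so higher
    frequencies are quantized more coarsely. This rule is hypothetical
    and is used only to demonstrate the principle.
    """
    quantized = []
    for u, row in enumerate(coeffs):
        out_row = []
        for v, c in enumerate(row):
            step = base_step + slope * (u + v)
            # int() truncates toward zero, so small coefficients become 0.
            out_row.append(int(c / step))
        quantized.append(out_row)
    return quantized
```

With a block in which every coefficient has the same magnitude, only the low-frequency corner survives quantization under this rule, while the remaining entries become zero.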
The output of quantizer 106 is processed by a zigzag scanner 108, which, starting with DC components, generates a linear stream of quantized frequency coefficients arranged in order of increasing frequency. This produces long runs of consecutive zero coefficients, which are sent to the variable length encoder 110.
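A zigzag scan of this kind can be sketched as follows. This is a minimal illustration of the conventional zigzag pattern only; the function names are assumptions, and any alternative scan orders that a standard may define are not covered.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zigzag order.

    Positions are grouped by anti-diagonal (row + column). Odd diagonals
    are walked with the row increasing, even diagonals with the row
    decreasing, which yields the familiar zigzag path from the DC term.
    """
    positions = [(i, j) for i in range(n) for j in range(n)]
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag_scan(block):
    """Flatten a square block into a list in order of increasing frequency."""
    return [block[i][j] for i, j in zigzag_order(len(block))]
```

Because quantized high-frequency coefficients are often zero, the tail of the list returned by `zigzag_scan` tends to consist of long runs of zeros.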
The linear stream of quantized frequency-domain error coefficients is first run-length encoded by the variable length encoder 110. In the run-length encoding process, the linear stream of quantized frequency-domain error coefficients is converted into a series of run-amplitude (or run-level) pairs. Each pair indicates the number of zero coefficients and the amplitude of the non-zero coefficient which ends the run.
For example, assume a string of error coefficients as follows:
(1) Original error coefficients: 000060000038
Therefore, when this string of error coefficients is variable length encoded, according to the encoding rules described above, the following encoded run-level pairs are obtained:
(2) Encoded run-level pairs: (4,6) (5,3) (0,8)
Of course, as the number of zero coefficients is increased, the error coefficient data will be more effectively compressed by this variable length encoding.
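The run-level conversion of the example above can be sketched in a few lines. The function names are assumptions made for illustration. Trailing zeros that are not followed by a non-zero coefficient are returned separately here; a real bitstream would typically signal them with an end-of-block marker instead.

```python
def run_level_encode(coeffs):
    """Convert a coefficient sequence into (run, level) pairs.

    Each pair holds the number of zeros preceding a non-zero coefficient
    and the value of that coefficient. The count of trailing zeros is
    returned as a second value.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run


def run_level_decode(pairs, trailing_zeros=0):
    """Rebuild the coefficient sequence from (run, level) pairs."""
    coeffs = []
    for run, level in pairs:
        coeffs.extend([0] * run)
        coeffs.append(level)
    coeffs.extend([0] * trailing_zeros)
    return coeffs
```

Applied to the coefficient string 000060000038, `run_level_encode` yields the pairs (4,6), (5,3) and (0,8) with no trailing zeros, and `run_level_decode` restores the original string.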
After the variable length encoder 110 encodes the run-level pairs, it then Huffman encodes the run-level pairs. In the Huffman encoding, the run-level pairs are coded differently depending upon whether the run-level pair is included in a list of commonly-occurring run-level pairs. If the run-level pair being Huffman encoded is on the list of commonly-occurring pairs, then it will be encoded into a predetermined variable length code word which corresponds to the run-level pair. If, on the other hand, the run-level pair is not on the list, then the run-level pair is encoded as a predetermined symbol (such as an escape symbol) followed by a fixed length code, to avoid long code words and to reduce the cost of implementation.
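The table lookup with an escape fallback can be sketched as follows. Everything specific in this example is hypothetical: the three-entry code table, the escape bit pattern, and the 6-bit run and 12-bit level field widths are assumptions chosen to keep the illustration small, and they do not reproduce the code tables of any standard.

```python
# Hypothetical prefix-free code table for a few common run-level pairs.
COMMON_PAIR_CODES = {
    (0, 1): "11",
    (1, 1): "011",
    (0, 2): "0100",
}
ESCAPE_CODE = "000001"  # hypothetical escape symbol
RUN_BITS = 6            # assumed fixed width of the run field
LEVEL_BITS = 12         # assumed fixed width of the level field


def encode_pair(run, level):
    """Encode one run-level pair as a string of '0' and '1' characters."""
    code = COMMON_PAIR_CODES.get((run, level))
    if code is not None:
        return code
    # Not in the table: escape symbol followed by fixed length fields.
    # The level is stored in two's complement within LEVEL_BITS bits.
    run_field = format(run, "0{}b".format(RUN_BITS))
    level_field = format(level & ((1 << LEVEL_BITS) - 1), "0{}b".format(LEVEL_BITS))
    return ESCAPE_CODE + run_field + level_field
```

A common pair such as (0,1) costs only two bits in this sketch, while an uncommon pair such as (5,3) always costs 24 bits, so the longest possible code word is bounded.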
The run-length encoded and Huffman encoded output of the variable length encoder 110 provides a coded video bitstream. Picture type determination circuit 112 determines whether the frame being encoded is a P picture, an I picture or a B picture. In the case of a P or B picture, picture type determination circuit 112 causes the motion vector generator 104 to generate an appropriate motion vector which is then provided to variable length encoder 110. Such motion vector is then coded and combined with the output of variable length encoder 110.
Referring now to FIGS. 2 and 3, the concept of motion compensation is explained. Motion compensation improves compression of P and B pictures by removing temporal redundancies between pictures. With MPEG2, it operates at the macroblock level. For example, a previous frame 200 contains, among other macroblocks, a macroblock 202 consisting of 16 pixels (also referred to as “pels”) by 16 lines. Motion compensation relies on the fact that, except for scene cuts, most images remain in the same location from frame to frame, whereas others move only a short distance. Thus, such motion can be described as a two-dimensional motion vector that specifies where to retrieve a macroblock from a previously decoded frame to thereby predict the pixel values of a current macroblock. Thus, a macroblock 300 of a current frame 302 can be represented by the macroblock 202 (of FIG. 2) as modified by a two-dimensional motion vector 304. It is to be understood that the macroblock 300 may or may not be within the same boundaries surrounding macroblock 202 in the previous frame 200.
After a macroblock has been compressed using motion compensation, it contains both the prediction (commonly referred to as “motion vectors”) and the temporal difference (commonly referred to as “error terms”) between the reference macroblock and the macroblock being coded.
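How a decoder might combine a motion vector with error terms can be sketched as follows. This is a simplified illustration under stated assumptions: frames are plain lists of pixel rows, the motion vector is a whole-pixel (dy, dx) offset, and the displaced block must lie entirely inside the reference frame. The function names and the (dy, dx) convention are hypothetical.

```python
def predict_block(ref_frame, top, left, mv, size=16):
    """Fetch the predicted block from a previously decoded frame.

    (top, left) is the position of the current block and mv is an
    assumed (dy, dx) whole-pixel displacement into the reference frame.
    """
    dy, dx = mv
    src_top, src_left = top + dy, left + dx
    height, width = len(ref_frame), len(ref_frame[0])
    if not (0 <= src_top <= height - size and 0 <= src_left <= width - size):
        raise ValueError("displaced block falls outside the reference frame")
    return [row[src_left:src_left + size]
            for row in ref_frame[src_top:src_top + size]]


def reconstruct_block(ref_frame, top, left, mv, errors, size=16):
    """Add the decoded error terms to the motion-compensated prediction."""
    prediction = predict_block(ref_frame, top, left, mv, size)
    return [[p + e for p, e in zip(pred_row, err_row)]
            for pred_row, err_row in zip(prediction, errors)]
```

When the error terms are all zero, the reconstructed block is simply a displaced copy of pixels from the reference frame, which is the case that makes motion compensation so effective for static or slowly moving content.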
Returning to FIG. 1, when the coded video bitstream output from variable length encoder 110 is recorded onto a recording medium such as an optical disk, and such recorded information is reproduced for local use, the decoded video bitstream, although not completely error free, is generally sufficiently error free so as to not require additional techniques to compensate for errors in the decoded video bitstream. Such a coded video bitstream is typically referred to as a “program stream.” When the coded video bitstream output from variable length encoder 110 is transported by, for example, satellite or cable transmission systems, either directly from variable length encoder 110 or from a recording medium onto which the coded video bitstream has been recorded, the probability of errors in the decoded video bitstream increases. Such a coded bitstream is typically referred to as a “transport stream.”
Since traditional error detection and correction systems, such as interleaving, require a significant amount of overhead as well as a significant amount of data processing when decoding coded video bitstream signals, current video decoding systems rely upon error concealment as opposed to error correction. In contrast to error correction, which attempts to reconstruct lost or corrupt data, error concealment aims to generate data which can be substituted for the lost or corrupt data, where any discrepancies in the image created by the generated data (generally at the macroblock level) are not likely to be perceived by a viewer of a video image which relies upon such error concealment.
Accordingly, it would be desirable to provide a method and apparatus for concealing errors where the visual effect perceived by a viewer is negligible, and where the method and apparatus adapt to the different types of information that may be available to provide such concealment.
It is an object of the invention to provide a method and apparatus for concealing errors during decoding of compressed video signals.
It is a further object of the invention to provide a method and apparatus for detecting errors which do not produce illegal syntax.
It is a feature of the invention to utilize a temporal prediction of a motion vector to generate a macroblock which will effectively conceal an error in a data stream.
It is a further feature of the invention to compare DC coefficients of a current macroblock to a predicted coefficient to determine whether an error which does not produce illegal syntax has occurred.
It is an advantage of the invention to improve the quality of concealment of an error in a data stream.
It is a further advantage of the invention to improve the quality of detection of an error in a data stream.
According to one aspect of the invention, an apparatus for concealing errors includes a detector for detecting the presence of an error in data representing the current macroblock, a system for estimating the at least one motion vector based upon a difference between a forward reference frame at the current macroblock and a decoded motion vector for the forward reference frame at the current macroblock, and a system for estimating the current macroblock based upon the estimated at least one motion vector. According to another aspect of the invention, a method for concealing errors includes the steps of detecting the presence of an error in data representing the current macroblock, estimating the at least one motion vector based upon a difference between a forward reference frame at the current macroblock and a decoded motion vector for the forward reference frame at the current macroblock, and estimating the current macroblock based upon the estimated at least one motion vector.