1. Field of the Invention
The present invention relates generally to video encoding and decoding and, in particular, to methods and apparatus for error concealment in video encoding and decoding.
2. Description of the Related Art
Advances in audio and video compression and decompression techniques, together with very large scale integration technology, have enabled the creation of new capabilities and markets. These include the storage of digital audio and video in computers and on small optical discs as well as the transmission of digital audio and video signals from direct broadcast satellites.
Such advances were made possible, in part, by international standards which provide compatibility between different approaches to compression and decompression. One such standard is known as "JPEG," for Joint Photographic Experts Group. A later developed standard is known as "MPEG1." This was the first set of standards agreed to by the Moving Picture Experts Group. Yet another standard is known as "ITU-T H.261", which is a video compression standard particularly useful for video teleconferencing. Although each standard is designed for a specific application, all of the standards have much in common.
MPEG1 was designed for storing and distributing audio and motion video, with emphasis on video quality. Its features include random access, fast forward and reverse playback. MPEG1 serves as the basis for video compact disks and for many video games. The original channel bandwidth and image resolution for MPEG1 were established based upon the recording media then available. The goal of MPEG1 was the reproduction of recorded digital audio and video using a 12 centimeter diameter optical disc with a bit rate of 1.416 Mbps, 1.15 Mbps of which are allocated to video.
The compressed bit streams generated under the MPEG1 standard implicitly define the decompression algorithms to be used for such bit streams. The compression algorithms, however, can vary within the specifications of the MPEG1 standard, thereby allowing the possibility of a proprietary advantage in regard to the generation of compressed bit streams.
A later developed standard known as "MPEG2" extends the basic concepts of MPEG1 to cover a wider range of applications. Although the primary application of the MPEG2 standards is the all digital transmission of broadcast-quality video at bit rates of 4 Mbps to 9 Mbps, it appears that the MPEG2 standard may also be useful for other applications, such as the storage of full length motion pictures on Digital Video Disk ("DVD") optical discs, with resolution at least as good as that presently provided by 12 inch diameter laser discs.
The MPEG2 standard relies upon three types of coded pictures. I ("intra") pictures are fields or frames coded as a stand-alone still image. Such I pictures allow random access points within a video stream. As such, I pictures should occur about two times per second. I pictures should also be used where scene cuts (such as in a motion picture) occur.
P ("predicted") pictures are fields or frames coded relative to the nearest previous I or P picture, resulting in forward prediction processing. P pictures allow more compression than I pictures through the use of motion compensation, and also serve as a reference for B pictures and future P pictures.
B ("bidirectional") pictures are fields or frames that use the closest (with respect to display order) past and future I or P pictures as references, resulting in bidirectional prediction. B pictures provide the most compression and increase the signal to noise ratio by averaging two pictures. Such I, P and B pictures are more thoroughly described in U.S. Pat. Nos. 5,386,234 and 5,481,553 assigned to Sony Corporation and said U.S. Patents are incorporated herein by reference.
A group of pictures ("GOP") is a series of one or more coded pictures which assist in random accessing and editing. A GOP value is configurable during the encoding process. The smaller the GOP value, the closer together the I pictures are and the better the response to movement. The level of compression is, however, lower.
In a coded bitstream, a GOP must start with an I picture and may be followed by any number of I, P or B pictures in any order. In display order, a GOP must start with an I or B picture and end with an I or P picture. Thus, the smallest GOP size is a single I picture, with the largest size being unlimited.
In further detail, FIG. 1 illustrates a simplified block diagram of an MPEG2 encoder 100. A video stream consisting of macroblock information and motion compensation information is provided to both a discrete cosine transformer 102 and a motion vector generator 104. Each 8×8 block (of pixels or error terms) is processed by the discrete cosine transformer 102 to generate an 8×8 block of horizontal and vertical frequency coefficients. The quantizer 106 quantizes the 8×8 block of frequency-domain error coefficients, thereby limiting the number of allowed values.
Higher frequencies are usually quantized more coarsely than low frequencies, taking advantage of the human perception of quantization error. This results in many frequency-domain error coefficients being zero, especially at higher frequencies.
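The transform and quantization steps described above can be sketched as follows. This is a minimal illustration only: the function names, the `base_step` parameter and the rule that the step size grows with the frequency index `u + v` are assumptions made for the example and are not the quantization matrices defined by the MPEG2 standard.

```python
import math

N = 8  # block size used by the transform described above


def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block (a list of 8 rows of 8 numbers)."""
    def c(k):
        # Normalization factor for the zero-frequency (DC) index.
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency index
        for u in range(N):      # horizontal frequency index
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, base_step=8):
    """Hypothetical quantizer: the step size grows with u + v, so higher
    frequencies are quantized more coarsely than lower ones."""
    return [[int(round(coeffs[v][u] / (base_step * (1 + u + v))))
             for u in range(N)]
            for v in range(N)]
```

For a perfectly flat block, only the DC coefficient survives quantization; every other coefficient becomes zero, which is the property the subsequent scanning and run-length steps exploit.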
The output of quantizer 106 is processed by a zigzag scanner 108, which, starting with DC components, generates a linear stream of quantized frequency coefficients arranged in order of increasing frequency. This produces long runs of consecutive zero coefficients, which are sent to the variable length encoder 110.
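One way to picture the zigzag scan is the short sketch below, which walks the anti-diagonals of the block in alternating directions, starting at the DC position. The function names are hypothetical, and only the conventional zigzag pattern is shown; the MPEG2 standard also defines an alternate scan that is not reproduced here.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column of a diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def zigzag_scan(block):
    """Flatten a square block into a linear list, lowest frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because the quantized high-frequency coefficients are mostly zero, this ordering tends to push the zeros toward the end of the linear stream.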
The linear stream of quantized frequency-domain error coefficients is first run-length encoded by the variable length encoder 110. In the run-length encoding process, the linear stream of quantized frequency-domain error coefficients is converted into a series of run-amplitude (or run-level) pairs. Each pair indicates the number of zero coefficients and the amplitude of the non-zero coefficient which ends the run.
For example, assume a string of error coefficients as follows:
(1) Original error coefficients: 000060000038
When this string of error coefficients is run-length encoded according to the encoding rules described above, the following encoded run-level pairs are obtained:
(2) Encoded run-level pairs: (4,6) (5,3) (0,8)
Of course, as the number of zero coefficients increases, the error coefficient data is compressed more effectively by this run-length encoding.
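The run-level conversion in the example above can be reproduced with a few lines of code. This is a sketch under the assumption that trailing zeros are simply dropped; a practical coder would signal them with an end-of-block code. The function name is hypothetical.

```python
def run_level_encode(coeffs):
    """Convert a linear coefficient stream into (run, level) pairs.

    Each pair holds the number of zeros preceding a non-zero coefficient
    and the value of that coefficient.
    """
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0                   # a non-zero coefficient ends the run
    return pairs                      # trailing zeros are not emitted here


coefficients = [0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 3, 8]
print(run_level_encode(coefficients))  # [(4, 6), (5, 3), (0, 8)]
```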
After the variable length encoder 110 encodes the run-level pairs, it then Huffman encodes the run-level pairs. In the Huffman encoding, the run-level pairs are coded differently depending upon whether the run-level pair is included in a list of commonly-occurring run-level pairs. If the run-level pair being Huffman encoded is on the list of commonly-occurring pairs, then it will be encoded into a predetermined variable length code word which corresponds to the run-level pair. If, on the other hand, the run-level pair is not on the list, then the run-level pair is encoded as a predetermined symbol (such as an escape symbol) followed by a fixed length code to avoid long code words and to reduce the cost of implementation.
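The table-lookup-with-escape scheme described above can be sketched as follows. The code table, the escape symbol and the fixed field widths are all invented for illustration and are not the tables defined by any MPEG standard; only the structure (short codes for common pairs, escape plus fixed-length fields otherwise) follows the text.

```python
# Hypothetical prefix-free code table for commonly occurring (run, level) pairs.
COMMON_CODES = {
    (0, 1): "11",
    (1, 1): "011",
    (0, 2): "0100",
    (2, 1): "0101",
}
ESCAPE = "000001"   # hypothetical escape symbol, not a prefix of any code above
RUN_BITS = 6        # assumed fixed width of the run field
LEVEL_BITS = 12     # assumed fixed width of the level field (two's complement)


def encode_pair(run, level):
    """Return the bit string for one run-level pair."""
    code = COMMON_CODES.get((run, level))
    if code is not None:
        return code                                   # common pair: short code word
    level_field = level & ((1 << LEVEL_BITS) - 1)     # wrap negatives to two's complement
    return (ESCAPE
            + format(run, "0{}b".format(RUN_BITS))
            + format(level_field, "0{}b".format(LEVEL_BITS)))


def encode_pairs(pairs):
    """Concatenate the code words for a sequence of run-level pairs."""
    return "".join(encode_pair(run, level) for run, level in pairs)
```

With these assumed widths, every escaped pair costs 24 bits regardless of its values, which bounds the longest code word.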
The run-length encoded and Huffman encoded output of the variable length encoder 110 provides a coded video bitstream. Picture type determination circuit 112 determines whether the frame being encoded is a P picture, an I picture or a B picture. In the case of a P or B picture, picture type determination circuit 112 causes the motion vector generator 104 to generate an appropriate motion vector which is then provided to variable length encoder 110. Such motion vector is then coded and combined with the output of variable length encoder 110.
Referring now to FIGS. 2 and 3, the concept of motion compensation is explained. Motion compensation improves compression of P and B pictures by removing temporal redundancies between pictures. With MPEG2, it operates at the macroblock level. For example, a previous frame 200 contains, among other macroblocks, a macroblock 202 consisting of 16 pixels (also referred to as "pels") by 16 lines. Motion compensation relies on the fact that, except for scene cuts, most images remain in the same location from frame to frame, whereas others move only a short distance. Such motion can therefore be described as a two-dimensional motion vector that specifies where to retrieve a macroblock from a previously decoded frame to thereby predict the pixel values of a current macroblock. Accordingly, a macroblock 300 of a current frame 302 can be represented by the macroblock 202 (of FIG. 2) as modified by a two-dimensional motion vector 304. It is to be understood that the macroblock 300 may or may not be within the same boundaries surrounding macroblock 202 in the previous frame 200.
After a macroblock has been compressed using motion compensation, it contains both the prediction (commonly referred to as "motion vectors") and temporal difference (commonly referred to as "error terms") between the reference macroblock and the macroblock being coded.
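The relationship between the prediction, the motion vector and the error terms can be illustrated with the sketch below. It assumes whole-pixel motion vectors given as (vertical, horizontal) offsets, a frame stored as a list of rows, and a displaced block that lies entirely inside the reference frame; the function names are hypothetical.

```python
MB = 16  # macroblock size: 16 pixels by 16 lines


def predict_macroblock(ref_frame, top, left, mv):
    """Fetch the 16x16 block that the motion vector points to in the
    previously decoded reference frame."""
    dy, dx = mv
    y0, x0 = top + dy, left + dx
    if y0 < 0 or x0 < 0 or y0 + MB > len(ref_frame) or x0 + MB > len(ref_frame[0]):
        raise ValueError("displaced macroblock falls outside the reference frame")
    return [row[x0:x0 + MB] for row in ref_frame[y0:y0 + MB]]


def error_terms(current_mb, predicted_mb):
    """Temporal difference between the macroblock being coded and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current_mb, predicted_mb)]


def reconstruct(predicted_mb, errors):
    """Decoder side: add the error terms back onto the prediction."""
    return [[p + e for p, e in zip(pred_row, err_row)]
            for pred_row, err_row in zip(predicted_mb, errors)]
```

When the motion vector is well chosen, the error terms are small, so their quantized frequency coefficients are mostly zero and compress well.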
Returning to FIG. 1, when the coded video bitstream output from variable length encoder 110 is recorded onto a recording medium such as an optical disk, and such recorded information is reproduced for local use, the decoded video bitstream, although not completely error free, is generally sufficiently error free that additional techniques to compensate for errors in the decoded video bitstream are not required. Such a coded video bitstream is typically referred to as a "program stream." When the coded video bitstream output from variable length encoder 110 is transported by, for example, satellite or cable transmission systems, either directly from variable length encoder 110 or from a recording medium onto which the coded video bitstream has been recorded, the probability of errors in the decoded video bitstream increases. Such a coded bitstream is typically referred to as a "transport stream."
Since traditional error detection and correction systems, such as interleaving, require a significant amount of overhead as well as a significant amount of data processing when decoding coded video bitstream signals, current video decoding systems rely upon error concealment as opposed to error correction. In contrast to error correction, which attempts to reconstruct lost or corrupt data, error concealment aims to generate data which can be substituted for the lost or corrupt data, where any discrepancies in image created by the generated data (generally at the macroblock level) are not likely to be perceived by a viewer of a video image which relies upon such error concealment.
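As a point of reference for what substituting generated data can mean, the sketch below shows one simple, commonly used baseline: copying the co-located macroblock from the previously decoded frame. It is offered only as an illustration of concealment in general and is not the method of the present invention; the function name and the list-of-rows frame layout are assumptions.

```python
def conceal_macroblock(cur_frame, prev_frame, top, left, size=16):
    """Replace a lost or corrupt macroblock in cur_frame with the
    co-located macroblock of prev_frame (modifies cur_frame in place)."""
    for y in range(top, top + size):
        cur_frame[y][left:left + size] = prev_frame[y][left:left + size]
    return cur_frame
```

Such a substitution is least noticeable when the affected image region has changed little between the two frames.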
Accordingly, it would be desirable to provide a method and apparatus for concealing errors where the visual effect perceived by a viewer is negligible, and where the method and apparatus adapt to the different types of information that may be available to provide such concealment.