This invention relates to the field of digital image and video coding, and more specifically to the area of error detection and data recovery while decoding an erroneous bitstream.
Transmission and storage of uncompressed digital video requires a large amount of bandwidth. Video compression is necessary to reduce the bandwidth to a level suitable for transmission over channels such as the Internet and wireless links. The H.263, H.261, MPEG-1, MPEG-2, and MPEG-4 international video coding standards provide a syntax for compressing the original source video, allowing it to be transmitted or stored using fewer bits. These video coding methods reduce redundancies within a video sequence at the risk of introducing loss in quality. However, the resulting compressed bitstream is much more sensitive to bit errors. When the compressed video bitstream is transmitted in an error-prone environment, the decoder must be resilient in its ability to handle and mitigate the effects of these bit errors. This calls for a robust decoder capable of detecting errors and handling them adeptly.
FIG. 1 is a simplified block diagram of an exemplary block-based video coder 100. The input 102 is typically a sequence of values representing the luminance (Y) and color difference (Cb and Cr) components of each pixel in each image. The sequence of pixels may be ordered according to a raster (line by line) scan of the image. At block 104 the sequence of pixels is reordered so that the image is represented as a number of macroblocks of pixels. In a 4:2:0 coding system, for example, each macroblock is 16 pixels by 16 pixels. In video, the images often change very little from one image to the next, so many coding schemes use inter-coding, in which a motion compensated version 127 of the previous image is subtracted from the current image at 106, and only the difference image 107 is coded. The luminance (Y) macroblock is divided into four 8×8 sub-blocks, and a Discrete Cosine Transform (DCT) is applied to each sub-block at 108. The color difference signals (Cb and Cr) are sub-sampled both vertically and horizontally, and the DCT of the resulting blocks of 8×8 pixels is applied at 108. The DCT coefficients are quantized at quantizer 110 to reduce the number of bits in the coded image. Variable length coder 112 is then applied to convert the sequence of coefficients to a serial bit-stream and further reduce the number of bits in the coded image 114.
In order to regenerate the image as seen by a decoder, an inverse variable-length coder 116, an inverse quantizer 118 and an inverse DCT 120 are applied. This gives the reconstructed difference image 121. The motion compensated version 127 of the previous image is then added at 122 to produce the reconstructed image. The reconstructed image is stored in frame store 128. The previous reconstructed image 129 and the current image 102 are used by motion estimator 124 to determine how the current image should be aligned with the previous reconstructed image to minimize the difference between them. Parameters describing this alignment are passed to variable-length coder 130, and the resulting information 132 is packaged with the DCT coefficients 114 and other information to form the final coded image. Motion compensator 126 uses these parameters to perform the alignment and produces motion compensated image 127.
In this inter-coding approach, each image depends upon the previous image, so an error in a single macroblock will affect macroblocks in subsequent frames. In order to mitigate this problem, macroblocks may be intra-coded periodically, i.e. coded without reference to any other macroblock.
FIG. 11 shows the decoding process 1100 of an exemplary decoder. Decoding of a frame begins by first decoding the picture header using a picture header decoder as in block 1110. This is followed by decoding of the GOB or slice header information (block 1115). The macroblock data 132′ and 114′ is then decoded in the macroblock data decoder shown in block 1120 and described in FIG. 2. If a new slice is signaled after the decoding of a macroblock has finished (decision block 1125), the operation of the decoder returns to the GOB or slice header decoder (block 1115). If a new frame is found after the macroblock (decision block 1130), the decoder returns to the picture header decoder (block 1110), and begins to decode the next frame. If a new slice or frame is not signaled after the current macroblock, another macroblock is assumed (block 1130) and is decoded in block 1120, as described in FIG. 2.
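The control flow of FIG. 11 can be sketched as a simple nested loop. The sketch below is illustrative only: the bitstream is modeled as a list of tokens ("PSC", "SLICE", "MB") rather than real coded data, and the function merely records which blocks of FIG. 11 it visits.

```python
# Toy, self-contained sketch of the decoding loop of FIG. 11.
# Token names and the trace values are assumptions for illustration,
# not part of any standard's syntax.

def decode_sequence(tokens):
    """Walk picture headers, slice headers, and macroblocks in order,
    returning a trace of the FIG. 11 blocks visited (1110/1115/1120)."""
    trace = []
    i, n = 0, len(tokens)
    while i < n:
        assert tokens[i] == "PSC"
        trace.append(1110)              # picture header decoder (block 1110)
        i += 1
        while i < n and tokens[i] != "PSC":
            assert tokens[i] == "SLICE"
            trace.append(1115)          # GOB/slice header decoder (block 1115)
            i += 1
            while i < n and tokens[i] == "MB":
                trace.append(1120)      # macroblock data decoder (block 1120)
                i += 1
    return trace
```

For example, a stream of two frames with two and one slices respectively is traced frame by frame, returning to block 1115 at each new slice and to block 1110 at each new PSC.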
An exemplary decoder 200, suitable for use with the encoder 100 of FIG. 1, is shown in FIG. 2. The input bit-stream 524′ (the prime symbol is used to indicate that the signal may contain bit errors) may be modified from the bitstream produced by the coder by transmission or storage errors that alter the signal. Demultiplexer 201 separates the coefficient data 114′ and the motion vector data 132′ from other information. The input 114′ may be modified from the output 114 of the coder by transmission or storage errors that alter the signal. The image is reconstructed by passing the data through an inverse variable-length coder 202, an inverse quantizer 204 and an inverse DCT 206. This gives the reconstructed difference image 208. The inverse variable-length coder 202 is coupled with a syntax error detector 228 for identifying errors in the data. The coded motion vector 132′ may likewise be modified from the output 132 of the coder by transmission or storage errors. The coded motion vector is decoded in inverse variable-length coder 222 to give the motion vector 224. Coupled with the inverse variable-length coder 222 is a syntax error detector 230 to detect errors in the coded motion vector data 132′. The previous motion compensated image 212 is generated by motion compensator 226 using the previous reconstructed image 220 and the motion vector 224. The motion compensated version 212 of the previous image is then added at 210 to produce the reconstructed image. Error correction may be applied at 215 if errors are identified by either of the syntax error detectors 228 or 230, or by other information contained in the bitstream. The correction may use techniques that utilize the motion vector information 224 in conjunction with the previously reconstructed frame 220. The reconstructed image 214 is stored in frame store 218.
The sequence of pixels representing the reconstructed image 214 may then be converted at 216 to raster scan order to produce a signal 217 that may be presented to a visual display unit for viewing.
In their basic mode of operation, H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 code each frame in a hierarchical manner as shown in FIG. 3. The information transmitted for each frame includes the picture start code (PSC), picture header, group of block (GOB) or slice headers, macroblock information, and texture information for each coded block. The general structure of this transmitted bitstream is shown in FIG. 4. The PSC, picture header, and slice headers may be considered overhead information that enables the decoder to recover and decode the received bitstream even when the probability of error is high. It should also be noted that while the structures shown in FIG. 3 and FIG. 4 imply that header information follows each slice, it is also possible for the header information to be split so that some or all of the header information occurs once every several slices.
One of the most important fields in the bitstream is the picture start code. The PSC is a special sequence of bits that is transmitted before each frame and serves two key purposes. First, the PSC delineates the start of a new frame, and second, it serves as a resynchronization marker within the bitstream in the event of bit errors. Bit errors corrupting the picture start code can lead to the loss of the entire frame. This is because the decoder may not realize that a new frame has begun and will overrun the PSC by continuing to decode the picture header and subsequent data as if they belonged to the current frame. This can result in the loss of the entire frame and lead to substantial degradations in the video that will propagate due to the predictive nature of video coding.
It is therefore important for a decoder to be able to recognize that a picture start code has been corrupted. By being able to suspect a PSC overrun, the decoder may be able to recover valid data within the frame, limiting the information lost to a single GOB or slice rather than the entire frame.
A method for detecting whether the PSC of a frame may have been overrun is to check the macroblock address of the next slice. If the macroblock address of the next slice or GOB is equal to or less than the address of the current slice or GOB, it can be assumed that the slice resides in the next frame and that the PSC has been overrun. A shortcoming of this approach, however, becomes evident when the macroblock address of the next slice is itself in error. Bit errors in the macroblock address field can corrupt the address and lead the decoder to decode the next slice under the mistaken assumption that it lies in the next frame.
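The address test described above amounts to a single comparison. The sketch below is a minimal illustration, assuming the decoder has already parsed the first-macroblock address of the current and the next slice; the function and parameter names are invented for this example.

```python
# Minimal sketch of the macroblock-address PSC-overrun test.
# Parameter names are illustrative assumptions.

def psc_overrun_suspected(curr_slice_mb_addr, next_slice_mb_addr):
    """If the next slice's first-macroblock address does not advance
    past the current slice's address, the next slice presumably belongs
    to the next frame, so a picture start code was likely overrun."""
    return next_slice_mb_addr <= curr_slice_mb_addr
```

As the text notes, this test is fooled when the address field itself is corrupted, which motivates combining it with additional checks.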
The picture header, which often follows the picture start code, contains vital information about the frame such as the timestamp, type, size, coding modes, quantization value, and miscellaneous administrative information required to correctly decode a frame. A bit error in any of the fields of the picture header can degrade the quality of the frame greatly. Errors in the timestamp information can cause the decoder to display images either in an incorrect order or not at all, possibly causing loss in synchronization with the associated audio. More severe errors may arise if coding modes or options are erroneously changed. These modes and options require the decoder to use special techniques, coding tables, or other configurations that will likely cause errors throughout the entire frame if not decoded correctly. These types of errors typically manifest themselves very early in the frame and lead to the entire frame being decoded in error even if data beyond the header is received error free.
The importance of the picture header is stressed in H.263 by providing a facility for determining changes in the picture header from frame to frame. This is accomplished by the use of the GOB frame ID (GFID) field within GOB and slice headers. The GFID is a 2-bit field whose value is the same as the GFID of the previous frame if certain important fields in the picture header have not changed. The GFID value remains constant in each GOB or slice header within a frame. A GFID that differs from the previous frame's GFID value indicates that information in the header has changed. While indicating that header information has changed, a different GFID value does not specify which fields in the picture header were changed.
Video compression standards such as H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 achieve efficient compression by reducing both temporal redundancies between video frames and spatial redundancies within a video frame. Each frame has associated timestamp information that identifies its temporal location with respect to some unit of time. As such, timestamps for sequential frames are represented as sequential integers.
As indicated in FIG. 1, a typical video encoder may be rate controlled to achieve a prescribed bit rate. This may result in selectively dropping frames to meet communication channel bandwidth capabilities. However, the timestamp for each coded frame must be transmitted in the coded bit stream. A decoder uses this timestamp to correctly display decoded frames at the same rate as the original source frame rate. Since it is inefficient to transmit the absolute timestamp for each encoded frame, the timestamp information is often processed using a mathematical modulo operation and the residual information is encoded. Equation (1) shows the mathematical relationship between the encoded timestamp, TR, and the absolute timestamp, currTR, with respect to the Modulo base, Base, for a frame.
TR = currTR Mod(Base).  (1)
For example, in the case of H.263, Base=256. Therefore, each frame n has an encoded timestamp value range of [0, 255] stored in an 8-bit field. This mechanism for encoding the timestamp is known by the decoder, thereby allowing it to reconstruct the absolute frame timestamp value.
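Equation (1) with the H.263 base of 256 can be illustrated directly; the function name below is an assumption for this sketch.

```python
# Illustration of equation (1): the transmitted residual TR is the
# absolute timestamp reduced modulo the base (256 for H.263), so it
# fits in an 8-bit field.

BASE = 256  # H.263 modulo base

def encode_timestamp(curr_tr):
    """Return the residual TR transmitted for absolute timestamp curr_tr."""
    return curr_tr % BASE
```

For instance, an absolute timestamp of 300 is transmitted as 300 mod 256 = 44, and 256 wraps back to 0.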
The general relationship used by a decoder to reconstruct the absolute timestamp value for frame n that has been encoded with the above described Modulo operation is given in equation (2). In equation (2), the division operation corresponds to integer division with truncation.

currTR = TR,                               for frame = 0;
currTR = (lastTR/Base)*Base + TR,          for frame ≠ 0;
if (frame ≠ 0 and currTR ≤ lastTR)
    currTR = currTR + Base.                (2)
The absolute timestamp for current frame, currTR, is reconstructed by a decoder with knowledge of the Base value used by an encoder.
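The reconstruction rule of equation (2) can be sketched as follows. This is an illustrative implementation, not the text of any standard; integer division truncates, matching the definition given with equation (2).

```python
# Sketch of equation (2): rebuild the absolute timestamp currTR from
# the received residual TR, the previous absolute timestamp lastTR,
# and the shared modulo base.

BASE = 256  # example base, as in H.263

def reconstruct_timestamp(tr, last_tr, frame_index):
    """Return currTR per equation (2); // is truncating integer division."""
    if frame_index == 0:
        return tr
    curr_tr = (last_tr // BASE) * BASE + tr
    if curr_tr <= last_tr:   # TR wrapped past the base since last frame
        curr_tr += BASE
    return curr_tr
```

For example, with lastTR = 255, a received TR of 0 indicates a wrap and reconstructs to 256.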
Timestamp information in MPEG-4 is encoded and reconstructed in a slightly different manner. The timestamps are coded as a fractional part of a second, with the number of seconds elapsed since the last frame being signaled explicitly (modulo_time_base). The general relationship used by a decoder to decode an MPEG-4 encoded timestamp is given below.

currTR = TR/Base + seconds
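The MPEG-4 relationship above can be sketched as a one-line computation; the division here is real-valued (the fractional part of a second), and the function and parameter names are assumptions for illustration.

```python
# Sketch of the MPEG-4 timestamp relationship: the fractional part of
# a second is TR/Base (real-valued division) and the whole elapsed
# seconds are signaled explicitly via modulo_time_base.

def reconstruct_mpeg4_timestamp(tr, base, seconds):
    """Return the decoded display time in seconds."""
    return seconds + tr / base
```

For example, with a base of 30 ticks per second, TR = 15 and 2 explicitly signaled seconds decode to 2.5 seconds.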
In error free environments, the method described above is straightforward and is used to reconstruct currTR for each frame at the decoder. The decoder in turn uses this timestamp information to appropriately display the frame.
In error prone environments, bit errors may corrupt the encoded timestamp field, TR, of the encoded frame. This will result in incorrectly reconstructing currTR. Using the reconstruction mechanism described by equation (2), a delay of up to (Base − 1) time units could be added. Although this may have no effect on reconstructing the actual image data within a frame, it will mark the frame with an incorrect timestamp. For video streaming applications, this will result in the decoder introducing a delay in displaying the entire video sequence. This delay may be substantial; e.g., when the time unit is 1/30 second and the Base is 256, the delay introduced could be up to 8.5 seconds. In the event that there is associated audio, the video sequence and the audio stream will become mismatched. In addition to the annoying delay of the video and the resulting mismatch with the audio, the incoming bitstream will need to be buffered and may overflow the available buffer, resulting in lost data and degraded video.
There is an unmet need in the art for a technique for detecting errors in the picture header and start code information and for correcting or mitigating them. Specifically, there is a need for an adaptive timestamp reconstruction and concealment mechanism that is resilient to channel errors affecting the timestamp field of an encoded frame. There is a further need in the art for a method of suspecting PSC overrun that overcomes the weakness of using only the macroblock address, thereby permitting the decoder to suspect PSC overruns more confidently, allowing it to recover data in the frame, reduce the overall distortion, increase the number of decoded frames, and recover a majority of a frame that would otherwise be completely lost.
There is also a need in the art for a technique for suspecting errors in the picture header and mitigating them. Detecting an erroneous frame header can allow a majority of the data within the frame to be recovered that would otherwise be discarded if the header error were not detected. Header error detection techniques can also be used to ensure the validity of the current header information.