There exist numerous video and image coding standards. For example, the JPEG standard is utilized for still images. MPEG-2 is directed to digital television, while H.261 is used for ISDN video conferencing. H.263 is directed to video coding at low bit rates (typically 20-30 kbps and above). In order to reduce the bandwidth of video data sent in accordance with H.261, H.263, and MPEG-2 Video, a previously coded frame is subtracted from a current frame, with only the difference between the two being coded. More specifically, motion estimation is first performed per macroblock, subtraction is performed between the motion compensated macroblock of the previous decoded frame and the current macroblock, and both the motion vector and the difference macroblock are coded. Areas of the frame that do not change, or that change very little (such as the background), are not encoded; instead, only indications of which areas have not changed may be coded. Each frame is divided into a discrete plurality of macroblocks. These macroblocks are typically 16×16 pixels in size. As a result, only the macroblocks that experience a change from frame to frame need to be encoded, and indications that the remaining macroblocks are not coded may be generated. In this manner, a single frame is encoded in its entirety and a number of subsequent frames are encoded to provide only the changes from previous frames. Periodically, a new frame (an INTRA picture) may be coded in its entirety and the process of coding changes in subsequent frames is repeated. It is not mandatory to code any intra frames other than the first frame of a bitstream.
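The macroblock-level decision described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited standards: the function name `encode_inter_frame`, the `threshold` parameter, and the representation of a frame as a list of pixel rows are assumptions made for illustration. The sketch also assumes a zero motion vector for every macroblock (no motion estimation is performed) and frame dimensions that are multiples of 16.

```python
MB = 16  # macroblock size in pixels, as described in the text


def encode_inter_frame(prev, cur, threshold=0):
    """Hypothetical sketch: decide per macroblock whether to code a difference.

    prev, cur: frames as lists of rows of integer sample values.
    Returns a list of (mb_row, mb_col, residual) tuples, where residual is
    None for a macroblock signalled as "not coded".
    Assumes zero motion (no motion search) and dimensions divisible by MB.
    """
    height, width = len(cur), len(cur[0])
    out = []
    for y in range(0, height, MB):
        for x in range(0, width, MB):
            # Difference between the current and the co-located previous block.
            residual = [[cur[y + j][x + i] - prev[y + j][x + i]
                         for i in range(MB)] for j in range(MB)]
            # Sum of absolute differences measures how much the block changed.
            sad = sum(abs(v) for row in residual for v in row)
            if sad <= threshold:
                out.append((y // MB, x // MB, None))      # unchanged: not coded
            else:
                out.append((y // MB, x // MB, residual))  # changed: code difference
    return out
```

A real encoder would additionally search for a motion vector per macroblock and transform, quantize, and entropy code the residual; those steps are omitted here.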
In video transmission via error prone channels, errors are likely to be found in the received frame, or picture, data. In H.264, a picture is either a frame (of progressive video content) or a field (of interlaced video content). The terms “picture” and “frame” are used interchangeably herein. When a picture is lost or corrupted so severely that the concealment result is not acceptable, the receiver typically pauses video playback and waits for the next INTRA picture to restart decoding and playback. If possible, the receiver also issues a request to the transmitter for an INTRA picture update. In some applications, e.g., in multicast video streaming, the transmitter cannot react to INTRA update requests; instead, the transmitter encodes an INTRA picture relatively frequently, such as every few seconds, to enable new clients to join the multicast session and to enable recovery from transmission errors. Consequently, receivers may have to pause video playback for a relatively long time after a lost picture, and users typically find this behavior annoying.
There are numerous ways to decrease the probability of transmission errors that cause the decoder to pause playback. Multiple description coding produces two or more correlated bitstreams so that a high-quality reconstruction can be obtained from all the bitstreams together, while a lower, but still acceptable, quality reconstruction is guaranteed if only one bitstream is received. Video redundancy coding (VRC) is one example of multiple description coding, in which several independent bitstreams are generated by using independent prediction loops. For example, an even frame is predicted from the previous even frame, and an odd frame from the previous odd frame.
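The even/odd prediction structure described above can be sketched as follows. This is an illustrative Python sketch only: the names `vrc_reference` and `decodable_frames` are hypothetical, and it assumes that the first frame of each thread is coded without reference to an earlier frame.

```python
def vrc_reference(n, threads=2):
    """Index of the frame that frame n is predicted from, or None.

    With two threads, even frames reference the previous even frame and
    odd frames reference the previous odd frame. The first frame of each
    thread is assumed to have no reference.
    """
    return n - threads if n >= threads else None


def decodable_frames(num_frames, lost, threads=2):
    """Frames that can be decoded when the frames in `lost` are missing."""
    good = set()
    result = []
    for n in range(num_frames):
        ref = vrc_reference(n, threads)
        # A frame is decodable if it arrived and its reference is decodable.
        if n not in lost and (ref is None or ref in good):
            good.add(n)
            result.append(n)
    return result
```

Under these assumptions, losing frame 3 of an eight-frame sequence breaks only the odd thread: frames 0, 1, 2, 4, and 6 remain decodable, so playback can continue at a reduced frame rate instead of pausing.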
Industry standards, such as H.263 and H.264/AVC, specify a mechanism called Supplemental Enhancement Information (SEI), which makes it possible to include in the coded bitstream data that is not mandatory for recovering correct sample values in the decoding process but that can be helpful, for example, in the rendering process. The SEI mechanism enables one to convey SEI messages in the bitstream. The H.263 and H.264/AVC standards contain syntax and semantics for a number of SEI messages. Additionally, it is possible to specify SEI messages in standards that use H.263 or H.264/AVC, and codec vendors can also use proprietary SEI messages. The MPEG-2 Video and MPEG-4 Visual standards provide a mechanism similar to SEI, known as user data. H.263 and H.264/AVC define the use of spare reference pictures via SEI messages for signaling pictures or areas within pictures that can be used to perform motion compensation if the actual reference picture is lost or corrupted. The SEI messages indicate whether a first picture resembles a second picture to such an extent that the first picture can be used as an inter prediction reference picture replacing the second picture. Moreover, in H.264/AVC the SEI message allows for indicating that a certain area of a picture in one or more decoded reference pictures resembles the co-located area in a specified decoded picture to such an extent that it may be used to replace the co-located area in the target picture. In H.264/AVC, the spare picture SEI message can reside in various access units, whereas in H.263 the spare reference picture SEI message resides in the picture immediately following the target picture in decoding order. In general, a frame is decoded from an access unit. A field can also be decoded from an access unit. In the scalable extension of H.264/AVC, decoding of an access unit sometimes results in two frames.
Each access unit is formed of a series of network abstraction layer units (NAL units) that include one or more coded slices making up a coded picture.
The H.264/AVC coding standard includes a technical feature called a redundant picture. A redundant picture is a redundant coded representation of a picture, called a “primary picture”, or of a part of a picture. A spare coded picture (i.e. the SEI message) has a different syntax and semantics from the primary coded picture. After decoding, however, the terms “redundant picture” and “spare picture” can be used interchangeably. Each primary coded picture may have a few redundant pictures. After decoding, the region represented by a redundant picture should be similar in quality to the same region represented by the corresponding primary picture. A redundant picture must reside in the same access unit as the corresponding primary picture. In other words, they are next to each other in decoding order. If a region represented in the primary picture is lost or corrupted due to transmission errors, a correctly received redundant picture containing the same region can be used to reconstruct the region (provided that all the areas referenced in inter prediction of the redundant picture are correct or approximately correct in content).
The spare reference picture SEI messages in H.263 and H.264/AVC efficiently address only those video sequences that are shot with a stationary camera without zooming or changing the camera direction. However, in picture sequences Pn, Pn+1, Pn+2, etc. that include camera pan, tilt, zoom-in or zoom-out, a large portion (e.g. non-moving objects and background) of pictures Pn and Pn+1 resemble each other. If picture Pn+1 is lost or corrupted, it would help in error concealment and error tracking if the resembling areas could be identified and used in error concealment and as prediction references. As used herein, “error tracking” refers to tracking the impact (spatial coverage and sometimes also an estimate of the magnitude of the error) of erroneously decoded areas in a picture on the subsequent pictures that use the picture as a reference for inter prediction.
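The spatial-coverage aspect of error tracking can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than a method defined by H.263 or H.264/AVC: the function name `track_errors`, the macroblock-granular damage set, and the one-motion-vector-per-macroblock model with whole-pixel displacements are all hypothetical.

```python
MB = 16  # macroblock size in pixels


def track_errors(damaged, motion_vectors):
    """Hypothetical sketch: propagate damage from a reference picture.

    damaged: set of (mb_row, mb_col) erroneously decoded in the reference.
    motion_vectors: dict mapping (mb_row, mb_col) of the next picture to a
        (dy, dx) displacement in whole pixels, or None for an intra-coded
        macroblock.
    Returns the set of macroblocks in the next picture whose prediction
    overlaps a damaged area of the reference picture.
    """
    affected = set()
    for (r, c), mv in motion_vectors.items():
        if mv is None:
            continue  # intra macroblocks do not inherit reference errors
        dy, dx = mv
        top, left = r * MB + dy, c * MB + dx
        bottom, right = top + MB - 1, left + MB - 1
        # Reference macroblocks touched by the motion-compensated area.
        touched = ((rr, cc)
                   for rr in range(top // MB, bottom // MB + 1)
                   for cc in range(left // MB, right // MB + 1))
        if any(block in damaged for block in touched):
            affected.add((r, c))
    return affected
```

Applying the function repeatedly, with the returned set as the damaged set for the following picture, gives a rough estimate of how far an error spreads before the next intra-coded area stops it. Estimating the magnitude of the error is not covered by this sketch.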
Unfortunately, error robustness of redundant pictures suffers from the fact that burst errors easily affect both a primary picture and the corresponding redundant pictures.