1. Field of the Invention
The present invention relates to a method and device for decoding a sequence of digital images with error concealment.
The invention belongs to the domain of video processing in general and more particularly to the domain of decoding with error concealment after the loss or corruption of part of the video data, for example by transmission through an unreliable channel.
2. Description of the Related Art
Compressed video sequences are very sensitive to channel disturbances when they are transmitted through an unreliable environment such as a wireless channel. For example, in an IP/Ethernet network using the UDP transport protocol, there is no guarantee that the totality of data packets sent by a server is received by a client. Packet loss can occur at any position in a bitstream received by a client, even if mechanisms such as retransmission of some packets or redundant data (such as error correcting codes) are applied.
In the case of an unrecoverable error, it is known in video processing to apply error concealment methods in order to partially recover the lost or corrupted data from the compressed data available at the decoder.
Most video compression formats, for example H.263, H.264, MPEG1, MPEG2, MPEG4, SVC, use block-based discrete cosine transform (DCT) and motion compensation to remove spatial and temporal redundancies. They can be referred to as predictive video formats. Each frame or image of the video sequence is divided into slices which are encoded and can be decoded independently. A slice is typically a rectangular portion of the image, or more generally, a portion of an image. Further, each slice is divided into macroblocks (MBs), and each macroblock is further divided into blocks, typically blocks of 8×8 pixels. The encoded frames are of two types: predicted frames (either P-frames, predicted from one reference frame, or B-frames, predicted from two reference frames) and non-predicted frames (called INTRA frames or I-frames).
For a predicted P-frame, the following steps are applied at the encoder:
    motion estimation applied to each block of the considered predicted frame with respect to a reference frame, resulting in a motion vector per block pointing to a reference block of the reference frame. The set of motion vectors obtained by motion estimation forms a so-called motion field;
    prediction of the considered frame from the reference frame, where for each block, the difference signal between the block and its reference block pointed to by the motion vector is calculated. The difference signal is called residual signal or residual data. A DCT is then applied to each block of residual data, and then, quantization is applied to the transformed residual data;
    entropic encoding of the motion vectors and of the quantized transformed residual data.
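By way of illustration only, the motion estimation and residual computation steps for a single block may be sketched as follows. The sketch assumes an exhaustive block-matching search using the sum of absolute differences (SAD) as the matching cost, which is one common choice and is not mandated by any of the formats cited above. The function names, the 4×4 block size, the search range and the test data are all hypothetical. The transform, quantization and entropic encoding steps are omitted.

```python
def sad(frame, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(
        abs(frame[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def estimate_motion(frame, ref, bx, by, n=4, search=2):
    """Exhaustive search for the motion vector (dx, dy) with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(frame, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def residual_block(frame, ref, bx, by, mv, n=4):
    """Difference between a block and the reference block its motion vector points to."""
    dx, dy = mv
    return [
        [frame[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]


# Hypothetical 8x8 test data: the current frame is the reference frame shifted
# one pixel to the right, so the block at (2, 2) matches the reference at dx = -1.
REF = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
CUR = [[REF[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
MV = estimate_motion(CUR, REF, bx=2, by=2)
RESIDUAL = residual_block(CUR, REF, 2, 2, MV)
```

With this test data the search returns the motion vector (-1, 0) and the residual block is entirely zero, which corresponds to the ideal case where the reference block predicts the current block exactly.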
In the case of B-frames, two reference frames and two motion vectors are similarly used for prediction.
For an INTRA encoded frame, the image is divided into blocks of pixels, a DCT is applied on each block, followed by quantization and the quantized DCT coefficients are encoded using an entropic encoder.
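As a non-limiting sketch, the transform and quantization of one INTRA block may be written as below. The sketch assumes an orthonormal DCT-II applied separably to rows and columns and a uniform quantizer with a single step size. Real codecs use per-coefficient quantization matrices and integer transforms, so the function names and the step value here are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ))
    return out


def dct_2d(block):
    """Separable two-dimensional DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer level."""
    return [[round(c / step) for c in row] for row in coeffs]


# Hypothetical flat 8x8 block: all of its energy ends up in the DC coefficient.
FLAT = [[16] * 8 for _ in range(8)]
LEVELS = quantize(dct_2d(FLAT), step=16)
```

For the flat block, the DC coefficient is 128 before quantization, so `LEVELS[0][0]` is 8 and every other level is 0. This shows why smooth image areas compress well.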
In practical applications, the encoded bitstream is either stored or transmitted through a communication channel.
At the decoder side, for the classical MPEG-type formats, the decoding achieves image reconstruction by applying the inverse operations with respect to the encoding side. For all frames, entropic decoding and inverse quantization are applied.
For INTRA frames, the inverse quantization is followed by inverse block DCT, and the result is the reconstructed image signal.
For predicted type frames, both the residual data and the motion vectors need to be decoded first. The residual data and the motion vectors may be encoded in separate packets in the case of data partitioning. For the residual data, after inverse quantization, an inverse DCT is applied. Finally, for each predicted block in the P-frame, the signal resulting from the inverse DCT is added to the reconstructed signal of the block of the reference frame pointed to by the corresponding motion vector to obtain the final reconstructed image block.
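The reconstruction of one predicted block may be sketched as follows, purely as an example. The sketch assumes that entropic decoding has already produced the quantized levels and the motion vector, and it assumes uniform inverse quantization with a single step size and an orthonormal inverse DCT. The function names and test values are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Inverse of the orthonormal one-dimensional DCT-II."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d(coeffs):
    """Separable two-dimensional inverse DCT: rows first, then columns."""
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reconstruct_block(levels, step, ref, bx, by, mv):
    """Inverse quantize, inverse transform, then add the motion-compensated reference."""
    n = len(levels)
    dx, dy = mv
    residual = idct_2d([[q * step for q in row] for row in levels])
    return [
        [ref[by + dy + y][bx + dx + x] + round(residual[y][x]) for x in range(n)]
        for y in range(n)
    ]


# Hypothetical input: a 4x4 block whose only non-zero level is the DC term,
# which decodes to a constant residual of 2 for every pixel.
LEVELS = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
REF = [[10 * y + x for x in range(6)] for y in range(6)]
BLOCK = reconstruct_block(LEVELS, 8, REF, bx=1, by=1, mv=(1, 0))
```

Here each reconstructed pixel equals the reference pixel displaced by the motion vector plus 2. If the reference frame itself is damaged, the same addition carries the damage into the reconstructed block.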
A video bitstream encoded with such a predictive format is highly sensitive to transmission errors, since an error will not only result in an incorrectly decoded image but will also propagate to the following images if the affected image is used as a reference image.
Several methods are known in the related art to achieve resilience to transmission errors of video bit streams.
A classical method is to use Forward Error Correction (FEC) codes. An error correction code is computed on the compressed video bitstream and transmitted with the video bitstream. The maximum error rate must be estimated in advance in order to dimension the error correction code correctly. In practice, all errors are corrected as long as the error rate stays below this maximum. As soon as the error rate exceeds the maximum, the quality of correction degrades sharply. It would be desirable for users to have a system with progressive quality degradation.
Another category of methods in the related art comprises the error concealment methods. The error concealment methods are applied at the decoder, in order to replace lost or corrupted areas with data obtained from correctly received data, based on spatial or temporal interpolations. The error concealment may provide progressive quality degradation when the error rate increases, but the efficiency of error concealment largely depends on the video content.
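A minimal sketch of temporal error concealment is given below for illustration. It assumes the simplest variant, in which each lost block is replaced by the co-located block of the previously decoded frame, that is, a zero motion vector is assumed. Practical decoders typically estimate a motion vector from neighbouring blocks instead. The function name and the block layout are hypothetical.

```python
def conceal_temporal(current, previous, lost_blocks, n):
    """Replace each lost n x n block with the co-located block of the previous frame."""
    out = [row[:] for row in current]
    for bx, by in lost_blocks:
        for y in range(by, by + n):
            out[y][bx:bx + n] = previous[y][bx:bx + n]
    return out


# Hypothetical 4x4 frames: the top-right 2x2 block of the current frame was lost
# and is marked with zeros before concealment.
PREVIOUS = [[1] * 4 for _ in range(4)]
CURRENT = [[9, 9, 0, 0], [9, 9, 0, 0], [9, 9, 9, 9], [9, 9, 9, 9]]
CONCEALED = conceal_temporal(CURRENT, PREVIOUS, lost_blocks=[(2, 0)], n=2)
```

The concealed block is only as good as the similarity between consecutive frames, which is why the efficiency of this kind of concealment depends on the video content.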
An approach proposed to improve the performance of error concealment is to systematically send some auxiliary data, based on the original video sequence, to help the error concealment. The auxiliary data can be generated using a Wyner-Ziv compression scheme. In a Wyner-Ziv compression scheme applied to video compression, auxiliary data is extracted from a video frame, and is correlated to the video frame. Information relating to the auxiliary data is sent to the decoder, so as to improve decoding. This information is compressed with respect to the auxiliary data and represents at least part of the auxiliary data, so that it can be used to correct approximate auxiliary data extracted from the corresponding decoded frame at the decoder side.
The patent application US 20080267288 describes a system using Wyner-Ziv auxiliary data to correct missing parts of a video, as illustrated in FIG. 1. The video sequence encoding is carried out on the server device S. The video is classically encoded (module 1100) and transmitted to a client device C. Auxiliary data is extracted from the original video data (module 1110), and then encoded by module 1120 and transmitted to the client. For example, the auxiliary data is a quantized version of the video and the Wyner-Ziv auxiliary data transmitted is only an error correction code of the auxiliary data.
The client device C receives both the video bitstream data and the encoded auxiliary data. The module 1105 applies decoding and error concealment on the received bitstream. Then, an approximate version of the auxiliary data is extracted from the result of the error concealment decoding by the auxiliary data extraction module 1115. The error correction code of the auxiliary data received is used to correct the approximated version of the auxiliary data (module 1125). This corrected auxiliary data is then used by module 1135 to improve the image quality. The improved image may be stored to be used at a later stage as a reference image. Further, the improved image is sent as an output for further processing/display.
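The principle of transmitting only an error correction code of the auxiliary data may be illustrated with the toy sketch below. It is not the scheme of the cited application: it assumes, for illustration, that the auxiliary data is a 4-bit word, that the server sends only the three parity bits of a systematic Hamming(7,4) code, that these parity bits arrive without error, and that the approximation extracted at the client differs from the true word in at most one bit. All names are hypothetical.

```python
def parity_bits(data):
    """Server side: parity bits of a systematic Hamming(7,4) code for 4 data bits."""
    d1, d2, d3, d4 = data
    return (d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4)


def correct_auxiliary(approx, parity):
    """Client side: correct at most one wrong bit in the approximate auxiliary data."""
    d1, d2, d3, d4 = approx
    p1, p2, p3 = parity
    # Syndrome bits, computed over codeword positions (p1, p2, d1, p3, d2, d3, d4).
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    position = s1 + 2 * s2 + 4 * s3
    # Codeword positions 3, 5, 6 and 7 hold the data bits d1, d2, d3 and d4.
    index = {3: 0, 5: 1, 6: 2, 7: 3}.get(position)
    corrected = list(approx)
    if index is not None:
        corrected[index] ^= 1
    return tuple(corrected)


# Hypothetical example: the true auxiliary word is never transmitted, only its parity.
TRUE_AUX = (1, 0, 1, 1)
PARITY = parity_bits(TRUE_AUX)
APPROX_AUX = (1, 0, 0, 1)  # third bit wrong after error concealment
CORRECTED = correct_auxiliary(APPROX_AUX, PARITY)
```

In this example the client recovers the true word (1, 0, 1, 1) from its own approximation and three received bits, whereas sending the word itself would cost four bits. A practical system would use much longer codes over quantized image data.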
This related art does not take full advantage of the error concealment and of the auxiliary data to improve the quality of the final images.
There is still room to further improve the correction given a fixed quantity of auxiliary data or, alternatively, to reduce the quantity of auxiliary data transmitted at equal quality of the final image.