1. Field of the Invention
The invention concerns a method and device for video sequence decoding with error concealment.
The invention belongs to the domain of video processing in general, and more particularly to the domain of decoding with error concealment after the loss or corruption of part of the video data, for example during transmission through an unreliable channel.
2. Description of the Prior Art
Compressed video sequences are very sensitive to channel disturbances when they are transmitted through an unreliable environment such as a wireless channel. For example, in an IP/Ethernet network using the UDP transport protocol, there is no guarantee that all data packets sent by a server are received by a client. Packet loss can occur at any position in the bitstream received by a client, even if mechanisms such as retransmission of some packets or redundant data (such as error-correcting codes) are applied.
In case of an unrecoverable error, it is known in video processing to apply error concealment methods in order to partially recover the lost or corrupted data from the compressed data available at the decoder.
Most video compression methods, for example H.263, H.264, MPEG-1, MPEG-2, MPEG-4 and SVC, use block-based discrete cosine transform (DCT) and motion compensation to remove spatial and temporal redundancies. Each frame of the video sequence is divided into slices, which are encoded so that they can be decoded independently. A slice is typically a rectangular portion of the image or, more generally, any portion of an image. Each slice is further divided into macroblocks (MBs), and each macroblock is in turn divided into blocks, typically of 8×8 pixels. The encoded frames are of two types: predicted frames (either predicted from one reference frame, called P-frames, or from two reference frames, called B-frames) and non-predicted frames (called INTRA frames or I-frames).
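As an illustration of this frame/macroblock/block hierarchy, the following minimal sketch enumerates, for a luminance frame, the position of each macroblock and the top-left corners of the 8×8 blocks it contains. It assumes the 16×16 luminance macroblock size used by the MPEG-type and H.26x codecs listed above and frame dimensions that are multiples of 16; slices are not modelled, and the function name `macroblock_grid` is hypothetical.

```python
def macroblock_grid(width, height, mb=16, blk=8):
    """Yield (mb_row, mb_col, block_origins) for every macroblock of a frame.

    block_origins lists the (y, x) top-left corners of the blk x blk luminance
    blocks inside the macroblock. Assumes width and height are multiples of mb.
    """
    for my in range(0, height, mb):
        for mx in range(0, width, mb):
            origins = [(my + by, mx + bx)
                       for by in range(0, mb, blk)
                       for bx in range(0, mb, blk)]
            yield my // mb, mx // mb, origins


# Example: a 32x16 frame holds two macroblocks side by side.
for row, col, blocks in macroblock_grid(32, 16):
    print(row, col, blocks)
# 0 0 [(0, 0), (0, 8), (8, 0), (8, 8)]
# 0 1 [(0, 16), (0, 24), (8, 16), (8, 24)]
```

A QCIF frame (176×144 pixels) yields 11 × 9 = 99 macroblocks under the same assumptions.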
For a predicted frame, the following steps are applied at the encoder:
- motion estimation, applied to each block of the considered predicted frame with respect to a reference frame, resulting in one motion vector per block pointing to a reference block of the reference frame. The set of motion vectors obtained by motion estimation forms a so-called motion field;
- prediction of the considered frame from the reference frame where, for each block, the difference signal between the block and the reference block pointed to by its motion vector is calculated. This difference signal, also known as the displaced frame difference, is called the residual signal or residual data in the remainder of this description. A DCT is then applied to each block of the residual signal, and the resulting coefficients are quantized;
- entropic encoding of the motion vectors and of the quantized transformed residual data.
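The motion estimation and residual computation steps can be sketched as follows. This is a minimal illustration, not the procedure of any particular encoder: frames are assumed to be lists of rows of integer luminance values, the matching criterion is assumed to be the sum of absolute differences (SAD), and the search is an exhaustive scan over a hypothetical ±`search` pixel window. The DCT, quantization and entropic encoding steps are omitted, and all function names are illustrative.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_estimate(cur, ref, top, left, size=8, search=4):
    """Full-search block matching for the block of `cur` at (top, left).

    Returns the motion vector (dy, dx) minimising the SAD against `ref`,
    considering only candidate blocks that lie entirely inside `ref`.
    """
    h, w = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best_mv = (0, 0)
    best_cost = sad(target, get_block(ref, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - size and 0 <= x <= w - size:
                cost = sad(target, get_block(ref, y, x, size))
                if cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv


def residual_block(cur, ref, top, left, mv, size=8):
    """Displaced frame difference: current block minus its reference block."""
    dy, dx = mv
    a = get_block(cur, top, left, size)
    b = get_block(ref, top + dy, left + dx, size)
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Example: the current frame is the reference shifted by (1, 2).
ref = [[20 * y + 3 * x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
mv = motion_estimate(cur, ref, 4, 4)
print(mv)  # (1, 2)
```

Because the block content is an exact displaced copy here, the residual for the found vector is all zeros; on real content the residual is small but non-zero, which is what makes it cheap to encode after DCT and quantization.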
For an INTRA encoded frame, the image is divided into blocks of pixels, a DCT is applied on each block, followed by quantization and the quantized DCT coefficients are encoded using an entropic encoder.
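The INTRA coding path (block DCT followed by quantization) can be illustrated with a direct, unoptimized orthonormal 2-D DCT-II on an 8×8 block and a uniform scalar quantizer. This is a sketch under stated assumptions: the single quantization step `step` and the helper names are hypothetical, real codecs use more elaborate quantization schemes, and entropic encoding is not shown.

```python
import math

N = 8  # block size


def _c(k):
    """Orthonormal DCT-II scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block):
    """Direct (O(N^4)) orthonormal 2-D DCT-II of an N x N block."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = _c(u) * _c(v) * s
    return coeffs


def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [[round(c / step) for c in row] for row in coeffs]


# Example: a flat block of value 100 has only a DC coefficient (8 * 100 = 800).
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
print(levels[0][0])  # 50
```

Smooth image content concentrates its energy in a few low-frequency coefficients, so most quantized levels become zero, which is what the entropic encoder exploits.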
In practical applications, the encoded bitstream is either stored or transmitted through a communication channel.
At the decoder side, for the MPEG-type formats, the decoding achieves image reconstruction by applying the inverse of the operations performed at the encoder. For all frames, entropic decoding and inverse quantization are applied.
For INTRA frames, the inverse quantization is followed by inverse block DCT, and the result is the reconstructed signal.
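The INTRA reconstruction (inverse quantization followed by inverse block DCT) can be sketched as the inverse of a uniform scalar quantizer and of an orthonormal 2-D DCT. As before, this is an illustrative sketch: the uniform step `step` and the function names are assumptions, not the decoding process defined by any standard.

```python
import math

N = 8  # block size


def _c(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dequantize(levels, step):
    """Inverse quantization: rescale integer levels to coefficient values."""
    return [[lv * step for lv in row] for row in levels]


def idct_2d(coeffs):
    """Direct orthonormal 2-D inverse DCT (DCT-III) of an N x N block."""
    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_c(u) * _c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            block[x][y] = s
    return block


# Example: a block whose only non-zero level is a DC level of 50 (step 16).
levels = [[0] * N for _ in range(N)]
levels[0][0] = 50
pixels = [[round(p) for p in row] for row in idct_2d(dequantize(levels, 16))]
print(pixels[0])  # [100, 100, 100, 100, 100, 100, 100, 100]
```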
For predicted frames, both the residual data and the motion vectors need to be decoded first. The residual data and the motion vectors may be encoded in separate packets in the case of data partitioning. For the residual signal, an inverse DCT is applied after inverse quantization. Finally, for each predicted block, the signal resulting from the inverse DCT is added to the reconstructed signal of the block of the reference frame pointed to by the corresponding motion vector, to obtain the final reconstructed image signal.
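The final motion-compensated reconstruction step can be sketched as below. It assumes frames stored as lists of rows, a motion vector given as `(dy, dx)` that points inside the reference frame, and 8-bit samples clipped to [0, 255]; the function name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(ref, top, left, mv, residual, max_val=255):
    """Add the decoded residual to the reference block pointed to by `mv`.

    `ref` is the previously reconstructed reference frame (list of rows),
    (top, left) is the block position in the current frame and mv = (dy, dx).
    Results are clipped to [0, max_val], assuming 8-bit samples by default.
    """
    dy, dx = mv
    size = len(residual)
    return [[min(max(ref[top + dy + i][left + dx + j] + residual[i][j], 0), max_val)
             for j in range(size)]
            for i in range(size)]


# Example: reference ramp, motion vector (1, 2) and a constant residual of +1.
ref = [[10 * y + x for x in range(16)] for y in range(16)]
res = [[1] * 8 for _ in range(8)]
out = reconstruct_block(ref, 4, 4, (1, 2), res)
print(out[0][:3])  # [57, 58, 59]
```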
In case of loss or corruption of data packets of the bitstream, for example when the bitstream is transmitted through an unreliable transmission channel, it is known to apply error concealment methods at the decoder in order to use the correctly received data to reconstruct the lost data.
The error concealment methods known in the prior art can be separated into two categories:
- temporal error concealment methods, and
- spatial error concealment methods.
Temporal error concealment methods reconstruct a field of motion vectors from the available data, and apply the reconstructed motion vector corresponding to a lost data block in a predicted frame so that the luminance of the lost block can be predicted from the luminance of the corresponding block in the reference frame. For example, if the motion vector for a current block in a current predicted image has been lost or corrupted, a motion vector can be computed from the motion vectors of the blocks located in the spatial neighborhood of the current block.
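A minimal sketch of this temporal concealment is given below. The text only states that the lost vector is computed from the neighboring vectors; using the component-wise median, falling back to the zero vector when no neighbor is available, and clamping the displaced block inside the frame are all assumptions made for illustration. The function names are hypothetical.

```python
from statistics import median


def conceal_motion_vector(neighbor_mvs):
    """Estimate a lost motion vector from its spatial neighbours.

    `neighbor_mvs` holds (dy, dx) tuples, with None for neighbours that are
    themselves lost. Uses the component-wise median (one possible choice) and
    falls back to the zero vector when no neighbour is available.
    """
    available = [mv for mv in neighbor_mvs if mv is not None]
    if not available:
        return (0, 0)
    return (round(median(mv[0] for mv in available)),
            round(median(mv[1] for mv in available)))


def conceal_block(ref, top, left, mv, size=8):
    """Copy the reference block displaced by `mv`, clamped inside the frame."""
    h, w = len(ref), len(ref[0])
    y0 = min(max(top + mv[0], 0), h - size)
    x0 = min(max(left + mv[1], 0), w - size)
    return [row[x0:x0 + size] for row in ref[y0:y0 + size]]


# Example: three neighbours were received correctly, the fourth was lost.
mv = conceal_motion_vector([(1, 2), (1, 3), (2, 2), None])
print(mv)  # (1, 2)
```

The median is less sensitive than the mean to a single neighbor whose motion differs strongly, for example at an object boundary.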
The temporal error concealment methods are efficient if there is sufficient correlation between the current decoded frame and the previous frame used as a reference frame for prediction. Therefore, temporal error concealment methods are preferably applied to entities of the predicted type (P frames or P slices), when there is no change of scene resulting in motion or luminance discontinuity between the considered predicted entities and the previous frame(s) which served as reference for the prediction.
Spatial error concealment methods use the data of the same frame to reconstruct the content of the lost data block(s).
In a prior-art rapid spatial error concealment method, the available data is decoded, and then the lost area is reconstructed by luminance interpolation from the decoded data in the spatial neighborhood of the lost area. Spatial error concealment is generally applied for image frames for which the motion or luminance correlation with the previous frame is low, for example in the case of scene change. The main drawback of classical rapid spatial interpolation is that the reconstructed areas are blurred, since the interpolation can be considered equivalent to a low-pass filtering of the image signal of the spatial neighborhood.
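One common form of such luminance interpolation is sketched below: each lost pixel is a distance-weighted average of the four correctly decoded pixels bordering the lost block on the same row and column. This particular weighting is an assumption for illustration, as is the requirement that the block does not touch the frame border; the function name `conceal_spatial` is hypothetical.

```python
def conceal_spatial(frame, top, left, size=8):
    """Fill a lost size x size block in place by spatial interpolation.

    Each lost pixel is a distance-weighted average of the four decoded pixels
    directly above, below, left and right of the block on its row/column.
    Assumes the block does not touch the frame border and that its one-pixel
    surrounding was correctly decoded.
    """
    for i in range(size):
        for j in range(size):
            y, x = top + i, left + j
            samples = [
                (frame[top - 1][x], i + 1),         # above: (value, distance)
                (frame[top + size][x], size - i),   # below
                (frame[y][left - 1], j + 1),        # left
                (frame[y][left + size], size - j),  # right
            ]
            weight_sum = sum(1 / d for _, d in samples)
            frame[y][x] = round(sum(v / d for v, d in samples) / weight_sum)
    return frame


# Example: a sharp vertical edge (0 | 200) crossing a lost 8x8 block.
frame = [[0 if x < 5 else 200 for x in range(10)] for y in range(10)]
for y in range(1, 9):
    for x in range(1, 9):
        frame[y][x] = 0  # simulate the lost block
conceal_spatial(frame, 1, 1)
print(frame[1][4])  # 25
```

Pixel (1, 4) lay on the dark side of the edge (value 0) but is reconstructed as 25: every concealed pixel is a weighted average of its neighbors, which is the low-pass behavior responsible for the blurring described above.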
The article “Object removal by exemplar-based inpainting” by Criminisi et al., published in CVPR 2003 (IEEE Conference on Computer Vision and Pattern Recognition), describes a spatial error concealment method which better preserves the edges in an interpolated area by replicating available decoded data from the same frame into the lost or corrupted area, according to a likelihood-of-resemblance criterion. The article describes an algorithm for removing large objects from a digital image, but it can also be applied as an error concealment method. The proposed algorithm replicates both texture and structure to fill in the blank area progressively, propagating already synthesized values of the same image, with the order of propagation depending on a confidence measure. The algorithm is complex and requires high computational capacity and a relatively long computation time. Moreover, experiments show that in some cases the reconstructed area is completely erroneous and shows false edges which were not present in the initial image.
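The core loop of exemplar-based filling can be sketched in a greatly simplified form. In the article, the patch to fill next is selected by a priority combining a confidence term and a data term that favors linear structures; in this sketch the priority is only the fraction of known pixels in the patch (a crude confidence term), patches are indexed by their top-left corner, and the best exemplar is found by an exhaustive sum-of-squared-differences search over fully known patches. It is an illustration under these assumptions, not the published algorithm, and all names are hypothetical.

```python
def _patch_known(known, y, x, p):
    """Flattened availability flags of the p x p patch at (y, x)."""
    return [known[y + i][x + j] for i in range(p) for j in range(p)]


def best_exemplar(image, known, ty, tx, p):
    """Top-left corner of the fully known p x p source patch minimising the
    sum of squared differences over the known pixels of the target patch."""
    h, w = len(image), len(image[0])
    best, best_cost = None, float("inf")
    for sy in range(h - p + 1):
        for sx in range(w - p + 1):
            if not all(_patch_known(known, sy, sx, p)):
                continue  # source patches must be entirely available
            cost = 0
            for i in range(p):
                for j in range(p):
                    if known[ty + i][tx + j]:
                        d = image[ty + i][tx + j] - image[sy + i][sx + j]
                        cost += d * d
            if cost < best_cost:
                best, best_cost = (sy, sx), cost
    return best


def inpaint(image, known, p=3):
    """Greedy exemplar-based filling of the pixels where known[y][x] is False."""
    h, w = len(image), len(image[0])
    while True:
        # Pick the partially known patch with the highest fraction of known pixels.
        target, best_conf = None, -1.0
        for ty in range(h - p + 1):
            for tx in range(w - p + 1):
                n = sum(_patch_known(known, ty, tx, p))
                if 0 < n < p * p and n / (p * p) > best_conf:
                    target, best_conf = (ty, tx), n / (p * p)
        if target is None:
            return image  # no patch straddles the fill front any more
        source = best_exemplar(image, known, target[0], target[1], p)
        if source is None:
            return image  # no fully known source patch exists
        (ty, tx), (sy, sx) = target, source
        for i in range(p):
            for j in range(p):
                if not known[ty + i][tx + j]:
                    image[ty + i][tx + j] = image[sy + i][sx + j]
                    known[ty + i][tx + j] = True


# Example: a 2x2 hole straddling a horizontal edge (10 above, 90 below).
orig = [[10 if y < 4 else 90 for x in range(8)] for y in range(8)]
img = [row[:] for row in orig]
known = [[True] * 8 for _ in range(8)]
for y, x in [(3, 3), (3, 4), (4, 3), (4, 4)]:
    img[y][x], known[y][x] = 0, False
inpaint(img, known)
print(img == orig)  # True
```

Even this reduced version illustrates the cost noted above: each filling step scans every candidate source patch, so the work grows with the product of image area, patch area and number of filling steps.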