The present invention relates to digital video signal processing, and more particularly to devices and methods for video decoding error concealment.
There are multiple applications for digital video communication and storage, and multiple international standards for video coding have been and are continuing to be developed. Low bit rate communications, such as video telephony and conferencing, led to the H.261 standard with bit rates as multiples of 64 kbps, and the MPEG-1 standard provides picture quality comparable to that of VHS videotape. Subsequently, H.263, MPEG-2, and MPEG-4 standards have been promulgated.
H.264/AVC is a recent video coding standard that makes use of several advanced video coding tools to provide better compression performance than existing video coding standards. At the core of all of these standards is the hybrid video coding technique of block motion compensation (prediction) plus transform coding of prediction error. Block motion compensation is used to remove temporal redundancy between successive pictures (frames or fields) by prediction from prior pictures, whereas transform coding is used to remove spatial redundancy within each block of both temporal and spatial prediction errors. FIGS. 2A-2B illustrate H.264/AVC functions which include a deblocking filter within the motion compensation loop to limit artifacts created at block edges.
Traditional block motion compensation schemes basically assume that between successive pictures an object in a scene undergoes a displacement in the x- and y-directions and these displacements define the components of a motion vector. Thus an object in one picture can be predicted from the object in a prior picture by using the object's motion vector. Block motion compensation simply partitions a picture into blocks and treats each block as an object and then finds its motion vector which locates the most-similar block in a prior picture (motion estimation). This simple assumption works satisfactorily in most practical cases, and thus block motion compensation has become the most widely used technique for temporal redundancy removal in video coding standards. Further, pictures coded without motion compensation are inserted periodically to avoid error propagation; blocks encoded without motion compensation are called intra-coded, and blocks encoded with motion compensation are called inter-coded.
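As an illustration of the motion estimation step described above, the following is a minimal sketch of a full-search block matcher. The sum of absolute differences (SAD) cost, the block size, the search range, and the function names `sad` and `estimate_motion` are assumptions chosen for illustration; the standards do not mandate any particular search strategy.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, n=4, search=2):
    """Full search over displacements within +/-search pixels.

    Returns ((dx, dy), cost) for the displacement with the smallest SAD.
    Pictures are lists of rows of integer pixel values (illustrative format).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the picture.
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

In this sketch the motion vector points from the block position in the current picture to the matching block in the prior picture; practical encoders typically use faster search patterns and sub-pixel refinement.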
Block motion compensation methods typically decompose a picture into macroblocks where each macroblock contains four 8×8 luminance (Y) blocks plus two 8×8 chrominance (Cb and Cr or U and V) blocks, although other block sizes, such as 4×4, are also used in H.264/AVC. The residual (prediction error) block can then be encoded (i.e., block transformation, transform coefficient quantization, entropy encoding). The transform of a block converts the pixel values of a block from the spatial domain into a frequency domain for quantization; this takes advantage of decorrelation and energy compaction of transforms such as the two-dimensional discrete cosine transform (DCT) or an integer transform approximating a DCT. For example, in MPEG and H.263, 8×8 blocks of DCT-coefficients are quantized, scanned into a one-dimensional sequence, and coded by using variable length coding (VLC). H.264/AVC uses an integer approximation to a 4×4 DCT for each of sixteen 4×4 Y blocks and eight 4×4 chrominance blocks per macroblock. Thus an inter-coded block is encoded as motion vector(s) plus quantized transformed residual block.
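For concreteness, the integer approximation to a 4×4 DCT can be sketched as a pair of matrix products, Y = C·X·Cᵀ, using the commonly cited H.264/AVC forward core matrix. This sketch covers only the core transform; the scaling that the standard folds into quantization is omitted, and the helper names are illustrative assumptions.

```python
# Forward core transform matrix commonly cited for H.264/AVC 4x4 blocks.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a, b):
    """Product of two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose of a 4x4 matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Apply Y = CF * X * CF^T to a 4x4 residual block (scaling omitted)."""
    return matmul4(matmul4(CF, block), transpose4(CF))
```

A flat residual block illustrates the energy compaction mentioned above: every coefficient except the DC term at position (0, 0) is zero, because all rows of the matrix other than the first sum to zero.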
Similarly, intra-coded pictures may still have spatial prediction for blocks by extrapolation from already encoded portions of the picture. Typically, pictures are encoded in raster scan order of blocks, so pixels of blocks above and to the left of a current block can be used for prediction. Again, transformation of the prediction errors for a block can remove spatial correlations and enhance coding efficiency.
When a video bitstream is transmitted over a channel, parts of the data may be corrupted or lost. When the video is decoded, it is necessary to use a concealment method to replace the macroblocks that were lost or corrupted. A very simple concealment method may copy macroblocks from the previous frame, or substitute blank macroblocks for the first frame.
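The very simple concealment method just described can be sketched as follows. The frame representation (a list of rows of luminance values with dimensions that are multiples of 16), the `lost` list of macroblock coordinates, and the function name `conceal_by_copy` are assumptions made for illustration.

```python
MB = 16  # macroblock width and height in luminance pixels


def conceal_by_copy(cur, prev, lost, blank=0):
    """Replace each lost macroblock of `cur` in place.

    `lost` holds (mb_x, mb_y) macroblock coordinates. If `prev` is None
    (the first frame), the macroblock is filled with the `blank` value;
    otherwise the co-located macroblock of `prev` is copied.
    """
    for mb_x, mb_y in lost:
        for y in range(mb_y * MB, (mb_y + 1) * MB):
            for x in range(mb_x * MB, (mb_x + 1) * MB):
                cur[y][x] = blank if prev is None else prev[y][x]
    return cur
```

Copying the co-located macroblock is equivalent to temporal concealment with a zero motion vector.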
Spatial concealment uses pixels from the current frame to fill in the missing data, while temporal concealment uses the previous frame to predict the current frame. Spatial concealment should be used for scene changes, but this requires scene change detection. Artifacts from any poorly concealed macroblocks will propagate over multiple frames, because video compression is achieved by predicting the current frame from past frames.
A simple concealment algorithm applies spatial concealment to INTRA-coded frames or INTRA-coded macroblocks, and applies temporal concealment to INTER-coded frames and macroblocks. However, this does not produce acceptable results unless INTRA mode is used only for scene changes or new objects in the scene and INTER mode is never used in those cases. Encoders may not have sophisticated mode decision or scene detection logic. Periodic INTRA-coded frames may be inserted for random access points. Also, standards such as H.263 and MPEG-4 have a mandatory INTRA refresh rate for macroblocks, to limit potential drift caused by different IDCT implementations. In addition, encoders may use additional adaptive INTRA refresh to aid recovery in case the bitstream is corrupted. Thus the coding mode alone does not provide enough information to determine the most appropriate concealment method.
For videophone or camcorder applications, scene changes may be uncommon; therefore, earlier concealment methods applied spatial concealment only to the first frame, and temporal concealment to all remaining frames. However, this is not adequate for wireless streaming, which may contain frequent scene changes.
Also, because spatial concealment was so rarely applied, the spatial concealment method was quite simple. If the macroblocks above and below were available, a weighted average of their pixels was used to fill in the missing pixels. Otherwise, a gray block was substituted, partly motivated by the fact that early cameras for handsets produced mostly dull colors.
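A minimal sketch of such a spatial concealment follows. The exact weights of the earlier method are not given here, so this sketch assumes distance-based linear weights between the last row of the macroblock above and the first row of the macroblock below; the mid-gray value of 128, the frame layout, and the function name `conceal_spatial` are likewise assumptions.

```python
MB = 16     # macroblock width and height in luminance pixels
GRAY = 128  # assumed mid-gray fill value for 8-bit luminance


def conceal_spatial(frame, mb_x, mb_y, above_ok, below_ok):
    """Fill the lost macroblock at (mb_x, mb_y) of `frame` in place."""
    x0, y0 = mb_x * MB, mb_y * MB
    if above_ok and below_ok:
        top = frame[y0 - 1][x0:x0 + MB]      # last row of macroblock above
        bottom = frame[y0 + MB][x0:x0 + MB]  # first row of macroblock below
        for r in range(MB):
            # Each boundary row is weighted by the distance to the other
            # boundary row, so nearer pixels count more; weights sum to MB + 1.
            w_top, w_bottom = MB - r, r + 1
            for c in range(MB):
                total = w_top * top[c] + w_bottom * bottom[c]
                frame[y0 + r][x0 + c] = (total + (MB + 1) // 2) // (MB + 1)
    else:
        # Without both vertical neighbors, fall back to a gray block.
        for r in range(MB):
            for c in range(MB):
                frame[y0 + r][x0 + c] = GRAY
    return frame
```

The caller is assumed to pass `above_ok` and `below_ok` as true only when the corresponding neighboring macroblocks exist and were correctly decoded.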
Temporal concealment was also fairly simple, relying on the usual MPEG-4 motion-vector prediction to estimate the displacement from the previous frame. However, if much of the frame is missing, the predictors are unavailable, leading to a zero motion vector. This may be sufficient for talking head content, but can lead to frame break-up if there is much motion or panning.
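The temporal concealment described above can be sketched as a component-wise median of three neighboring motion vectors followed by a displaced copy from the previous frame. As a simplification, this sketch treats every unavailable predictor as a zero vector, which does not reproduce the standard's exact availability rules; the choice of neighbors, the clamping at picture edges, and the function names are assumptions for illustration.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def predict_mv(left, above, above_right):
    """Component-wise median of three neighboring motion vectors.

    Each argument is a (dx, dy) tuple, or None if that neighbor is missing.
    Missing neighbors are treated as (0, 0) in this simplified sketch.
    """
    cands = [mv if mv is not None else (0, 0)
             for mv in (left, above, above_right)]
    return (median3(cands[0][0], cands[1][0], cands[2][0]),
            median3(cands[0][1], cands[1][1], cands[2][1]))


def conceal_temporal(cur, prev, x0, y0, mv, n=16):
    """Fill the n-by-n block of `cur` at (x0, y0) from `prev` displaced by mv.

    Source coordinates are clamped to the picture (an assumed edge policy).
    """
    height, width = len(prev), len(prev[0])
    dx, dy = mv
    for y in range(n):
        for x in range(n):
            sy = min(max(y0 + y + dy, 0), height - 1)
            sx = min(max(x0 + x + dx, 0), width - 1)
            cur[y0 + y][x0 + x] = prev[sy][sx]
    return cur
```

When all three neighbors are missing, `predict_mv` returns a zero vector and the concealment reduces to copying the co-located block, which is the failure case noted above for content with significant motion or panning.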
FIG. 4 illustrates a prior concealment method. However, for content with scene changes and more motion, an improved concealment method is required.