Providing better service quality for video streaming over error-prone environments such as the Internet has been a constant challenge for the research community and the industry, because video bitstreams may be corrupted by random errors or suffer packet loss in the channel.
To address this problem, the MPEG-4 video coding standard was developed to provide users with a new level of performance for various video communication services, such as video-on-demand (VOD) over the Internet and mobile multimedia applications. An MPEG-4 video system relies on a robust encoded bitstream and a resilient decoding process. The robust bitstream is produced by the encoder and, at the cost of some coding overhead, helps the decoder recover from error corruption. One method for creating a robust bitstream is to insert additional intra blocks to stop error propagation in the decoder. However, inserting intra blocks slightly decreases coding efficiency. Thus, a trade-off between error propagation and coding efficiency must be struck for an MPEG-4 video encoder to achieve good performance.
Cote, Shirani and Kossentini proposed an adaptive intra refreshment (IR) scheme for H.263 based on rate-distortion optimization (IEEE Journal on Selected Areas in Communications, vol. 18, No. 6, pp. 952-965, 2002). The rate-distortion optimization improves the timing of intra block insertion so that IR is used optimally under the prevailing Internet conditions.
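The idea of letting channel conditions drive the intra/inter decision can be illustrated with a minimal sketch. It is not the formulation of the cited paper; it assumes a simple Lagrangian cost J = E[D] + lambda * R, where the expected distortion mixes the received case and the lost-and-concealed case by the packet loss probability. The function names, the parameters `ref_error`, `d_conceal`, and `lam`, and all numeric values are hypothetical.

```python
def expected_cost(rate_bits, d_quant, uses_reference,
                  loss_prob, ref_error, d_conceal, lam):
    """Lagrangian cost of one coding mode for one macroblock (illustrative model)."""
    # If the block arrives, an inter block also inherits the expected error
    # already present in its reference; an intra block does not.
    d_received = d_quant + (ref_error if uses_reference else 0.0)
    # If the block is lost, it is concealed regardless of its coding mode.
    expected_distortion = (1.0 - loss_prob) * d_received + loss_prob * d_conceal
    return expected_distortion + lam * rate_bits


def choose_mode(modes, loss_prob, ref_error, d_conceal, lam):
    """Return the name of the mode with the lowest expected cost.

    modes maps a mode name to (rate_bits, d_quant, uses_reference).
    """
    return min(
        modes,
        key=lambda name: expected_cost(*modes[name], loss_prob,
                                       ref_error, d_conceal, lam),
    )


# Hypothetical numbers: intra costs more bits but does not depend on the reference.
modes = {"intra": (400, 20.0, False), "inter": (120, 22.0, True)}

print(choose_mode(modes, loss_prob=0.0, ref_error=0.0, d_conceal=300.0, lam=0.1))    # inter
print(choose_mode(modes, loss_prob=0.2, ref_error=150.0, d_conceal=300.0, lam=0.1))  # intra
```

Under this assumed model, the cheaper inter mode wins on a clean channel, while a lossy channel with a corrupted reference tips the decision toward intra refreshment.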
Another method is to use an error-resilient decoding process, which locates errors and then conceals the lost slices. The error location methods use header information available at the decoder to resynchronize the decoding process. For error resilience, MPEG-4 provides several tools, including resynchronization markers (RM), data partitioning (DP), and reversible variable length coding (RVLC). The video specification does not specify how these error-resilience tools are best used. To further enhance error resilience, the selection of optimal parameters, intra refreshment, and advanced error detection and concealment methods are required to improve the reconstructed video quality.
Several error concealment methods have been developed for either spatial error concealment (SEC) or temporal error concealment (TEC). The SEC techniques exploit the spatial redundancy within a picture, while the TEC techniques exploit the temporal similarity of frames in a sequence. For spatial error concealment, various interpolation methods, such as multi-directional interpolation (Valente, et al., IEEE Transactions on Consumer Electronics, vol. 147, No. 3, 2001) and quadri-linear interpolation (Kwok, et al., IEEE Transactions on Consumer Electronics, vol. 39, No. 3, 1993), have been developed in addition to the widely used bi-linear interpolation (Kaiser, et al., Signal Processing: Image Communication, vol. 14, No. 6-8, 1999). The multi-directional interpolation needs all neighboring macroblocks (MBs) to correctly decide the edge direction in the lost MB and has much higher computational complexity. The quadri-linear interpolation is an area-based interpolation that uses the nearest four pixels to interpolate each recovered pixel. Kwok et al. introduced two refinements. One is to increase the weight of the nearer direction, and the other is to average the nearest pixels together with their two neighboring pixels instead of using the nearest pixels only. These refinements make the concealed area visually smoother.
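As an illustration of interpolation-based spatial concealment, the following minimal sketch fills a lost block from the correctly received pixels just outside it, weighting each boundary pixel inversely to its distance. It is a generic distance-weighted scheme written for this explanation, not the exact algorithm of any cited reference. The function name and the representation of a frame as a list of rows of luma values are assumptions, and the lost block is assumed not to touch the frame border.

```python
def conceal_block_bilinear(frame, top, left, size):
    """Return a copy of frame with the size x size block at (top, left) interpolated."""
    out = [row[:] for row in frame]
    bottom = top + size   # index of the first row below the lost block
    right = left + size   # index of the first column right of the lost block
    for y in range(top, bottom):
        for x in range(left, right):
            # Distances from the lost pixel to the boundary pixel on each side.
            d_top, d_bottom = y - (top - 1), bottom - y
            d_left, d_right = x - (left - 1), right - x
            # Linear interpolation in each direction; the nearer pixel gets
            # the larger weight (weight = distance to the opposite side).
            vert = (d_bottom * frame[top - 1][x] + d_top * frame[bottom][x]) / (d_top + d_bottom)
            horiz = (d_right * frame[y][left - 1] + d_left * frame[y][right]) / (d_left + d_right)
            out[y][x] = (vert + horiz) / 2
    return out


# A horizontal ramp with a lost 2x2 block (zeros) in the middle.
frame = [[10 * x for x in range(4)] for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 0
fixed = conceal_block_bilinear(frame, top=1, left=1, size=2)
```

Because the interpolation is linear in both directions, a smooth ramp such as the one above is restored exactly; sharp edges crossing the lost block would be blurred, which is the weakness that edge-directed methods aim to address.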
For temporal error concealment, blind selection of a motion vector, such as the mean, the median, or the nearest of the surrounding motion vectors, has been used. The boundary matching algorithm (BMA) is the most common method that uses boundary properties to choose the best motion vector. There are two kinds of BMA. One uses the boundary gradient to choose the candidate that makes the boundary of the lost MB match its neighbors. This method can be called a spatial BMA because it uses the spatial boundary correlation. The other uses the boundary difference between the current frame and the previous frame. This method can be called a temporal BMA because it uses the temporal boundary correlation. Another temporal concealment method, decoder motion vector estimation (DMVE), either searches a range of motion vectors over the surrounding area to find the best motion vector according to the temporal BMA, or uses a search range to refine the best motion vector of the neighbors. DMVE has much higher computational complexity because it tests more motion vectors and uses more surrounding lines for motion estimation.
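The temporal BMA can be sketched as follows, under one plausible reading of the description above: for each candidate motion vector, the one-pixel ring surrounding the lost block in the current frame is compared with the ring surrounding the displaced block in the previous frame, and the candidate with the smallest sum of absolute differences is chosen. All function names are hypothetical, frames are assumed to be lists of rows of luma values, and the lost block is assumed not to touch the frame border.

```python
def boundary_ring(frame, top, left, size):
    """Collect the one-pixel ring just outside a size x size block."""
    ring = []
    for x in range(left - 1, left + size + 1):
        ring.append(frame[top - 1][x])      # row above (including corners)
        ring.append(frame[top + size][x])   # row below (including corners)
    for y in range(top, top + size):
        ring.append(frame[y][left - 1])     # column to the left
        ring.append(frame[y][left + size])  # column to the right
    return ring


def select_mv_temporal_bma(cur, prev, top, left, size, candidates):
    """Return (best_mv, cost) over candidate (dy, dx) motion vectors."""
    height, width = len(prev), len(prev[0])
    target = boundary_ring(cur, top, left, size)
    best_mv, best_cost = None, None
    for dy, dx in candidates:
        t, l = top + dy, left + dx
        # Skip candidates whose block or ring would fall outside the frame.
        if t < 1 or l < 1 or t + size >= height or l + size >= width:
            continue
        ref = boundary_ring(prev, t, l, size)
        cost = sum(abs(a - b) for a, b in zip(target, ref))
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def conceal_with_mv(cur, prev, top, left, size, mv):
    """Copy the motion-compensated block from prev into a copy of cur."""
    dy, dx = mv
    out = [row[:] for row in cur]
    for y in range(size):
        for x in range(size):
            out[top + y][left + x] = prev[top + dy + y][left + dx + x]
    return out
```

In practice the candidate list would typically hold the zero vector and the motion vectors of the neighboring MBs; widening it to a full search range moves the scheme toward DMVE and its higher cost.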
Spatial concealment is suitable for areas in which spatial correlation is higher than temporal correlation, and temporal concealment is suitable for areas in which temporal correlation is higher than spatial correlation. Several hybrid error concealment methods have therefore been developed to take advantage of their respective strengths. A general hybrid scheme uses spatial concealment for an intra-coded video object plane (I-VOP) and temporal concealment for a predicted video object plane (P-VOP). Further refinement strategies have also been developed to improve the performance of the hybrid concealment methods. For example, most I-VOPs other than the first video object plane (VOP) have temporal correlation; thus, temporal methods are used to conceal such a VOP. For pictures with little temporal correlation, such as those at a scene change, fade-in, or fade-out, spatial methods are used to conceal the VOP. The approach proposed by Kaiser et al. uses spatial activity and temporal activity to decide between spatial concealment and temporal concealment. Spatial activity is calculated as the variance of the nearest neighboring macroblocks. Temporal activity is calculated as the mean square error between co-located macroblocks. When the temporal activity is larger than the spatial activity, spatial concealment is used, and vice versa. Other approaches use the boundary smoothness property. The ratio of the boundary gradient of the lost macroblock to the boundary gradient of the macroblocks above and below is used to decide whether the boundary gradient of the lost macroblock is too large, in which case spatial concealment is used instead of the temporal method.
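The activity-based decision can be sketched as below. The sketch assumes that both activities are computed over the pixels of the correctly received neighboring macroblocks, with the temporal activity comparing those pixels against the co-located pixels in the previous frame; the cited approach may define the regions differently. The function names and the flat pixel lists are assumptions for illustration.

```python
def variance(pixels):
    """Population variance of a list of pixel values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def mean_square_error(a, b):
    """Mean square error between two equally long pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_concealment(neighbors_cur, neighbors_prev):
    """Pick 'spatial' or 'temporal' concealment for a lost macroblock."""
    spatial_activity = variance(neighbors_cur)
    temporal_activity = mean_square_error(neighbors_cur, neighbors_prev)
    # High temporal activity means the previous frame is a poor predictor.
    return "spatial" if temporal_activity > spatial_activity else "temporal"


# Static, highly textured area: the previous frame predicts it well.
print(choose_concealment([10, 200, 10, 200], [10, 200, 10, 200]))   # temporal
# Smooth area after a scene change: the previous frame is unrelated.
print(choose_concealment([100, 101, 99, 100], [10, 200, 10, 200]))  # spatial
```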
However, as more and more applications and activities are brought to the Internet, the competition for bandwidth and the fluctuation of bandwidth availability are more severe than before. It is, therefore, necessary to devise an MPEG-4 streaming system with adaptive error concealment capability in order to deliver acceptable performance for video services.