This invention relates to error concealment in a video signal and to a method and apparatus therefor.
One of the recent targets in mobile telecommunications has been to increase the speed of data transmission in order to enable the incorporation of multimedia services into mobile networks. One of the key components of multimedia is digital video. Transmission of video comprises a continuous traffic of data representing moving pictures. As is generally known, the amount of data needed to transfer pictures is high compared to many other types of media, and so far usage of video in low bit-rate terminals has been negligible. However, significant progress has been achieved in the area of low bit-rate video compression. Acceptable video quality can be obtained at bit-rates around 20 kilobits per second. As a result of this progressive reduction in bit-rate, it is expected that video will shortly become a viable service to offer over channels such as mobile channels.
A video sequence consists of a series of still images or frames. Video compression methods are based on reducing the redundancy and perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spatial, temporal and spectral redundancy. Spatial redundancy means the correlation between neighbouring pixels within a frame. Temporal redundancy means the correlation between areas of successive frames. Temporal redundancy arises from the likelihood of objects appearing in a previous image appearing in the current image too. Compression can be achieved by generating motion compensation data which describes the motion (i.e. displacement) between similar areas of the current and a previous image. The current image is thus predicted from the previous one. Spectral redundancy means the correlation between the different colour components of the same image.
However, sufficient compression cannot usually be achieved by just reducing the redundancy of the sequence. Thus, video encoders try to reduce the quality of those parts of the video sequence which are subjectively the least important. In addition, the redundancy of the encoded bitstream is reduced by means of efficient lossless coding of compression parameters and coefficients. The main technique is to use variable length codes.
Video compression methods typically differentiate between images which can and images which cannot utilise temporal redundancy reduction. Compressed images which do not utilise temporal redundancy reduction methods are usually called INTRA or I-frames, whereas temporally predicted images are called INTER or P-frames. In the INTER frame case, the predicted (motion-compensated) image is rarely precise enough, and therefore a spatially compressed prediction error image is also associated with each INTER frame.
Compressed video is easily corrupted by transmission errors, mainly for two reasons. Firstly, due to utilisation of temporal predictive differential coding (INTER frames), an error is propagated both spatially and temporally. In practice, this means that once an error occurs, it is easily visible to the human eye for a relatively long time. Especially susceptible are transmissions at low bit-rates where there are only a few INTRA-coded frames (the transmission of INTRA-coded frames would stop the temporal error propagation). Secondly, the use of variable length codes increases the susceptibility to errors. When a bit error alters a codeword into another one of different length, the decoder loses codeword synchronisation and also decodes subsequent error-free codewords (comprising several bits) incorrectly until the next synchronisation code. (A synchronisation code is a bit pattern which cannot be generated from any legal combination of other codewords.)
Not every bit in a compressed video bitstream is equally important to the decompressed images. Some bits belong to segments defining vital information such as picture type (e.g. INTRA or INTER), quantiser value and optional coding modes that have been used. In H.263, the most vital information is gathered in the picture header. A transmission error in the picture header typically causes a total misinterpretation of the subsequent bits defining the picture content. Because of temporal predictive differential coding (INTER frames), this error is then propagated both spatially and temporally. Thus, a normal approach to picture header corruption is to freeze the previous picture on the screen, to send an INTRA picture request to the transmitting terminal and to wait for the requested INTRA frame. This causes an annoying pause in the received video.
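The loss of codeword synchronisation caused by variable length codes can be illustrated with a short sketch. The four-symbol prefix code below is purely hypothetical and is not taken from H.263 or any other standard; it only shows how a single inverted bit can change both the values and the number of decoded symbols.

```python
# Hypothetical prefix code for illustration only (not an H.263 code table).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
INVERSE = {bits: symbol for symbol, bits in CODE.items()}


def encode(symbols):
    """Concatenate the variable length codeword of each symbol."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Decode greedily; incomplete trailing bits are silently dropped."""
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in INVERSE:
            decoded.append(INVERSE[buffer])
            buffer = ""
    return "".join(decoded)


def flip_bit(bits, index):
    """Return a copy of the bit string with one bit inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


sent = encode("abcab")            # "010110010"
received = flip_bit(sent, 1)      # a single bit error in the second bit
print(decode(sent))               # abcab
print(decode(received))           # aaacab
```

In this toy code the decoder happens to fall back into step after a few symbols, but the decoded sequence has already changed in length, so every later symbol is attributed to the wrong position. In a real bitstream, resynchronisation is only guaranteed at the next synchronisation code.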
Transmission errors have a different nature depending on the underlying network. In packet-switched networks, transmission errors are typically packet losses (due to congestion in network elements). In circuit-switched networks, transmission errors are typically bit errors where a ‘1’ is corrupted to ‘0’ or vice versa.
To impede degradations in images introduced by transmission errors, retransmissions can be used, error detection and/or error correction methods can be applied, and/or effects from the received corrupted data can be concealed. Normally retransmission provides a reasonable way to protect video data streams from errors, but large round-trip delays associated with low bit-rate transmission and moderate or high error rates make it practically impossible to use retransmission, especially with real-time videophone applications. Error detection and correction methods usually require large overhead since they add some redundancy to the data. Consequently, for low bit-rate applications, error concealment can be considered as a preferred way to protect and recover images from transmission errors. Video error concealment methods are typically applicable to transmission errors occurring through packet loss and bit corruption.
H.263 is an ITU-T recommendation for video coding for low bit-rate communication, which generally means data rates below 64 kbps. The recommendation specifies the bitstream syntax and the decoding of the bitstream. Currently, there are two versions of H.263. Version 1 consists of the core algorithm and four optional coding modes. H.263 version 2 is an extension of version 1 providing twelve new negotiable coding modes.
Pictures are coded as luminance (Y) and two colour difference (chrominance) components (CB and CR). The chrominance pictures are sampled at half the resolution of the luminance picture along both co-ordinate axes. Picture data is coded on a block-by-block basis, each block representing 8×8 pixels of luminance or chrominance.
Each coded picture, as well as the corresponding coded bitstream, is arranged in a hierarchical structure with four layers, which are from bottom to top: block layer, macroblock layer, picture segment layer and picture layer. The picture segment layer can either be arranged as a group of blocks or a slice.
Block layer data consists of uniformly quantised discrete cosine transform coefficients, which are scanned in zigzag order, processed with a run-length encoder and coded with variable length codes.
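The zigzag scan and run-length step described above can be sketched as follows. This is a simplified illustration: the (zero run, level) pairs and the function names are assumptions made for the example, and the actual event format and variable length code tables of H.263 are not reproduced here.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    # Positions on the same anti-diagonal share row + col; the direction of
    # travel alternates from one anti-diagonal to the next.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def zigzag_scan(block):
    """Flatten a square block of quantised coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(coefficients):
    """Encode a scanned list as (zero_run, level) pairs.

    Trailing zeros produce no pair, which is where most of the saving
    comes from after quantisation.
    """
    pairs, run = [], 0
    for level in coefficients:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# A quantised 8x8 block with a few low-frequency coefficients set.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 12, 5, -3, 1
print(run_length(zigzag_scan(block)))
# [(0, 12), (0, 5), (0, -3), (9, 1)]
```

Each resulting pair would then be mapped to a variable length codeword, with shorter codewords assigned to the more probable pairs.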
Each macroblock relates to 16×16 pixels of luminance and the spatially corresponding 8×8 pixels of chrominance components. In other words, a macroblock consists of four 8×8 luminance blocks and the two spatially corresponding 8×8 colour difference blocks. Each INTER macroblock is associated with a motion vector which defines the position of a corresponding area in the reference frame which resembles the pixels of the INTER macroblock. The INTER macroblock data comprises coded prediction error data for the pixels of the macroblock.
Usually, each picture is divided into groups of blocks (GOBs). A group of blocks (GOB) typically comprises 33 macroblocks (arranged as 3 rows of 11 macroblocks). Data for each GOB consists of an optional GOB header followed by data for the macroblocks within the GOB.
If the optional slice structured mode is used, each picture is divided into slices instead of GOBs. A slice contains a number of consecutive macroblocks in scan-order. Data for each slice consists of a slice header followed by data for the macroblocks of the slice.
The picture layer data contain parameters affecting the whole picture area and the decoding of the picture data. The coded parameter data is arranged in a so-called picture header.
Picture and GOB (or slice) headers begin with a synchronisation code. No other codeword or legal combination of codewords can form the same bit pattern as the synchronisation codes. Thus, the synchronisation codes can be used for bitstream error detection and for resynchronisation after bit errors. The more synchronisation codes that are added to the bitstream, the more error-robust the system becomes.
The Video Redundancy Coding (VRC) method has been introduced in several papers (e.g. Stephan Wenger, “Simulation Results for H.263+ Error Resilience Modes K, R, N on the Internet”, ITU-T, SG16, Question 15, document Q15-D-17, Apr. 7, 1998). Its objective is to provide graceful video quality degradation against packet losses in packet-switched networks. The following paragraphs explain the basics of the method.
The principle of the VRC method is to divide the sequence of pictures into two or more signals (or threads) in such a way that all frames are assigned to one of the threads in an interleaved fashion to form subsets of frames. Each thread (or subset of frames) is coded independently. Obviously, the frame rate within one signal is much lower than the overall frame rate: half in the case of two threads (signals), a third in the case of three threads and so on. This may result in a substantial coding penalty, because the changes between two INTER frames within a signal or thread are generally larger and longer motion vectors are typically required to represent the motion-related changes accurately. At regular intervals, all the signals converge into a so-called Sync frame. From this Sync frame, a new series of threads is started.
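The interleaved assignment of frames to threads can be sketched as follows. The exact layout used here (one Sync frame followed by a round-robin assignment of a fixed number of frames per thread) and all function and parameter names are assumptions made for illustration, not a layout prescribed by the text.

```python
SYNC = "S"  # label used in this sketch for a Sync frame


def assign_threads(num_frames, num_threads, thread_length):
    """Label each frame either SYNC or with the index of its thread."""
    period = num_threads * thread_length + 1  # one Sync frame per period
    labels = []
    for i in range(num_frames):
        pos = i % period
        labels.append(SYNC if pos == 0 else (pos - 1) % num_threads)
    return labels


def prediction_reference(i, num_threads, thread_length):
    """Index of the frame from which frame i is predicted inside its thread.

    Returns None for Sync frames, which this sketch does not model.
    """
    period = num_threads * thread_length + 1
    pos = i % period
    if pos == 0:
        return None
    if pos <= num_threads:       # first frame of a thread after the Sync frame
        return i - pos           # predicted from the Sync frame itself
    return i - num_threads       # previous frame of the same thread


print(assign_threads(11, num_threads=2, thread_length=2))
# ['S', 0, 1, 0, 1, 'S', 0, 1, 0, 1, 'S']
```

With two threads, each frame is predicted from a frame two positions earlier rather than from its immediate predecessor, which is the source of the coding penalty mentioned above.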
If one of the threads containing a subset of frames is damaged because of, say, a packet loss, the remaining threads stay intact and can be used to predict the next Sync frame. It is possible to continue the decoding of the damaged signal, which leads to slight picture degradation, or to stop its decoding which leads to a drop of the frame rate. If the size of the subsets is kept reasonably small, however, degradation will persist only for a very short time, until the next Sync frame is reached.
To decode the Sync frame, the decoder selects one of the undamaged threads. This means that the number of transmitted I-pictures can be kept small, because there is no need for complete re-synchronisation.
If all threads are damaged between two Sync frames, it is not possible to accurately predict a Sync frame. In this situation, annoying artifacts will be present until the next I-picture is decoded correctly, as would be the case if VRC were not employed.
Currently, Video Redundancy Coding can be used with the ITU-T H.263 video coding standard (version 2) if the optional Reference Picture Selection mode (Annex N) is enabled. However, there are no major obstacles to incorporating Video Redundancy Coding into other video compression methods too.
Most known error concealment techniques are based on spatial and temporal interpolation schemes. Spatial interpolation is used in INTRA frames and INTRA-coded areas of INTER frames. Spatial interpolation means that lost areas are interpolated from spatially neighbouring areas. This can be done for example using the distance weighted average of the boundary pixels.
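A distance-weighted average of boundary pixels can be sketched as follows. The choice of the four nearest boundary pixels (above, below, left and right of the lost block) and of weights inversely proportional to distance is one common variant assumed here for illustration; the function name and frame representation (a list of rows of integers) are likewise hypothetical.

```python
def conceal_spatial(frame, top, left, size):
    """Fill a lost size x size block in place from its surrounding pixels."""
    height, width = len(frame), len(frame[0])
    for r in range(top, top + size):
        for c in range(left, left + size):
            # (boundary pixel value, distance to it) for each available side
            candidates = []
            if top - 1 >= 0:
                candidates.append((frame[top - 1][c], r - (top - 1)))
            if top + size < height:
                candidates.append((frame[top + size][c], (top + size) - r))
            if left - 1 >= 0:
                candidates.append((frame[r][left - 1], c - (left - 1)))
            if left + size < width:
                candidates.append((frame[r][left + size], (left + size) - c))
            if not candidates:
                continue  # the block covers the whole frame; nothing to use
            weights = [1.0 / distance for _, distance in candidates]
            total = sum(w * value for w, (value, _) in zip(weights, candidates))
            frame[r][c] = round(total / sum(weights))


# A 4x4 frame with a horizontal gradient and a lost 2x2 block in the middle.
frame = [[10 * c for c in range(4)] for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        frame[r][c] = 0
conceal_spatial(frame, top=1, left=1, size=2)
print(frame[1], frame[2])
# [0, 10, 20, 30] [0, 10, 20, 30]
```

A smooth gradient is recovered exactly in this example, whereas edges and texture inside the lost area cannot be recovered this way.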
Error concealment using temporal interpolation is more often used in low bit-rate video coding, since the number of INTRA frames is usually rather low. A very basic temporal interpolation scheme copies the lost areas from the same positions of the previous frame, i.e., it treats the lost blocks as “not coded” blocks. In more advanced schemes, motion compensation is performed using either the median or average of the motion vectors of spatially neighbouring blocks. There have also been some proposals to use boundary pixel matching to find best motion vectors for the lost block.
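Both temporal schemes mentioned above can be sketched with one routine. A zero motion vector reproduces the basic "copy from the previous frame" behaviour, and a component-wise median of the neighbouring motion vectors gives the more advanced variant. The sign convention of the motion vectors, the clamping at the frame border and all names are assumptions made for this example.

```python
from statistics import median_low


def median_motion_vector(neighbour_mvs):
    """Component-wise median of the (dx, dy) vectors of neighbouring blocks."""
    xs = [dx for dx, _ in neighbour_mvs]
    ys = [dy for _, dy in neighbour_mvs]
    # median_low always returns one of the input values, so the result
    # stays an integer displacement.
    return (median_low(xs), median_low(ys))


def conceal_temporal(previous, current, top, left, size, mv=(0, 0)):
    """Fill a lost block of `current` from `previous`, displaced by mv.

    mv = (0, 0) treats the lost block as a "not coded" block.
    """
    height, width = len(previous), len(previous[0])
    dx, dy = mv
    for r in range(size):
        for c in range(size):
            # Clamp so that the displaced position stays inside the frame.
            src_r = min(max(top + r + dy, 0), height - 1)
            src_c = min(max(left + c + dx, 0), width - 1)
            current[top + r][left + c] = previous[src_r][src_c]


previous = [[4 * r + c for c in range(4)] for r in range(4)]
current = [[0] * 4 for _ in range(4)]
mv = median_motion_vector([(1, 0), (1, 0), (0, 0)])
conceal_temporal(previous, current, top=1, left=1, size=2, mv=mv)
print(mv, current[1][1:3], current[2][1:3])
# (1, 0) [6, 7] [10, 11]
```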
In low resolutions and at low bit-rates, the correlation between spatially neighbouring blocks is often rather low. Thus interpolated motion vectors based on spatially neighbouring pixel data may be far from the original values. This means that one-directional concealment schemes often fail to reconstruct the original blocks. Furthermore, if only motion vectors are used for concealment without even trying to recover the prediction error blocks, the picture becomes blurred, since a great amount of detail will be lost. In practice, using current concealment schemes, errors or incorrectly concealed blocks are visible for a relatively long time.
Previously proposed utilisation of VRC in error-prone environments suffers from a few problems. First, if the interval between Sync frames is short (often the thread length has been proposed to be 5 frames), compression efficiency is compromised. On the other hand, if the threads are longer, error concealment tends not to be effective and picture quality is compromised.
In accordance with a first aspect of the invention there is provided a method of concealing an error in a frame of a video sequence, said video sequence comprising a plurality of frames and being encoded as at least two independently-coded signals, each of which represents a sub-set of frames of the video sequence, the method comprising receiving data representing a frame of the video sequence, identifying an error in the frame and concealing the error by predicting corresponding data using at least one frame which is encoded in a signal other than that in which the error is identified.
Thus the invention relates to a multi-threaded video coding scheme in which an erroneous area is concealed by interpolating temporally and preferably bidirectionally from uncorrupted frames of another VRC thread.
The invention provides means to achieve better image error concealment than prior-art solutions. Compared to prior-art Video Redundancy Coding methods, it makes it possible to use longer threads between Sync frames, thus increasing the compression efficiency.
Preferably the corresponding data is predicted bidirectionally, i.e. using frames which occur in the video sequence previous and subsequent to the frame in which the error is identified, said previous and subsequent frames being encoded in at least one signal other than that in which the error is identified.
Most advantageously, the corresponding data may be predicted using frames which occur in the video sequence immediately previous and/or subsequent to the frame in which the error is identified, said previous and subsequent frames being present in at least one other signal.
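As a minimal sketch of the bidirectional case, the erroneous area may be filled with the average of the co-located pixels of the immediately previous and immediately subsequent frames, both of which belong to another thread when frames are interleaved between two threads. The plain averaging without motion compensation and the names used are assumptions made for illustration; the text above does not prescribe a particular interpolation formula.

```python
def conceal_bidirectional(previous, subsequent, damaged, top, left, size):
    """Fill a damaged block in place from two frames of another thread.

    `previous` and `subsequent` are the decoded frames immediately before
    and after the damaged frame in display order. Both are equally distant
    in time, so they are given equal weight.
    """
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Integer average with rounding to the nearest value.
            damaged[r][c] = (previous[r][c] + subsequent[r][c] + 1) // 2


# Frames n-1 and n+1 come from thread 0; frame n (thread 1) has a lost block.
frame_prev = [[10] * 4 for _ in range(4)]
frame_next = [[20] * 4 for _ in range(4)]
frame_damaged = [[0] * 4 for _ in range(4)]
conceal_bidirectional(frame_prev, frame_next, frame_damaged,
                      top=1, left=1, size=2)
print(frame_damaged[1])
# [0, 15, 15, 0]
```

Because the two reference frames lie on either side of the damaged frame, gradual changes such as slow motion or fading tend to be approximated better than with a one-directional copy from the previous frame.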
The error may be initially concealed by predicting the corresponding data from a frame occurring previous to the said frame in the same signal as the frame in which the error is identified.
When the encoded signals include header information, a frame having an error in the header may be reconstructed by identifying an error in the header of a frame, determining whether the frame is an interframe-coded frame and, if so, predicting corresponding data using at least one frame which is encoded in a signal other than that in which the error is identified.
According to a second aspect of the invention a method of video decompression comprises receiving at least two signals representing a video sequence, said video sequence comprising a plurality of frames, each signal representing a sub-set of frames of the video sequence, identifying an error in a frame of the video sequence, concealing the error in the frame by predicting corresponding data using at least one frame which is encoded in a signal other than that in which the error is identified, and displaying the frames of the video sequence.
A third aspect of the invention relates to video error concealment apparatus for concealing an error in a frame of a video sequence, said video sequence comprising a plurality of frames and being encoded as at least two independently-coded signals, each of which represents a sub-set of frames of the video sequence, the apparatus comprising: an input for receiving said at least two signals, means for identifying an error in a frame of the video sequence, and means for concealing the error by predicting corresponding data using at least one frame which is encoded in a signal other than that in which the error is identified.
Preferably the concealing means is arranged to predict the corresponding data using frames which occur in the video sequence previous and subsequent to the frame in which the error is identified, said previous and subsequent frames being encoded in at least one signal other than that in which the error is identified.
Most advantageously the concealing means is arranged to predict the corresponding data using frames which occur in the video sequence immediately previous and/or subsequent to the frame in which the error is identified, said previous and subsequent frames being present in at least one other signal.
The concealing means may be arranged to initially conceal the error by predicting the corresponding data from a frame occurring previous to the said frame in the same signal as the frame in which the error is identified.
When the encoded signal includes header information, the apparatus may further comprise means for identifying an error in the header of a frame and means for determining whether the frame is an interframe-coded frame, wherein the concealing means is arranged to predict corresponding data using at least one frame which is encoded in a signal other than that in which the error is identified.