1. Technical Field
The present invention concerns the coding and decoding of video signals using inter-frame motion-compensated predictive coding, and more especially techniques directed towards the concealment, in the decoded pictures, of degradation caused by transmission errors.
Details of the invention will be described in the context of video signals encoded and decoded in accordance with the so-called MPEG-1 system, as defined in the standard ISO-11172. However, the invention can also be used with other coding schemes in which some pictures are coded using bidirectional prediction.
2. Related Art
In MPEG, some frames of the video signals are coded independently of other frames, i.e., without using inter-frame predictive coding. These are called intra- or I-frames. Other frames are coded using inter-frame predictive coding in which one codes only the difference between the frame being coded and a prediction generated from one or more other frames of the video signal. These inter-frames are of two types, one of which is the predicted, or P-frame, where the prediction is formed from the preceding I- or P-frame. The I- and P-frames are sometimes referred to generically as anchor frames, because they can be used as reference frames for predictions, in contradistinction to the second type of predictively coded frame, the bidirectional or B-frame, which is not so used. For the B-frame, the prediction is chosen, according to the picture content, to be from the preceding anchor frame, the following anchor frame, or a weighted sum of predictions from both, whichever gives the best results (in the case of a weighted sum, the weights are determined by the relative temporal proximity of the B-frame to the two anchor frames). Note that this decision is not taken for the frame as a whole; rather, the frame is divided into macroblocks and the decision is taken separately for each macroblock. If predictive coding is judged to be unsatisfactory for a particular macroblock, that macroblock may be coded without prediction (i.e., in the same manner as for an I-frame): this also applies to P-frames.
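By way of illustration only, the per-macroblock choice between forward, backward and bidirectional prediction might be sketched in Python as follows. The function name `choose_b_prediction`, the use of the sum of absolute differences as the measure of "best results", and the linear weighting by frame distance are assumptions made for this sketch and are not taken from the standard. The two input predictions are assumed to have been formed already.

```python
def choose_b_prediction(cur, fwd, bwd, dist_prev, dist_next):
    """Pick the prediction mode for one macroblock of a B-frame.

    cur, fwd, bwd: equally sized lists of rows of pixel values (the macroblock
    being coded, and its predictions from the preceding and following anchor).
    dist_prev, dist_next: frame distances to the two anchors (assumed > 0).
    """
    total = dist_prev + dist_next
    # Assumption: the nearer anchor receives the larger weight.
    w_fwd = dist_next / total
    w_bwd = dist_prev / total
    both = [[w_fwd * f + w_bwd * b for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(fwd, bwd)]

    candidates = {"forward": fwd, "backward": bwd, "bidirectional": both}

    def sad(pred):
        # Sum of absolute differences between the macroblock and a prediction.
        return sum(abs(c - p)
                   for c_row, p_row in zip(cur, pred)
                   for c, p in zip(c_row, p_row))

    mode = min(candidates, key=lambda name: sad(candidates[name]))
    return mode, candidates[mode]
```

A practical encoder would additionally compare the best candidate against intra coding of the macroblock, which this sketch omits.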
Further coding proceeds in that, for each macroblock, the picture element (pixel) values (in the case of an I-frame or intra-macroblock of a P- or B-frame) or the inter-frame pixel differences (in the case of differential coding) are transformed using the discrete cosine transform (DCT): for this purpose, each macroblock (16×16 pixels) is divided into four 8×8 blocks.
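The division of a macroblock into 8×8 blocks and the transform of each block might be sketched as follows. This is a direct, unoptimized evaluation of the orthonormal two-dimensional DCT, given purely for illustration; the function names `split_macroblock` and `dct_8x8` are hypothetical, and practical coders use fast algorithms instead.

```python
import math


def split_macroblock(mb):
    """Split a 16x16 macroblock (list of 16 rows) into four 8x8 blocks.

    Blocks are returned in the order top-left, top-right,
    bottom-left, bottom-right.
    """
    return [[row[x:x + 8] for row in mb[y:y + 8]]
            for y in (0, 8) for x in (0, 8)]


def dct_8x8(block):
    """Return the orthonormal 2-D DCT-II of an 8x8 block."""
    n = 8

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out
```

For a block of uniform value, all of the energy falls in the single coefficient `out[0][0]`, which is why smooth picture areas code compactly after quantization.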
The division of the video signal into different types of frame is as follows. The sequence of frames to be coded is divided into Groups of Pictures, each of which is a series of one or more frames. Each group contains N frames (N≧1), and begins with an I-frame, followed by P-frames at regular intervals. Between these anchor frames are B-frames, so that the anchor frames are M frames apart (i.e., there are M−1 B-frames between each pair of consecutive anchor frames). Neither P-frames nor B-frames need be present. Commonly, for a 625-line, 25-frame-per-second system, N=12 and M=3. In this description, two successive anchor frames and the B-frames which lie between them are referred to together as a “sub-group.”
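The pattern of frame types within one Group of Pictures, as just described, might be generated as in the following sketch. The function name `gop_frame_types` is hypothetical, and the sketch follows the simplified description given here (group begins with an I-frame, anchors every M frames) and not any particular bitstream.

```python
def gop_frame_types(n, m):
    """Return the frame types of one Group of Pictures as a string.

    n: number of frames in the group (N >= 1).
    m: spacing between anchor frames (M >= 1); m == 1 means no B-frames.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must both be at least 1")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")       # the group begins with an I-frame
        elif i % m == 0:
            types.append("P")       # further anchors every m frames
        else:
            types.append("B")       # m - 1 B-frames between anchors
    return "".join(types)
```

With N=12 and M=3, `gop_frame_types(12, 3)` gives `"IBBPBBPBBPBB"`; the final two B-frames lie between the last P-frame of the group and the I-frame of the next group.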
FIG. 1 shows a series of thirteen frames from such a signal, in the order in which they are captured by a camera or displayed at a receiver. The frames are marked I, P or B. The order of prediction is indicated by arrows, the arrow-head pointing from the anchor frame used as reference for the prediction towards the frame which is to be coded using that prediction. Thus, for example, the prediction for frame B9 is to be performed by bidirectional prediction from frames P4 and P7. Because of the use of backward prediction, the frames cannot be coded in the order shown; for example, frame P7 must be coded before frame B9. To indicate this, the frames are numbered in the order in which they are coded.
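The reordering implied by backward prediction might be sketched as follows: each anchor frame is moved ahead of the B-frames that precede it in display order. The function name `coding_order` is hypothetical, and the handling of trailing B-frames (which in a continuous signal would wait for the next anchor) is a simplification made for this sketch.

```python
def coding_order(display_types):
    """Return display-order indices rearranged into coding order.

    display_types: sequence of 'I', 'P' or 'B', in display order.
    """
    order = []
    pending_b = []                      # B-frames waiting for their following anchor
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)         # the anchor is coded first ...
            order.extend(pending_b)     # ... then the B-frames that depend on it
            pending_b = []
    # Simplification: B-frames with no following anchor are emitted last.
    order.extend(pending_b)
    return order
```

For the display sequence `"IBBPBBP"`, this gives `[0, 3, 1, 2, 6, 4, 5]`, i.e., each P-frame is coded before the two B-frames displayed ahead of it.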
When coding a macroblock in, for example, frame P7 using frame P4 as reference, then in principle one can take, as one's prediction for differential coding, the correspondingly positioned macroblock in frame P4. However, because of movement in the scene, this may not be optimum and, therefore, the MPEG standard uses motion-compensated predictive coding whereby one takes as one's prediction an area of the reference frame the same size and shape as the macroblock, but offset from it by an amount referred to as a motion vector. This vector is transmitted along with the difference information. In the case of a bidirectionally coded macroblock within a B-frame, of course two motion vectors are sent.
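The formation of a motion-compensated prediction and of the difference to be coded might be sketched as follows. Frames are represented as lists of rows of pixel values; the function names are hypothetical, and only whole-pixel vectors that keep the predicted area inside the reference frame are handled in this sketch.

```python
def predict_block(ref, top, left, mv, size=16):
    """Return the size x size area of `ref` displaced by motion vector `mv`.

    (top, left) is the position of the macroblock being coded;
    mv is (dy, dx) in whole pixels.
    """
    dy, dx = mv
    y, x = top + dy, left + dx
    if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
        raise ValueError("motion vector points outside the reference frame")
    return [row[x:x + size] for row in ref[y:y + size]]


def residual_block(cur, ref, top, left, mv, size=16):
    """Return the current macroblock minus its motion-compensated prediction."""
    pred = predict_block(ref, top, left, mv, size)
    return [[cur[top + r][left + c] - pred[r][c] for c in range(size)]
            for r in range(size)]
```

Where the vector matches the movement in the scene, the residual is small and is cheap to code; with a zero vector the same macroblock may leave a much larger difference.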
FIG. 2 is a simplified block diagram of a conventional MPEG encoder. Incoming frames, received in the order shown in FIG. 1, are first buffered in a reordering unit 1 and read out in the order indicated by the numbers in FIG. 1. In the case of an I-frame, or an intra-macroblock of a P- or B-frame, the pixel values are subjected to the discrete cosine transform at 2, quantization at 3 and variable-length coding at 4, and are then fed to an output buffer 5. Because the data rate at this point varies according to picture content, a buffer control unit 6 monitors the buffer fullness and controls the quantizer 3 so that the buffer 5 can output to a fixed bit-rate line without overflow or underflow.
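The action of the buffer control unit 6 might be sketched, in much simplified form, as a mapping from buffer fullness to quantizer coarseness. The linear mapping and the function name `quantizer_scale` are assumptions made purely for illustration; the default limits of 1 and 31 are chosen to suit a five-bit quantizer scale code, and practical rate-control schemes are considerably more elaborate.

```python
def quantizer_scale(buffer_bits, buffer_size_bits, q_min=1, q_max=31):
    """Map output-buffer fullness to a quantizer scale.

    A fuller buffer gives a larger (coarser) scale, so fewer bits are
    produced; an emptier buffer gives a finer scale.
    """
    # Clamp fullness to the range 0.0 .. 1.0.
    fullness = min(max(buffer_bits / buffer_size_bits, 0.0), 1.0)
    # Assumption: simple linear mapping, rounded to the nearest integer.
    return q_min + int(fullness * (q_max - q_min) + 0.5)
```

For example, a half-full buffer gives `quantizer_scale(50, 100) == 16`, midway between the finest and coarsest settings.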
The output of the quantizer 3 is decoded by a local decoder consisting of an inverse quantizer 7 and an inverse DCT unit 8 and stored in a frame store 9.
In the case of a predicted macroblock within a P-frame, a motion estimation unit 10 evaluates the optimum motion vector for prediction, and the relevant shifted region of the previous anchor frame stored in the frame store 9 is read out. This is subtracted from the incoming signal in a subtractor 11 and the difference is then coded just as described above. In this case, the local decoder also employs an adder 12 to add the prediction back in to form a decoded frame, which is again stored in the frame store 9.
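The manner in which the motion estimation unit 10 arrives at its vector is not prescribed here. One common approach, given purely as an assumed example, is an exhaustive search over a window of whole-pixel offsets, choosing the offset with the smallest sum of absolute differences. The function names and the default search range of ±7 pixels are hypothetical.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a macroblock and a shifted reference area."""
    return sum(abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def best_motion_vector(cur, ref, top, left, size=16, search=7):
    """Exhaustively search for the vector (dy, dx) giving the lowest SAD.

    Returns (vector, cost). Candidates that would read outside the
    reference frame are skipped; the macroblock itself is assumed to
    lie inside the frame, so the zero vector is always a candidate.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

The cost of such a search grows with the square of the search range, which is why practical encoders commonly use hierarchical or otherwise reduced searches.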
Note that the frame store 9 actually stores two frames, so that when coding, for example, frames B14 and B15, frames I10 and P13 are both available for prediction purposes. In the case of coding of a B-frame, the motion estimation unit 10 evaluates the macroblock to be coded against the two frames stored in the frame store 9 to decide whether to use forward, backward or bidirectional prediction, and produce the necessary motion vector or vectors. The relevant prediction is generated from the contents of the frame store 9 and fed to the subtractor 11, following which further coding of the macroblock takes place as before. Note, however, that B-frames are not decoded for entry into the frame store 9 as they are not needed for prediction purposes.
In the context of the present invention, we are interested in the decoding of coded video signals following transmission (or, perhaps, recording and replay), when errors may occur. These may be of brief duration, or may persist for some time: for example, in packet-switched networks, network congestion may cause delays exceeding the maximum delay that a decoder can accommodate, so that a whole packet is effectively lost. Even brief errors can cause considerable disruption if they cause loss of synchronization of information coded using variable-length codes. Inherently, the use of inter-frame coding means that corruption of an anchor frame propagates into subsequent frames.
It has already been proposed to conceal the missing parts of frames occasioned by such errors by copying from another frame. Indeed, the MPEG standard makes some provision for this by providing that an I-frame may contain motion vectors which are normally unused, but can, in the event that a macroblock is lost, be used to make a prediction from the preceding anchor frame, which can then be displayed instead. This vector is transmitted in the macroblock directly above the macroblock to which it is applied. However, in the event of the loss of a significant portion, or all, of a frame, this concealment fails, because the concealment vectors are also lost.
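Concealment of a single lost macroblock by copying from the preceding anchor frame might be sketched as follows. The function name `conceal_macroblock`, the fallback to a zero vector when no concealment vector is available, and the clamping of the copied area to the frame boundary are all assumptions made for this sketch.

```python
def conceal_macroblock(frame, prev_anchor, top, left, vector=None, size=16):
    """Overwrite a lost macroblock of `frame` with pixels from `prev_anchor`.

    vector: concealment vector (dy, dx) in whole pixels, or None if none
    was received, in which case the co-located area is copied.
    The frame is modified in place and also returned.
    """
    dy, dx = vector if vector is not None else (0, 0)
    height, width = len(prev_anchor), len(prev_anchor[0])
    # Assumption: clamp so that the copied area stays inside the anchor frame.
    y = min(max(top + dy, 0), height - size)
    x = min(max(left + dx, 0), width - size)
    for r in range(size):
        frame[top + r][left:left + size] = prev_anchor[y + r][x:x + size]
    return frame
```

The sketch also makes the stated limitation plain: if the vectors are lost together with the macroblocks, only the zero-vector fallback remains, which conceals poorly wherever the scene is moving.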