The present invention concerns video data compression and, in particular, an apparatus and a method for processing groups of successive fields of video information to obtain high levels of data compression.
Image sequence data compression systems such as that proposed by the Moving Picture Experts Group (MPEG), a committee within the International Organization for Standardization (ISO), have been very effective in coding image sequences for video signals under the NTSC standard as well as for High Definition Television (HDTV) sources. The MPEG system is described in a paper entitled "MPEG Video Simulation Model Three (SM3)" by the Simulation Model Editorial Group, available from ISO as ISO-IEC/JTC1/SC2/WG11/N0010 MPEG 90/041, 1990, which is hereby incorporated by reference for its teachings on the MPEG video signal encoding method.
The MPEG system integrates a number of well-known data compression techniques into a single system. These include motion-compensated predictive coding, discrete cosine transformation (DCT), adaptive quantization and variable length coding (VLC).
The motion-compensated predictive coding schemes used in HDTV systems process the video data in groups of frames in order to achieve relatively high levels of compression without allowing the performance of the system to be degraded by excessive error propagation. In these group-of-frames processing schemes, each image frame is classified as one of three types: the intra-frame (I-frame), the predicted frame (P-frame) and the bidirectional frame (B-frame).
A two-dimensional DCT is applied to small regions, such as blocks of 8 by 8 pixels, to encode each of the I-frames. The resulting data stream is quantized and encoded using a variable-length code, such as an amplitude run-length Huffman code, to produce the compressed output signal. P-frames and B-frames are processed as residues of corresponding I and P-frames, respectively, encoded using the two-dimensional DCT, quantized and variable-length coded. A typical sequence of frames may be represented by a sequence such as I, B, B, P, B, B, I, B, etc.
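The I-frame path described above (block transform, quantization, run-length pairing) can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not taken from the MPEG description: the fixed quantizer step of 16 stands in for adaptive quantization, the zigzag scan order is the one commonly used with 8 by 8 DCT blocks, and the final Huffman stage is omitted. The function names are hypothetical.

```python
import math

N = 8  # block size named in the text

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step=16):
    """Uniform quantizer (assumed): divide by a fixed step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def zigzag(block):
    """Read coefficients along anti-diagonals, alternating direction (assumed scan)."""
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u][v] for u, v in order]

def run_length(values):
    """Emit (zero_run, amplitude) pairs; trailing zeros produce no pair."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

flat = [[128] * N for _ in range(N)]  # a uniform gray block
pairs = run_length(zigzag(quantize(dct_2d(flat))))
print(pairs)  # a flat block leaves only the DC term
```

For a uniform block every coefficient except the DC term quantizes to zero, so a single pair is produced; this is the kind of redundancy the variable-length code then exploits.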
To operate effectively in the presence of inter-frame motion, several overlapping corresponding blocks from the I-frame or P-frame are compared with each block of a P or B-frame to find the one with the smallest residue. The residue block is then encoded using the two-dimensional DCT, adaptive quantization and variable-length coding. The reference block that was used to obtain the residue is then designated as the predecessor of the block to be encoded and is defined by a motion vector which is transmitted with the encoded residue block. This vector describes the displacement in the image plane which is needed to place the reference block in its target position in the new frame.
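The block comparison described above can be sketched as an exhaustive search over displaced reference blocks. This is an illustration under stated assumptions, not the MPEG procedure itself: the sum of absolute differences is assumed as the measure of the "smallest residue", the search range of plus or minus 4 pixels is arbitrary, and `match_block` is a hypothetical name. The returned vector follows the convention in the text, namely the displacement that moves the reference block to its target position in the new frame.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def match_block(cur, ref, cx, cy, size=8, search=4):
    """Find the reference block that best predicts the block of `cur`
    whose top-left corner is (cx, cy).

    Returns (motion_vector, residue), where motion_vector = (dx, dy)
    moves the chosen reference block onto the target position.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for ry in range(cy - search, cy + search + 1):
        for rx in range(cx - search, cx + search + 1):
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    _, rx, ry = best
    residue = [[cur[cy + j][cx + i] - ref[ry + j][rx + i]
                for i in range(size)]
               for j in range(size)]
    return (cx - rx, cy - ry), residue

# Usage: the current frame is the reference shifted right by 2 and down by 1.
ref = [[x * x + 3 * y * y + x * y for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]
vector, residue = match_block(cur, ref, 4, 4)
print(vector)
```

Only the vector and the residue block would be passed on to the transform and coding stages; with a purely translational shift, as in this usage example, the residue is entirely zero.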
Processing frames in groups achieves a high level of data compression due to the strong temporal correlation among successive frames in conventional video images. Redundant information in the images is greatly reduced by the predictive coding method used for the P and B-frames. Under this method, small blocks of data from reconstructed I and P-frames are subtracted from corresponding blocks of data from the respective frames to be encoded as P and B-frames. The result of this operation is residue data values which describe the P-frames in terms of the I-frames and the B-frames in terms of the I and/or P-frames. For P and B-frames, only this residue data is encoded and transmitted.
This coding is undone at a receiver which reverses the steps to obtain reconstructed image data. Any errors in a frame that is used to predict other frames may propagate to the predicted frames. In addition, the dependence of one frame on its predecessor I or P-frame limits the ability of the receiver to display a frame selected at random and to accommodate standard television functions such as intra-group scene changes and channel switching. These limitations could be removed by encoding each frame as an I-frame; however, the resulting coded video signal would need considerably more bits per frame since it would not exploit the temporal redundancy that is inherent in most video information.
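The residue coding, its reversal at the receiver, and the error propagation noted above can be illustrated with a small sketch. It is a simplification under stated assumptions: rounding each value to a multiple of a fixed step stands in for the whole lossy DCT and quantization path, motion compensation is omitted, and the function names are hypothetical.

```python
def quantize_frame(frame, step):
    """Stand-in for lossy coding (assumed): round each value to a multiple of step."""
    return [[step * round(v / step) for v in row] for row in frame]

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_pair(i_frame, p_frame, step=4):
    """Code an I-frame directly and a P-frame as a residue.

    The residue is taken against the reconstructed I-frame, i.e. the
    frame the receiver will actually have, not against the original.
    """
    i_coded = quantize_frame(i_frame, step)
    residue = quantize_frame(subtract(p_frame, i_coded), step)
    return i_coded, residue

def decode_pair(i_coded, residue):
    """Reverse the coding: the P-frame is the I-frame plus the residue."""
    return i_coded, add(i_coded, residue)

i_frame = [[10, 20], [30, 40]]
p_frame = [[12, 22], [33, 41]]
i_coded, residue = encode_pair(i_frame, p_frame)
_, p_recon = decode_pair(i_coded, residue)

# An error in the received I-frame carries over to the predicted frame.
damaged = [row[:] for row in i_coded]
damaged[0][0] += 16
_, p_damaged = decode_pair(damaged, residue)
print(p_damaged[0][0] - p_recon[0][0])
```

Because the encoder predicts from the reconstructed frame, the error in the decoded P-frame of this sketch is limited to the rounding error of the residue. An error introduced into the I-frame after encoding, however, appears unchanged in the P-frame, which is the propagation effect described in the text.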
The MPEG encoding standard is designed for frame-oriented image sequences. Most video sources, however, provide a frame of information as two interlaced fields which are separated in time by one field interval. The standard has been adapted in two ways to accommodate field-oriented image sequences. The first method combines the successive even and odd fields of the interlaced source to form a sequence of frame images and then applies MPEG encoding to the sequence of frames. It is well known that, due to the temporal separation between successive fields, this method may produce unsatisfactory results. The second method avoids these problems by applying MPEG encoding to the sequence of fields in the same manner that it would be applied to a sequence of frames.
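The first adaptation, in which successive even and odd fields are combined into frames, can be sketched as follows. The assignment of the even field to frame lines 0, 2, 4 and so on is an assumption made for illustration, and `weave` and `split_fields` are hypothetical names.

```python
def weave(even_field, odd_field):
    """Interleave two fields line by line into one frame.

    Assumes the even field supplies frame lines 0, 2, 4, ... and the
    odd field supplies lines 1, 3, 5, ...
    """
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    return frame

def split_fields(frame):
    """Inverse of weave: separate a frame into its even and odd fields."""
    return frame[0::2], frame[1::2]

# A vertical edge that moves one pixel to the right between the two fields.
even = [[1, 1, 0, 0], [1, 1, 0, 0]]
odd = [[1, 1, 1, 0], [1, 1, 1, 0]]
for line in weave(even, odd):
    print(line)
```

In the woven frame the edge position alternates from line to line, because the two fields were captured one field interval apart. This illustrates the temporal separation that the text identifies as the source of unsatisfactory results for the frame-merging method.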
Motion-predictive encoding presents difficulties under any of the MPEG encoding methods. As described above, the process of matching blocks in a predicted frame to displaced blocks in an anchor frame plays a key role in reducing the prediction residue and, thus, the bit rate for an MPEG encoded signal. The block matching method which is most commonly used assumes that blocks of pixels move by simple translation (i.e. vertically and/or horizontally) in the image plane from frame to frame or from field to field. This method does not perform well, for example, when the block is part of an object which rotates about an axis in the image plane, or which changes in size, whether due to motion into or out of the frame or as a result of an image zoom. In addition, this encoding method may not work well when there is a relatively large temporal separation between a frame to be predicted and its reference or anchor frame.