1. Field of the Invention
This invention relates to compression coding and decoding of high quality interlaced scan video signals for digital storage and transmission media.
2. Description of the Prior Art
The bandwidth compression of video signals has recently attracted many active research and standardization efforts. Various techniques have been explored for the digital coding of various video signal formats at different bit rates.
In terms of the techniques for obtaining video sequence sources from recording devices and later refreshing them on display devices, video sequences can be classified into two types: the progressive scan video sequence and the interlaced scan video sequence. In a progressive scan video sequence, a frame in the sequence is captured and refreshed sequentially line by line from the top to the bottom of the frame. In an interlaced scan video sequence, a frame consists of two fields, the even field made up of the even lines of the frame and the odd field made up of the odd lines of the frame. Capturing and refreshing are performed first on the even field, sequentially from the top to the bottom of the field, followed by the odd field in the same manner. Since a large number of the present sequence sources are in the interlaced scan format (e.g. NTSC, PAL), an increasing number of research efforts have been directed towards the efficient coding of interlaced scan video sequences.
Video bandwidth compression is done by reducing the temporal, spatial and statistical redundancy in a video sequence. To reduce the temporal redundancy, motion estimation and compensation is used. According to the prior art, in motion estimation and compensation (e.g. the MPEG Video Simulation Model Three, International Organization for Standardization, Coded Representation of Picture and Audio Information, 1990, ISO-IEC/JTC1/SC2/WG8 MPEG 90/041), the frames, which can also be referred to as pictures, in an interlaced scan video sequence can be classified into two types: (1) intra-coded frames, in which each frame is coded using information only from itself; (2) predictive-coded frames, in which each frame is coded using motion compensated prediction from a past intra-coded or predictive-coded frame and/or a future intra-coded or predictive-coded frame.
Each frame in the sequence is first partitioned into blocks of pixel data which are then processed by a block coding method such as the discrete cosine transform (DCT), with or without motion compensation. For a predictive-coded frame, the blocks are coded with motion compensation, using information from an adjacent frame by block matching motion estimation and compensation to predict the contents of the coded frame. The block matching algorithm consists of determining the direction of translatory motion of the blocks of pixels from one frame to the next by finding the best matching block based on some pre-determined criterion. The differences between the best matching block and the actual block of pixels are then subjected to transformation (e.g. DCT), quantization based on a quantization matrix and quantization steps given by a rate controller, and run-length encoding of the quantized DCT coefficients. FIG. 1 is a block diagram describing the method proposed by MPEG. A detailed description of the method can be found in the document "MPEG Video Simulation Model Three (SM3)", ISO-IEC/JTC1/SC2/WG8, MPEG 90/041, 1990.
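By way of illustration (not part of the prior-art document itself), the full-pixel block matching search described above might be sketched as follows. The 8×8 block size, the ±7 pixel search range, and the sum-of-absolute-differences matching criterion are assumptions chosen for this sketch; the prior art only requires "some pre-determined criterion".

```python
# Illustrative sketch of full-pixel block matching motion estimation.
# Frames are nested lists of pixel values (rows of columns); block size,
# search range and the SAD criterion are assumptions for illustration.

def full_pixel_search(cur, ref, bx, by, block=8, search=7):
    """Find the full-pixel motion vector for the block whose top-left
    corner is (bx, by) in the current frame `cur`, by exhaustively
    matching candidate blocks in the adjacent (reference) frame `ref`."""
    h, w = len(ref), len(ref[0])
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue  # candidate block falls outside the frame
            # sum of absolute differences between current and candidate block
            sad = sum(abs(cur[by + r][bx + c] - ref[y + r][x + c])
                      for r in range(block) for c in range(block))
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```

The returned pair (dx, dy) is the full-pixel resolution motion vector; the exhaustive scan guarantees the global minimum of the criterion within the search window.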
So that the decoder can use the best matching block for performing motion compensation, each block of pixels has an associated motion vector to indicate the location of the best matching block in the adjacent frame. The motion vector is a pair of numbers indicating the x and y offset of the block's location in the adjacent frame, with reference to the current block's location. For example, a motion vector of (3, 2) means that the best match for the current block can be found in the adjacent frame at the location 3 pixels to the right of and 2 pixels below the current block's location. If all the motion vectors used for motion compensation are integer values, the motion vectors are called full-pixel resolution motion vectors, and the searching process for these full-pixel resolution motion vectors is called full-pixel search. In order to obtain a more accurate block from the adjacent frame for compensation, a sub-pixel resolution motion vector may be obtained. The sub-pixel resolution motion vector has non-integer values and points to a location in between the full-pixel locations. The block to be used for compensation is then interpolated using the pixels around that location. One common example of the sub-pixel resolution motion vector is the half-pixel resolution motion vector. The half-pixel resolution motion vector may have values ±0.5, ±1.5, ±2.5, . . . The search for the half-pixel resolution motion vector usually follows the full-pixel search, such that the half-pixel positions are searched around the location indicated by the full-pixel resolution motion vector. The half-pixel resolution motion vector indicates that the best matching block of the adjacent frame comes from a half-pixel resolution block. This half-pixel resolution block can be obtained from the adjacent frame in two different ways, considering the interlace structure of the sequence, resulting in a frame-based interpolation mode and a field-based interpolation mode.
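As a small illustration of the refinement step described above, the eight half-pixel resolution candidate vectors surrounding a full-pixel resolution motion vector can be enumerated as follows. This is a hypothetical sketch; the function name is illustrative only.

```python
# Illustrative sketch: given a full-pixel motion vector, enumerate the
# eight surrounding half-pixel resolution candidate vectors that the
# half-pixel search evaluates around it.

def half_pixel_candidates(mv):
    mx, my = mv
    return [(mx + dx, my + dy)
            for dy in (-0.5, 0.0, 0.5)
            for dx in (-0.5, 0.0, 0.5)
            if not (dx == 0.0 and dy == 0.0)]  # skip the full-pixel centre
```

For the full-pixel vector (3, 2), the candidates range from (2.5, 1.5) to (3.5, 2.5); the full-pixel position itself is compared against the best of these eight, as described below.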
FIG. 2a illustrates the method of obtaining the half-pixel resolution motion vector in the frame-based interpolation mode.

In FIG. 2a:

Legend
P0 to P8: pixel values in full-pixel positions
H1 to H8: pixel values in half-pixel positions

Frame-based interpolation formulae
H1=(P0+P1+P2+P4)/4
H2=(P0+P2)/2
H3=(P0+P2+P3+P5)/4
H4=(P0+P4)/2
H5=(P0+P5)/2
H6=(P0+P4+P6+P7)/4
H7=(P0+P7)/2
H8=(P0+P5+P7+P8)/4

In FIG. 2a, P0 to P8 are pixels of the adjacent frame in the full-pixel positions, i.e., they are the original non-interlaced pixels. In the frame-based interpolation mode, these pixels are treated without taking into account which field they come from. P0 is assumed to be the location pointed to by the full-pixel resolution motion vector (the full-pixel resolution vector is shown by the solid-line arrow), and H1 to H8 are the top left corners of the candidate half-pixel resolution blocks (the corresponding half-pixel resolution vectors are shown by the dashed-line arrows). One of these will give the best matching half-pixel resolution block. The best matching half-pixel resolution block is then compared with the block obtained using the full-pixel resolution motion vector (i.e., the block with its top left corner at P0), and the one that gives the better performance is chosen as the block for motion compensation. As shown in FIG. 2a, depending on their locations, the half-pixel values are obtained by taking the average of either two or four adjacent pixels. To form a half-pixel resolution block of size 8×8, 8 rows and 8 columns of pixels need to be interpolated.

FIG. 2b shows the field-based interpolation mode.

In FIG. 2b:

Legend
E0 to E8: pixel values in full-pixel positions (from the even field)
O0 to O8: pixel values in full-pixel positions (from the odd field)
HE1 to HE8: pixel values in half-pixel positions (for the even field)
HO1 to HO8: pixel values in half-pixel positions (for the odd field)

Field-based interpolation formulae
HE1=(E0+E1+E2+E4)/4
HE2=(E0+E2)/2
HE3=(E0+E2+E3+E5)/4
HE4=(E0+E4)/2
HE5=(E0+E5)/2
HE6=(E0+E4+E6+E7)/4
HE7=(E0+E7)/2
HE8=(E0+E5+E7+E8)/4
HO1=(O0+O1+O2+O4)/4
HO2=(O0+O2)/2
HO3=(O0+O2+O3+O5)/4
HO4=(O0+O4)/2
HO5=(O0+O5)/2
HO6=(O0+O4+O6+O7)/4
HO7=(O0+O7)/2
HO8=(O0+O5+O7+O8)/4

In FIG. 2b, E0 to E8 are pixels of the adjacent frame coming from the even field. If the full-pixel resolution motion vector points to the even field, for example to E0, the half-pixel resolution motion vector is obtained from one of the positions HE1 to HE8. As in the frame-based interpolation mode, HE1 to HE8 are the top left corners of the candidate half-pixel resolution blocks. Each half-pixel resolution block is formed with reference to the field pixel positions by taking the average of two or four pixel values from the same field. The best matching half-pixel resolution block is then compared with the block obtained using the full-pixel resolution motion vector (i.e., the block with its top left corner at E0), and the one that gives the better performance is chosen as the block for motion compensation. The same procedure is applied if the full-pixel resolution motion vector points to the odd field, for example to O0.
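The frame-based and field-based interpolation formulae can be sketched in code as follows. This is an illustration only: the frame is assumed to be stored as nested lists of rows (top row = line 0; even lines belong to the even field), and truncating integer division is one possible rounding choice, which the document does not specify.

```python
# Illustrative sketch of the two half-pixel interpolation modes.
# (x, y) is a full-pixel position; half_x/half_y flag a half-pixel
# offset of +0.5 in that direction. Integer (truncating) division is
# an assumption; the prior-art document does not fix a rounding rule.

def half_pixel_frame(frame, x, y, half_x, half_y):
    """Frame-based interpolation: average 2 or 4 neighbouring frame
    pixels, ignoring which field each line belongs to."""
    if half_x and half_y:   # cf. H8 = (P0 + P5 + P7 + P8) / 4
        return (frame[y][x] + frame[y][x + 1] +
                frame[y + 1][x] + frame[y + 1][x + 1]) // 4
    if half_x:              # cf. H5 = (P0 + P5) / 2
        return (frame[y][x] + frame[y][x + 1]) // 2
    if half_y:              # cf. H7 = (P0 + P7) / 2
        return (frame[y][x] + frame[y + 1][x]) // 2
    return frame[y][x]

def half_pixel_field(frame, x, y, half_x, half_y):
    """Field-based interpolation: average pixels from the same field
    only, so the vertical neighbour is two frame lines away."""
    if half_x and half_y:   # cf. HE8 = (E0 + E5 + E7 + E8) / 4
        return (frame[y][x] + frame[y][x + 1] +
                frame[y + 2][x] + frame[y + 2][x + 1]) // 4
    if half_x:              # cf. HE5 = (E0 + E5) / 2
        return (frame[y][x] + frame[y][x + 1]) // 2
    if half_y:              # cf. HE7 = (E0 + E7) / 2
        return (frame[y][x] + frame[y + 2][x]) // 2
    return frame[y][x]
```

The only structural difference between the two modes is the vertical step: one frame line for frame-based interpolation, two frame lines (one field line) for field-based interpolation.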
Due to the time difference between the two fields of a frame, motion of the objects in an interlaced sequence will cause the objects to be displaced from one field to the other. In such a case, the field-based interpolation mode will give a better motion estimate with half-pixel accuracy than the frame-based interpolation mode.
In order to determine the best half-pixel resolution motion vector for the coded block in the motion estimation process, both the frame-based and field-based interpolations need to be performed to obtain all half-pixel resolution blocks, followed by a comparison based on some pre-defined criterion, such as mean absolute error, to determine which mode and which half-pixel position give the minimum error for the coded block. With this method, information on which of the frame interpolation mode and the field interpolation mode has been selected has to be transmitted to the decoder. An example of the implementation of the above method is described in the document "Proposal Package," ISO-IEC/JTC1/SC29/WG11, MPEG 91/228, 1991.
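The adaptive mode decision described above can be sketched as follows. This is a minimal illustration, assuming mean absolute error as the criterion (as the document suggests); the tie-breaking rule of preferring the frame mode on equal error is an assumption of this sketch.

```python
# Illustrative sketch: choose between the frame-based and field-based
# interpolated candidate blocks by mean absolute error (MAE). Blocks
# are nested lists of equal dimensions; preferring the frame mode on
# ties is an assumption made only for this sketch.

def mean_absolute_error(a, b):
    n = len(a) * len(a[0])
    return sum(abs(pa - pb)
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n

def select_interpolation_mode(coded_block, frame_block, field_block):
    """Return ('frame', mae) or ('field', mae), whichever candidate
    gives the smaller mean absolute error against the coded block."""
    mae_frame = mean_absolute_error(coded_block, frame_block)
    mae_field = mean_absolute_error(coded_block, field_block)
    if mae_frame <= mae_field:
        return 'frame', mae_frame
    return 'field', mae_field
```

The selected mode label is exactly the extra per-block information that must be transmitted to the decoder, which is one of the costs of the adaptive scheme discussed below.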
The above-described method of adaptive frame/field interpolation for sub-pixel motion estimation has several problems. First, in order to obtain the sub-pixel resolution motion vector, sub-pixel resolution blocks for both frame-based and field-based interpolation need to be obtained for comparison, which is costly in hardware implementation. Secondly, extra information (a bit) needs to be transmitted to the decoder for each coded block so that the decoder uses the corresponding correct frame or field interpolation mode for sub-pixel motion compensation. Thirdly, with frame/field sub-pixel resolution block comparison using the mean absolute error criterion, visually smooth sub-pixel motion estimation was found difficult to achieve in some cases.