The MPEG-1 video standard defines three different types of frames (see FIG. 1A): Intra-coded frames (I frames), Predicted frames (P frames), and Bidirectionally predicted frames (B frames). P and B frames are also said to be inter-coded because they are coded based on prior anchor or reference frames. Details of the MPEG-1 standard can be found in ISO/IEC JTC1 CD 11172 Coding of moving pictures and associated audio for digital storage media up to 1.5 Mbits/s, 1992. This reference is incorporated by reference in its entirety. I frames are compressed using intraframe coding, i.e., they do not reference any other frames in the coded stream. Each I frame is divided into 8.times.8 blocks (typically 112) and a Discrete Cosine Transform (DCT) is taken of each block. Of the 64 DCT coefficients, the DC, AC01 and AC10 coefficients are of particular interest. They are illustrated in FIG. 1B. P frames are coded using motion compensation information from past I frames or P frames. B frames are coded using motion compensation information from past and/or future I or P frames. P and B frames can also contain intra-coded blocks, which are coded using the DCT in the same way as the blocks in I frames.
An MPEG-1 bit stream has a hierarchical representation with six layers: a sequence layer, a group of pictures layer, a picture layer, a slice layer, a macroblock layer, and a block layer. The sequence layer is the top coding layer. A group of pictures (GOP) is a set of pictures in contiguous display order and contains at least one I frame. Each picture corresponds to a frame. Each picture is divided into slices. A slice is divided into macroblocks. A macroblock is composed of six blocks 112: four for luminance and two for chrominance. Further details of the MPEG-1 standard can be found in ISO/IEC JTC1 CD 11172 Coding of moving pictures and associated audio for digital storage media up to 1.5 Mbits/s, 1992.
The details of a process of reduced image generation can be found in "On the extraction of DC sequences from MPEG compressed video", (by B. L. Yeo and B. Liu, International Conference on Image Processing, Vol. II, pp. 260-263, 1995). This reference is incorporated by reference in its entirety. A DC image is composed of block-wise averages of 8.times.8 blocks. For the I-frame of an MPEG-1 coded video, each pixel in the DC image corresponds to a scaled version of the DC coefficient of each DCT block.
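The relation between a DC coefficient and the block average can be checked numerically. The following is a minimal sketch, not the cited authors' implementation: it assumes the orthonormal 8.times.8 DCT (under which the DC coefficient is eight times the block mean), it computes the DCT from pixels instead of parsing coefficients from a bit stream, and the names `dct2`, `dc_image` and `frame` are hypothetical.

```python
import math

N = 8  # block size used by MPEG-1


def dct_matrix(n=N):
    """Orthonormal n x n DCT matrix (assumed normalisation)."""
    return [[math.sqrt((1 if i == 0 else 2) / n)
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """2D DCT of a square block: T * block * T^t."""
    t = dct_matrix(len(block))
    t_transposed = [list(col) for col in zip(*t)]
    return matmul(matmul(t, block), t_transposed)


def dc_image(frame):
    """One output pixel per 8x8 block: the DC coefficient scaled by 1/8."""
    image = []
    for r in range(0, len(frame), N):
        row = []
        for c in range(0, len(frame[0]), N):
            block = [line[c:c + N] for line in frame[r:r + N]]
            row.append(dct2(block)[0][0] / N)
        image.append(row)
    return image


# Hypothetical 16x16 test frame whose pixel value encodes its position.
frame = [[16 * r + c for c in range(16)] for r in range(16)]
reduced = dc_image(frame)  # 2x2 image of block averages
```

In a real decoder the DC coefficients would be read directly from the intra-coded blocks, so no transform would need to be computed for I frames.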
A generic extraction of the DC images from P and B frames is shown in FIG. 1C. Here, P.sub.ref 131D is the target block of interest and is inter-coded. It is desired to convert it into an intra-coded DCT format so that the DC coefficient can be extracted. P.sub.0, . . . , P.sub.3 (134, 135, 136 and 137) are the four original intra-coded neighboring blocks (called anchor blocks) from which P.sub.ref is derived and the motion vector is (.DELTA.x, .DELTA.y). The motion vector (.DELTA.x, .DELTA.y) 150 is part of the motion compensation information associated with the block P.sub.ref. The shaded regions 134P, 135P, 136P and 137P in the anchor blocks P.sub.0, . . . P.sub.3, called the contributing subblocks, are moved by (.DELTA.x, .DELTA.y). Thus, the DC coefficients of P.sub.ref are derived for DC image sequences.
The relation of inter-coded P.sub.ref with respect to P.sub.i is derived in "Manipulation and compositing of MC-DCT compressed video", (by S. F. Chang and D. G. Messerschmitt, IEEE Journal on Selected Areas in Communications: Special Issue on Intelligent Signal Processing, vol. 13, pp. 1-11, January 1995). This reference is incorporated by reference in its entirety. If we represent each block as an 8.times.8 matrix, then we can describe P.sub.ref in the spatial domain through matrix multiplications:
$$P_{\mathrm{ref}} = \sum_{i=0}^{3} S_{i1} P_i S_{i2}$$
where S.sub.ij are matrices like
$$\begin{pmatrix} 0 & I_n \\ 0 & 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 0 & 0 \\ I_n & 0 \end{pmatrix}.$$
Each I.sub.n is an identity matrix of size n. An example of such an effect of S.sub.ij on P is demonstrated in FIG. 1D. The goal is to move the subblock from the upper left corner to the lower right corner, and also to set the remaining values in the block to zero. The pre-multiplication shifts the subblock of interest vertically while the post-multiplication shifts the subblock horizontally. There are four possible locations of the subblock of interest: upper-left, upper-right, lower-right and lower-left. The actions in terms of matrices are tabulated in Table 1.
While the value of S.sub.ij is clear from Table 1 with the given values of h.sub.i and w.sub.i, we will sometimes write S.sub.ij as a function of h.sub.i and w.sub.i. For example, S.sub.01 =S.sub.01 (h.sub.0, w.sub.0)=S.sub.01 (h.sub.0).
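The shifting action of the matrices S.sub.ij can be illustrated with a short sketch. It is only an illustration under stated assumptions: the target block is located by a hypothetical offset (dy, dx), with 0 <= dy, dx < 8, inside a 16.times.16 neighbourhood formed by the four anchor blocks; P.sub.0 is taken to be the upper-left anchor, P.sub.1 the upper-right, P.sub.2 the lower-left and P.sub.3 the lower-right; and the function names are invented for this example.

```python
N = 8  # block size


def zeros():
    return [[0] * N for _ in range(N)]


def upper_right(n):
    """8x8 matrix with I_n in its upper-right corner, zeros elsewhere."""
    s = zeros()
    for k in range(n):
        s[k][N - n + k] = 1
    return s


def lower_left(n):
    """8x8 matrix with I_n in its lower-left corner, zeros elsewhere."""
    s = zeros()
    for k in range(n):
        s[N - n + k][k] = 1
    return s


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N))
             for j in range(N)] for i in range(N)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(N)] for i in range(N)]


def block(frame, r, c):
    return [row[c:c + N] for row in frame[r:r + N]]


def predict_block(frame, dy, dx):
    """Assemble the block at offset (dy, dx) as the sum of S_i1 * P_i * S_i2."""
    anchors = [block(frame, 0, 0), block(frame, 0, N),
               block(frame, N, 0), block(frame, N, N)]
    heights = [N - dy, N - dy, dy, dy]   # h_0 .. h_3
    widths = [N - dx, dx, N - dx, dx]    # w_0 .. w_3
    # Pre-multipliers shift rows; post-multipliers shift columns.
    pre = [upper_right(heights[0]), upper_right(heights[1]),
           lower_left(heights[2]), lower_left(heights[3])]
    post = [lower_left(widths[0]), upper_right(widths[1]),
            lower_left(widths[2]), upper_right(widths[3])]
    result = zeros()
    for i in range(4):
        result = add(result, matmul(matmul(pre[i], anchors[i]), post[i]))
    return result


# Hypothetical 16x16 neighbourhood whose pixel value encodes its position.
frame = [[16 * r + c for c in range(16)] for r in range(16)]
p_ref = predict_block(frame, dy=3, dx=5)
expected = [[frame[3 + i][5 + j] for j in range(N)] for i in range(N)]
```

Here `p_ref` can be compared with `expected`, the same 8.times.8 window cut directly out of the neighbourhood.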
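Denoting the 2D DCT of an 8.times.8 block P as DCT(P), we can express the DC coefficient of DCT(P.sub.ref) as:
$$(\mathrm{DCT}(P_{\mathrm{ref}}))_{00} = \sum_{i=0}^{3} \sum_{m=0}^{7} \sum_{l=0}^{7} w_{ml}^{i} \, (\mathrm{DCT}(P_i))_{ml}$$
for some weighting coefficients w.sub.ml.sup.i.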
A key result of "On the extraction of DC sequences from MPEG compressed video", (by B. L. Yeo and B. Liu, International Conference on Image Processing, Vol. II, pp. 260-263, 1995) is that
$$w_{00}^{i} = \frac{h_i w_i}{64},$$
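TABLE 1
______________________________________
Matrices S.sub.i1 and S.sub.i2
Subblock   Position      S.sub.i1   S.sub.i2
______________________________________
P.sub.0    lower right   $\begin{pmatrix} 0 & I_{h_0} \\ 0 & 0 \end{pmatrix}$   $\begin{pmatrix} 0 & 0 \\ I_{w_0} & 0 \end{pmatrix}$
P.sub.1    lower left    $\begin{pmatrix} 0 & I_{h_1} \\ 0 & 0 \end{pmatrix}$   $\begin{pmatrix} 0 & I_{w_1} \\ 0 & 0 \end{pmatrix}$
P.sub.2    upper right   $\begin{pmatrix} 0 & 0 \\ I_{h_2} & 0 \end{pmatrix}$   $\begin{pmatrix} 0 & 0 \\ I_{w_2} & 0 \end{pmatrix}$
P.sub.3    upper left    $\begin{pmatrix} 0 & 0 \\ I_{h_3} & 0 \end{pmatrix}$   $\begin{pmatrix} 0 & I_{w_3} \\ 0 & 0 \end{pmatrix}$
______________________________________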
i.e., the weight w.sub.00.sup.i is the ratio of overlaps of the block P.sub.ref with block P.sub.i. An approximation, called the first-order approximation, approximates (DCT(P.sub.ref)).sub.00 by
$$\sum_{i=0}^{3} \frac{h_i w_i}{64} \, (\mathrm{DCT}(P_i))_{00}.$$
This approximation requires only the motion vector information and the DC values in the reference frames. It is shown in "On the extraction of DC sequences from MPEG compressed video", (by B. L. Yeo and B. Liu, International Conference on Image Processing, Vol. II, pp. 260-263, 1995) that such an approximation, when applied to B and P frames, yields good results in practice.
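The first-order approximation amounts to an overlap-weighted average of four DC values. The sketch below is an illustration, not the cited authors' code: it assumes the target block sits at a hypothetical offset (dy, dx), 0 <= dy, dx < 8, inside the 16.times.16 area covered by the anchors, with P.sub.0 upper-left, P.sub.1 upper-right, P.sub.2 lower-left and P.sub.3 lower-right, and the function names are invented.

```python
N = 8  # block size


def overlap_weights(dy, dx):
    """Weights h_i * w_i / 64 for the anchors P_0 .. P_3."""
    heights = [N - dy, N - dy, dy, dy]
    widths = [N - dx, dx, N - dx, dx]
    return [h * w / (N * N) for h, w in zip(heights, widths)]


def first_order_dc(anchor_dcs, dy, dx):
    """Approximate DC of the target block from the four anchor DC values."""
    weights = overlap_weights(dy, dx)
    return sum(wt * dc for wt, dc in zip(weights, anchor_dcs))


# Hypothetical DC coefficients of P_0 .. P_3.
anchor_dcs = [80.0, 160.0, 240.0, 320.0]
approx_dc = first_order_dc(anchor_dcs, dy=2, dx=4)
```

Because the four overlap areas always add up to 64, the weights sum to one, so the estimate stays within the range of the anchor DC values.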
The use of DC images extracted directly from MPEG-1 video has led to efficient algorithms for processing MPEG-1 video. Example enabling applications are documented in "Rapid scene analysis on compressed videos" (by B. L. Yeo and B. Liu, IEEE Transactions on Circuits and Systems For Video Technology, vol. 5, pp. 533-544, December 1995) and "Efficient Matching and Clustering of Video Shots" (by M. M. Yeung and B. Liu, International Conference on Image Processing, Vol. I, pp. 338-341, 1995). These references are incorporated by reference in their entirety.
For the I frame of an MPEG-1 video, we define a DC+2AC image as follows: for each 8.times.8 DCT block, the DC and two lower order AC coefficients, AC.sub.0,1 and AC.sub.1,0, are retained, and a 2.times.2 inverse DCT is taken to obtain 4 pixels. For B and P frames, equation (2) is used for constructing the DC and two lower order AC coefficients of P.sub.ref from the DC and two lower order AC coefficients of the DCT blocks P.sub.i: ##EQU6## for a+b.ltoreq.1. Here, h.sub.i and w.sub.i are the height and width of block P.sub.i (FIG. 1C) respectively. Further details can be found in "On Fast Microscopic Browsing of MPEG compressed video" (by B. L. Yeo, IBM T. J. Watson Research Center, Technical Report RC20841, May 1997). This reference is incorporated by reference in its entirety.
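The I-frame case can be sketched as follows. The text does not state how the 8.times.8 coefficients are scaled before the 2.times.2 inverse DCT, so the factor of 1/4 used here is an assumption: it is chosen so that, with orthonormal transforms, a flat block of value v yields four pixels of value v. The function name `dc2ac_pixels` is hypothetical.

```python
def dc2ac_pixels(dc, ac01, ac10, scale=0.25):
    """Return a 2x2 pixel patch from DC, AC01 and AC10 of an 8x8 DCT block.

    The retained coefficients form the 2x2 matrix [[dc, ac01], [ac10, 0]].
    With the orthonormal 2x2 DCT matrix T2 = (1/sqrt(2)) * [[1, 1], [1, -1]],
    the inverse transform T2^t * C * T2 reduces to the sums below.
    scale=0.25 is an assumed normalisation, not taken from the source.
    """
    a = dc * scale     # coefficient at row 0, column 0
    b = ac01 * scale   # row 0, column 1 (horizontal variation)
    d = ac10 * scale   # row 1, column 0 (vertical variation)
    return [[0.5 * (a + b + d), 0.5 * (a - b + d)],
            [0.5 * (a + b - d), 0.5 * (a - b - d)]]


# A flat block of value 50 has DC = 8 * 50 under the orthonormal 8x8 DCT.
flat = dc2ac_pixels(400.0, 0.0, 0.0)
# A negative AC01 corresponds to intensity increasing from left to right.
ramp = dc2ac_pixels(400.0, -40.0, 0.0)
```

Each 8.times.8 block thus contributes a 2.times.2 patch, so a DC+2AC image has twice the resolution of the DC image in each dimension.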
MPEG-2 video is intended for higher data rates and is used for broadcasting high-quality video, whereas MPEG-1 is intended for data rates on the order of 1.5 Mbit/s. MPEG-2 supports a much broader range of applications and modes of operation. It maintains all of the MPEG-1 video syntax, and uses extensions for additional flexibility and functions. MPEG-2 also supports the coding of interlaced video and scalable data streams. MPEG-1 does not support interlaced video, only non-interlaced (progressive) video.
In interlaced video, each frame comprises two fields, a top field and a bottom field. In MPEG-2, each field can be coded as a separate picture (field picture), or the two fields can be coded together as a complete frame (frame picture). Frame pictures and field pictures can be mixed in an MPEG-2 data stream. The DCT coefficients for field pictures are always organized as independent fields. However, in a frame picture, the DCT can be done either on fields or on frames on a macroblock basis. That is, a frame picture can have both frame-coded macroblocks and field-coded macroblocks.
FIGS. 1E, 1F and 1G show different formats of DCT coded macroblocks possible in an MPEG-2 video. FIG. 1E is a drawing showing a representation of prior art field DCT coding in a frame picture in the MPEG-2 video standard. FIG. 1F is a drawing showing a representation of prior art frame DCT coding in a frame picture in the MPEG-2 video standard. FIG. 1G is a drawing showing a representation of a prior art coded macroblock in a field picture in the MPEG-2 video standard.
Motion compensation (MC) in MPEG-2 is also more general than that in MPEG-1. There are two basic modes of motion compensation: field-based motion compensation and frame-based motion compensation. In field-based MC, two motion vectors are used to specify the translations of the top and bottom fields (the mode field-based MC and the two motion vectors constitute the motion compensation information for field-based MC). In frame-based MC, only one motion vector is used (the mode frame-based MC and the motion vector constitute the motion compensation information for frame-based MC). Frame-based MC is similar to what is done in MPEG-1. In field pictures, only field-based MC can be used, whereas in frame pictures, both frame-based and field-based MC are allowed.
The two cases are illustrated in FIG. 1H and FIG. 1I. FIG. 1H is a drawing showing prior art frame prediction in a frame picture or prediction in a field picture in the MPEG-2 video standard. FIG. 1I is a drawing showing prior art field prediction in a frame picture in the MPEG-2 video standard.
In summary, one difference between MPEG-2 and MPEG-1 is the support of coding of interlaced video through the use of DCT and/or motion compensation of the interlaced frames through either frame or field encoding.
To improve prediction efficiency, MPEG-2 further supports the 16.times.8 motion-compensation mode. This mode allows a macroblock to be treated as an upper 16.times.8 region and a lower 16.times.8 region. Each region is then independently motion-compensated. Another mode is dual-prime motion prediction. It averages the predictions from two adjacent fields of opposite parity when coding a given field or frame. Dual-prime motion prediction tends to reduce the noise in the data.
Details of the MPEG-2 standard can be found in ISO/IEC JTC1 CD 13818 Generic Coding of moving pictures and associated audio, 1994, MPEG Video Compression Standard (by J. L. Mitchell, W. B. Pennebaker, C. E. Fogg and D. J. Le Gall, Chapman and Hall, 1996) and Digital Video: An Introduction to MPEG-2 (by B. G. Haskell, A. Puri, and A. N. Netravali, Chapman and Hall, 1997). These references are incorporated by reference in their entirety.
Notations
In this disclosure, the following notational conventions are used. We represent an 8.times.8 (16.times.16) data block (macroblock) as an 8.times.8 matrix (16.times.16 matrix), or vice versa, when there is no confusion. We denote spatial domain blocks or macroblocks (or matrices) with capital letters (e.g., D, D', D.sub.i, P, etc.) and the corresponding DCT domain blocks (or matrices) by the same capital letters with hats (e.g., $\hat{D}_i$, $\hat{P}_i$, etc.), that is, for an 8.times.8 block represented by the matrix D, we have
$$\hat{D} = \mathrm{DCT}(D) = T D T^{t}, \qquad (3)$$
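where T is the 8.times.8 DCT matrix with entries t(i,j) (i denotes the i th row and j denotes the j th column) given by
$$t(i,j) = \frac{1}{2} k(i) \cos\frac{(2j+1)\, i\, \pi}{16}, \qquad k(i) = \begin{cases} \frac{1}{\sqrt{2}}, & i = 0 \\ 1, & \text{otherwise.} \end{cases}$$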
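The DCT matrix T can be built and checked in a few lines. This is a sketch that assumes the standard orthonormal DCT definition, t(i,j) = (k(i)/2) cos((2j+1)i.pi./16) with k(0) = 1/sqrt(2) and k(i) = 1 otherwise; the helper names are hypothetical.

```python
import math

N = 8


def dct_matrix():
    """8x8 DCT matrix T; row i holds the i-th cosine basis function."""
    t = []
    for i in range(N):
        k = 1 / math.sqrt(2) if i == 0 else 1.0
        t.append([0.5 * k * math.cos((2 * j + 1) * i * math.pi / 16)
                  for j in range(N)])
    return t


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N))
             for j in range(N)] for i in range(N)]


T = dct_matrix()
T_transposed = [list(col) for col in zip(*T)]
gram = matmul(T, T_transposed)  # should be numerically close to identity
```

Orthonormality of T (T times its transpose equals the identity) is what allows the inverse transform to be written as the transpose, and it underlies the product property used later in this section.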
We also explicitly use DCT to denote the DCT domain value of a block when clearer notation is required. For example, DCT(AB) denotes the 2D-DCT applied to the matrix product AB. We will also make constant use of the distributive property of 2D DCT:
$$\mathrm{DCT}(AB) = \mathrm{DCT}(A)\,\mathrm{DCT}(B) = \hat{A}\hat{B}. \qquad (6)$$
Thus, we can write the 2D-DCT of
$$P_{\mathrm{ref}} = \sum_{i=0}^{3} S_{i1} P_i S_{i2}$$
for describing the movement of subblocks in the spatial domain as
$$\mathrm{DCT}(P_{\mathrm{ref}}) = \sum_{i=0}^{3} \mathrm{DCT}(S_{i1})\, \mathrm{DCT}(P_i)\, \mathrm{DCT}(S_{i2})$$
in the DCT domain. The conversion from inter-coded block P.sub.ref to intra-coded DCT block DCT(P.sub.ref) directly in the DCT domain using only the DCT values of the blocks P.sub.i and the motion vector (.DELTA.x, .DELTA.y) 150 is called DCT domain inverse motion compensation. For simplicity of notation, we will describe the movement of subblocks in the spatial domain in this disclosure; the DCT domain values of the blocks can be similarly deduced from equation (6).
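The product property in equation (6) can be verified numerically for one term of the sum. The sketch below assumes the orthonormal DCT matrix T, for which T.sup.t T is the identity; the shift sizes, the test block and all function names are arbitrary choices made for illustration.

```python
import math

N = 8


def dct_matrix():
    return [[(0.5 / math.sqrt(2) if i == 0 else 0.5)
             * math.cos((2 * j + 1) * i * math.pi / 16)
             for j in range(N)] for i in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N))
             for j in range(N)] for i in range(N)]


def dct2(m):
    """2D DCT of an 8x8 matrix: T * m * T^t."""
    t = dct_matrix()
    t_transposed = [list(col) for col in zip(*t)]
    return matmul(matmul(t, m), t_transposed)


def corner_identity(n, upper_right):
    """I_n placed in the upper-right or lower-left corner of an 8x8 matrix."""
    s = [[0.0] * N for _ in range(N)]
    for k in range(n):
        if upper_right:
            s[k][N - n + k] = 1.0
        else:
            s[N - n + k][k] = 1.0
    return s


# Arbitrary example: shift matrices for a 3-row by 5-column subblock.
S1 = corner_identity(3, upper_right=True)
S2 = corner_identity(5, upper_right=False)
P = [[float((7 * r + 3 * c) % 11) for c in range(N)] for r in range(N)]

via_spatial = dct2(matmul(matmul(S1, P), S2))            # DCT(S1 P S2)
via_dct = matmul(matmul(dct2(S1), dct2(P)), dct2(S2))    # DCT(S1) DCT(P) DCT(S2)
max_error = max(abs(via_spatial[i][j] - via_dct[i][j])
                for i in range(N) for j in range(N))
```

Since the DCT of each shift matrix depends only on h.sub.i or w.sub.i, those transforms could be precomputed, leaving only products with the DCT blocks of the anchors at run time.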
A non-interlaced 16.times.16 macroblock is denoted by a simple capital letter (e.g., D) whereas the corresponding interlaced macroblock is denoted by a primed capital letter (e.g., D'). Given a non-interlaced macroblock D, we denote the four component 8.times.8 blocks by D.sub.0, D.sub.1, D.sub.2, and D.sub.3, i.e.,
$$D = \begin{pmatrix} D_0 & D_1 \\ D_2 & D_3 \end{pmatrix}.$$
Similarly, for the corresponding interlaced macroblock D', its four component blocks are denoted by D'.sub.0, D'.sub.1, D'.sub.2, and D'.sub.3, i.e.,
$$D' = \begin{pmatrix} D'_0 & D'_1 \\ D'_2 & D'_3 \end{pmatrix}.$$
Each component in a block (matrix) is referenced by two indices, i.e., A.sub.ij represents the component at row i and column j in a block (matrix) A. To represent a row or column vector of a matrix, we use a --, i.e., A.sub.i-- represents the ith row vector and A.sub.--i the ith column vector.
Consider a non-interlaced 16.times.16 macroblock
$$D = \begin{pmatrix} D_0 & D_1 \\ D_2 & D_3 \end{pmatrix}$$
where D.sub.i, i=0, 1, 2, 3, is an 8.times.8 block. Similarly, let
$$D' = \begin{pmatrix} D'_0 & D'_1 \\ D'_2 & D'_3 \end{pmatrix}$$
be the corresponding interlaced macroblock, i.e., D'.sub.0 and D'.sub.1 correspond to the top field of D, and D'.sub.2 and D'.sub.3 correspond to the bottom field of D. The relationship between D and D' can be described using a 16.times.16 permutation matrix P as
$$D = P D' \qquad (7)$$
##EQU14##
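The effect of the permutation matrix P can be illustrated as follows. This sketch assumes that the top field consists of the even-numbered rows (0, 2, ..., 14) of D and the bottom field of the odd-numbered rows, so that row 2k of D is row k of D' and row 2k+1 of D is row 8+k of D'; the variable and function names are hypothetical.

```python
SIZE = 16  # macroblock height and width
HALF = 8   # rows per field


def field_permutation():
    """16x16 permutation matrix P such that D = P * D'."""
    p = [[0] * SIZE for _ in range(SIZE)]
    for k in range(HALF):
        p[2 * k][k] = 1             # frame row 2k   <- top-field row k
        p[2 * k + 1][HALF + k] = 1  # frame row 2k+1 <- bottom-field row k
    return p


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(SIZE))
             for j in range(SIZE)] for i in range(SIZE)]


# Hypothetical macroblock whose pixel value encodes its position.
frame_mb = [[16 * r + c for c in range(SIZE)] for r in range(SIZE)]
# Interlaced layout: all top-field rows first, then all bottom-field rows.
interlaced_mb = frame_mb[0::2] + frame_mb[1::2]

P = field_permutation()
restored = matmul(P, interlaced_mb)  # P * D' rebuilds the frame layout
```

Because P is a permutation matrix, its transpose is its inverse, so the interlaced layout can be obtained from the frame layout by multiplying with the transpose of P.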