1. Field of the Invention
The present invention relates to the field of data compression and, more particularly, to a system and techniques for compressing and decompressing digital motion video signals at a multiplicity of scales. The techniques expand on algorithms similar to the emerging MPEG standard proposed by the International Standards Organization's Moving Picture Experts Group (MPEG).
2. Environment
Technological advances in digital transmission networks, digital storage media, Very Large Scale Integration devices, and digital processing of video and audio signals are converging to make the transmission and storage of digital video economical in a wide variety of applications. Because the storage and transmission of digital video signals are central to many applications, and because an uncompressed representation of a video signal requires a large amount of storage, the use of digital video compression techniques is vital to this advancing art. In this regard, several international standards for the compression of digital video signals have emerged over the past decade, with more currently under development. These standards apply to algorithms for the transmission and storage of compressed digital video in a variety of applications, including video-telephony and teleconferencing; high quality digital television transmission on coaxial and fiber-optic networks as well as broadcast terrestrially and over direct broadcast satellites; and in interactive multimedia products on CD-ROM, Digital Audio Tape, and Winchester disk drives.
Several of these standards involve algorithms based on a common core of compression techniques, e.g., the CCITT (Consultative Committee on International Telegraphy and Telephony) Recommendation H.120, the CCITT Recommendation H.261, and the ISO/IEC MPEG standard. The MPEG algorithm has been developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the International Standards Organization (ISO) and the International Electrotechnical Commission (IEC). The MPEG committee has been developing a draft standard for the multiplexed, compressed representation of video and associated audio signals. The standard specifies the syntax of the compressed bit stream and a method of decoding a digital video signal at one level of spatial resolution. This draft standard will be referred to as the MPEG-1 standard or algorithm, in order to distinguish it from newer algorithms now under discussion by the same committee. The MPEG-1 draft standard is described in document ISO/IEC JTC1/SC2 WG11 MPEG 91/090 of May 1991.
As the present invention may be applied to extend the functions of an MPEG-1 decoder to produce a multiplicity of video resolutions from the same compressed bit stream, some pertinent aspects of the MPEG-1 video compression algorithm will be reviewed. It is to be noted, however, that the invention can also be applied to other video coding algorithms which share some of the features of the MPEG algorithm.
The MPEG-1 Video Compression Algorithm
To begin with, it will be understood that the compression of any data object, such as a page of text, an image, a segment of speech, or a video sequence, can be thought of as a series of steps, including: 1) a decomposition of that object into a collection of tokens; 2) the representation of those tokens by binary strings which have minimal length in some sense; and 3) the concatenation of the strings in a well-defined order. Steps 2 and 3 are lossless, i.e., the original data is faithfully recoverable upon reversal, and Step 2 is known as entropy coding. (See, e.g., T. BERGER, Rate Distortion Theory, Englewood Cliffs, N.J.: Prentice-Hall, 1977; R. McELIECE, The Theory of Information and Coding, Reading, Mass.: Addison-Wesley, 1971; D. A. HUFFMAN, "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, pp. 1098-1101, September 1952; G. G. LANGDON, "An Introduction to Arithmetic Coding," IBM J. Res. Develop., vol. 28, pp. 135-149, March 1984.) Step 1 can be either lossless or lossy in general. Most video compression algorithms are lossy. A successful lossy compression algorithm eliminates redundant and irrelevant information, allowing relatively large errors where they are not likely to be visually significant and carefully representing aspects of a sequence to which the human observer is very sensitive. The techniques employed in the MPEG-1 algorithm for Step 1 can be described as predictive/interpolative motion-compensated hybrid DCT/DPCM coding. Huffman coding, also known as variable length coding (see the above cited Huffman 1952 paper), is used in Step 2. Although, as mentioned, the MPEG-1 standard is really a specification of the decoder and the compressed bit stream syntax, the following description of the MPEG-1 specification is, for ease of presentation, primarily from an encoder point of view.
The MPEG-1 video standard specifies a coded representation of video for digital storage media, as set forth in ISO-IEC JTC1/SC2/WG11 MPEG CD-11172, MPEG Committee Draft, 1991. The algorithm is designed to operate on non-interlaced component video, although it can be extended to operate with interlaced video by combining two consecutive interlaced fields into a single picture. Each picture has three components: luminance (Y), red color difference (C_r), and blue color difference (C_b). The C_r and C_b components each have half as many samples as the Y component in both horizontal and vertical directions. Further, the algorithm operates with a single level of video resolution.
Layered Structure of an MPEG-1 Sequence
An MPEG-1 data stream consists of a video stream and an audio stream which are packed, together with systems information and possibly other bitstreams, into a systems data stream that can be regarded as layered. Within the video layer of the MPEG-1 data stream, the compressed data is further layered. The highest layer is the Video Sequence Layer, containing control information and parameters for the entire sequence. A description of the organization of the other layers will aid in understanding the invention. These layers of the MPEG-1 Video Layered Structure are shown in FIGS. 1-4. Specifically, the Figures show:
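By way of illustration only, Step 2 above (entropy coding) may be sketched in Python as the construction of a Huffman code from token counts. The function name `huffman_code` and the sample token string are hypothetical. MPEG-1 itself uses predefined variable length code tables, so this sketch shows the general technique of the cited HUFFMAN paper and is not the MPEG-1 tables.

```python
import heapq
from collections import Counter

def huffman_code(tokens):
    """Build a prefix-free code (token -> bit string) from token frequencies."""
    counts = Counter(tokens)
    if len(counts) == 1:
        # A single distinct token still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

code = huffman_code("aaaabbc")
# Step 3: concatenate the code words in token order.
bitstream = "".join(code[t] for t in "aaaabbc")
```

More frequent tokens receive shorter strings, which is the sense in which the representation has minimal length.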
FIG. 1: Groups of Pictures (GOPs).
FIG. 2: Macroblock (MB) subdivision of a picture.
FIG. 3: Slice subdivision of a picture (example).
FIG. 4: Block subdivision of a macroblock.
The layers pertain to the operation of the compression algorithm as well as the composition of a compressed bit stream. As noted, the highest layer is the Video Sequence Layer, containing control information and parameters for the entire sequence. At the next layer, a sequence is subdivided into sets of consecutive pictures, each known as a Group of Pictures (GOP). A general illustration of this layer is shown in FIG. 1. Decoding may begin at the start of any GOP, essentially independent of the preceding GOPs. There is no limit to the number of pictures which may be in a GOP, nor do there have to be equal numbers of pictures in all GOPs.
The third or Picture layer is a single picture. A general illustration of this layer is shown in FIG. 2. The luminance component of each picture is subdivided into 16×16 regions, and the color difference components are subdivided into 8×8 regions spatially co-sited with the 16×16 luminance regions. Taken together, the co-sited luminance region and color difference regions make up the fifth layer, known as a macroblock (MB).
Between the Picture and MB layers is the fourth or slice layer. Each slice consists of an arbitrary or optional number of consecutive MB's. Slices need not be uniform in size within a picture or from picture to picture. They may be only a few macroblocks in size or extend across multiple rows of MB's as shown in FIG. 3.
An MB is a fundamental layer to which various attributes can be associated as will be seen below. The basic structure of an MB consists of four luminance blocks and two chrominance blocks as seen in FIG. 4. All of these blocks are of size 8×8 in MPEG-1. Preserving the structure and attributes of an MB (not necessarily its size) across a multiplicity of picture resolutions is one of the goals of this invention.
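By way of example, the block structure of an MB may be sketched in Python as follows. The names `split_luma` and `macroblock` are hypothetical, and the raster ordering of the four luminance blocks (top-left, top-right, bottom-left, bottom-right) is an assumption made for illustration, not a statement of the ordering shown in FIG. 4.

```python
def split_luma(region16):
    """Split a 16x16 luminance region (list of 16 rows) into four 8x8 blocks.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    blocks = []
    for by in (0, 8):
        for bx in (0, 8):
            blocks.append([row[bx:bx + 8] for row in region16[by:by + 8]])
    return blocks

def macroblock(luma16, cr8, cb8):
    """Group the co-sited luminance and color difference data into one MB."""
    return {"Y": split_luma(luma16), "Cr": cr8, "Cb": cb8}

# Synthetic data: each luminance sample holds its own raster index.
luma = [[16 * y + x for x in range(16)] for y in range(16)]
chroma = [[0] * 8 for _ in range(8)]
mb = macroblock(luma, chroma, chroma)
```

The resulting MB carries six 8×8 blocks in total: four luminance blocks and two chrominance blocks.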
Within a GOP, three types of pictures can appear. The distinguishing difference among the picture types is the compression method used. Intramode pictures or I-pictures are compressed independently of any other picture. Although there is no fixed upper bound on the distance between I-pictures, it is expected that they will be interspersed frequently throughout a sequence to facilitate random access and other special modes of operation. Each GOP must start with an I-picture and additional I-pictures can appear within the GOP. The other types of pictures, predictively motion-compensated pictures (P-pictures) and bidirectionally motion-compensated pictures (B-pictures), will be described in the discussion on motion compensation below. A general illustration is shown in FIG. 5.
Motion Compensation
Most video sequences exhibit a high degree of correlation between consecutive pictures. A useful method to remove this redundancy prior to coding a picture is "motion compensation". Motion compensation requires some means for modeling and estimating the motion in a scene. In MPEG-1, each picture is partitioned into macroblocks and each MB is compared to 16×16 regions in the same general spatial location in a predicting picture or pictures. The region in the predicting picture(s) that best matches the MB in some sense is used as the prediction. The difference between the spatial location of the MB and that of its predictor is referred to as a motion vector. Thus, the outputs of the motion estimation and compensation for an MB are motion vectors and a motion-compensated difference macroblock. These can generally be compressed more than the original MB itself. Pictures which are predictively motion-compensated using a single predicting picture in the past, i.e., forward-in-time in the sequence, are known as P-pictures.
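The matching described above may be sketched, by way of example only, as an exhaustive block search in Python. The matching criterion (sum of absolute differences), the search range of ±4 samples, and the names `sad` and `best_motion_vector` are assumptions made for illustration; the standard does not prescribe how an encoder estimates motion. The motion vector is taken here as the predictor position minus the MB position.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size regions."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total

def best_motion_vector(cur, ref, cx, cy, size=16, search=4):
    """Return (dx, dy, cost) of the best-matching region in `ref`
    for the MB whose top-left corner is (cx, cy) in `cur`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate regions that fall outside the predicting picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

def difference_block(cur, ref, cx, cy, dx, dy, size=16):
    """Motion-compensated difference macroblock (current minus prediction)."""
    return [[cur[cy + y][cx + x] - ref[cy + dy + y][cx + dx + x]
             for x in range(size)] for y in range(size)]
```

When the match is good, the difference block contains mostly small values, which is why it can generally be compressed more than the original MB.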
In MPEG-1, the time interval between a P-picture and its predicting picture can be greater than one picture interval. For pictures that fall between P-pictures or between an I-picture and a P-picture, backward-in-time prediction may be used in addition to forward-in-time prediction. Such pictures are known as bidirectionally motion-compensated pictures (B-pictures). For B-pictures, in addition to forward and backward prediction, interpolative motion compensation is allowed in which the predictor is an average of a block from the previous predicting picture and a block from the future predicting picture. In this case, two motion vectors are needed.
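By way of illustration, the interpolative predictor may be sketched in Python as the sample-by-sample average of the two predicting blocks. The function name `interpolate` is hypothetical, and rounding halves upward is an assumption made for this sketch.

```python
def interpolate(forward_block, backward_block):
    """Average a block from the previous predicting picture with a block
    from the future predicting picture (halves rounded up, by assumption)."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_block, backward_block)]

predictor = interpolate([[10, 20]], [[20, 21]])
```

Each of the two input blocks is located by its own motion vector, which is why two motion vectors are needed in this mode.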
The use of bidirectional motion compensation leads to a two-level motion compensation structure, as depicted in FIG. 5. Each arrow indicates the prediction of the picture touching the arrowhead using the picture touching the dot. Each P-picture is motion-compensated using the previous P-picture (or I-picture, as the case may be). Each B-picture is motion-compensated by the P- or I-pictures immediately before and after it. These predicting pictures are sometimes referred to as "anchor" pictures. No limit is specified in MPEG-1 on the distance between anchor pictures, nor on the distance between I-pictures. In fact, these parameters do not have to be constant over an entire sequence. Referring to the distance between I-pictures as N and to the distance between P-pictures as M, the sequence shown in FIG. 5 has (N,M)=(9,3).
It should therefore be understood that an MPEG-1 sequence consists of a series of I-pictures which may have zero or more P-pictures sandwiched between them. The various I- and P-pictures may have zero or more B-pictures sandwiched between them, in which latter event they are anchor pictures.
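The (N,M) structure described above may be illustrated by the following Python sketch, which lists picture types in display order. The function name `picture_types` is hypothetical, and the sketch assumes that N and M are constant over the sequence and that N is a multiple of M, neither of which is required by MPEG-1.

```python
def picture_types(n, m, count):
    """Return the picture types, in display order, for `count` pictures
    with I-picture distance n and anchor-picture distance m."""
    types = []
    for i in range(count):
        pos = i % n
        if pos == 0:
            types.append("I")      # start of each period of n pictures
        elif pos % m == 0:
            types.append("P")      # anchor picture every m pictures
        else:
            types.append("B")      # pictures between anchors
    return "".join(types)

print(picture_types(9, 3, 10))  # prints IBBPBBPBBI
```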
Transformation and quantization of an MB
One very useful image compression technique is transform coding. (See N. S. JAYANT and P. NOLL, Digital Coding of Waveforms, Principles and Applications to Speech and Video, Englewood Cliffs, N.J.: Prentice-Hall, 1984, and A. G. TESCHER, "Transform Image Coding," W. K. Pratt, editor, Image Transmission Techniques, pp. 113-155, New York, N.Y.: Academic Press, 1979.) In MPEG-1 and several other compression standards, the discrete cosine transform (DCT) is the transform of choice. (See K. R. RAO and P. YIP, Discrete Cosine Transform, Algorithms, Advantages, Applications, San Diego, Calif.: Academic Press, 1990, and N. AHMED, T. NATARAJAN, and K. R. RAO, "Discrete Cosine Transform," IEEE Transactions on Computers, pp. 90-93, January 1974.) The compression of an I-picture, for example, is achieved by taking the DCT of the blocks of luminance and chrominance pixels within an MB, quantizing the DCT coefficients, and Huffman coding the result. Similar principles apply to the compression of P- and B-pictures except that, in these cases, the DCT may be applied to the difference between the blocks of pixels within an MB and their corresponding motion-compensated prediction. The DCT converts a block of n×n pixels into an n×n set of transform coefficients. The DCT is very useful in compression applications, because it tends to concentrate the energy of the block of pixel data into a few of the DCT coefficients, and further, the DCT coefficients are nearly independent of each other. Like several of the international compression standards, the MPEG-1 algorithm uses a DCT block size of 8×8, which corresponds to the size of the blocks within an MB. It is one purpose of this invention to use DCTs of larger and smaller sizes so as to scale the size of an MB, thus supporting pictures of multiple resolutions.
The next step is quantization of the DCT coefficients, which is the primary source of lossiness in the MPEG-1 algorithm. Denoting the elements of the two-dimensional array of DCT coefficients by c_mn, where m and n can range from 0 to 7, quantization is achieved, aside from truncation or rounding corrections, by dividing each DCT coefficient c_mn by w_mn × QP, with w_mn being a weighting factor and QP being the quantizer parameter. The weighting factor w_mn allows coarser quantization to be applied to the less visually significant coefficients. There can be two sets of these weights, one for I-pictures and the other for P- and B-pictures. Custom weights may be transmitted in the video sequence layer. The quantizer parameter QP is the primary means of trading off quality vs. bit-rate in MPEG-1. It is important to note that QP can vary from MB to MB within a picture. It is also important to note that in this invention it is possible to choose either to provide separate weight matrices for the DCTs of other sizes, or to provide weight matrices of different sizes which are mathematically related so as to facilitate decoder processing.
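The energy-concentrating property of the DCT may be illustrated by the following Python sketch of a separable two-dimensional DCT. The names `dct_1d` and `dct_2d` are hypothetical, and the orthonormal scaling of the DCT-II is an assumption made for this sketch. The code accepts any block size n, in keeping with the use of DCTs of other sizes mentioned above.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of n samples."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # back to [vertical][horizontal]

flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
# For a flat 8x8 block, all of the energy lands in coeffs[0][0].
```

For smoothly varying picture data the result is similar: most coefficients away from the low-frequency corner are close to zero and therefore cheap to code.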
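By way of example only, the division by w_mn × QP may be sketched in Python as follows. The names `quantize` and `dequantize` are hypothetical, rounding to the nearest integer stands in for the truncation or rounding corrections mentioned above, and the sample values are illustrative and are not taken from the MPEG-1 default weight matrices.

```python
def quantize(coeffs, weights, qp):
    """Divide each coefficient c_mn by w_mn * QP and round to an integer level."""
    return [[round(c / (w * qp)) for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(coeffs, weights)]

def dequantize(levels, weights, qp):
    """Decoder side: rebuild approximate coefficients from the levels."""
    return [[level * w * qp for level, w in zip(l_row, w_row)]
            for l_row, w_row in zip(levels, weights)]

coeffs = [[100.0, 30.0]]     # illustrative values only
weights = [[1, 2]]           # a larger weight gives coarser quantization
levels = quantize(coeffs, weights, qp=4)        # [[25, 4]]
restored = dequantize(levels, weights, qp=4)    # [[100, 32]]
```

The second coefficient is restored as 32 rather than 30, which illustrates why quantization is the primary source of lossiness; raising QP widens this error while reducing the number of bits needed.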
Following quantization, the DCT coefficient information for each MB is organized and coded, using a set of Huffman codes. The details of this step are not essential to an understanding of the invention so that no description will be given, but for further information thereon reference may be had to the previously-cited HUFFMAN 1952 paper.
Macroblock Attributes due to Motion Compensation
It will be appreciated that there are three kinds of motion compensation which may be applied to MB's: forward, backward, and interpolative. The encoder must select one of these modes. For some MBs, none of the motion compensation modes yields an accurate prediction. In such cases, the MB may be selected for intramode coding as with I-pictures. Thus, depending on the motion compensation mode, MBs can be of the following types:
forward
backward
interpolative
intra
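The selection among the MB types listed above may be sketched in Python as follows, by way of illustration only. The decision rule (lowest sum of absolute differences, with a fall-back to intra coding above a threshold), the threshold value, and the names `block_sad` and `select_mode` are all assumptions; the manner in which an encoder makes this choice is not specified by MPEG-1.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def select_mode(mb, forward_pred, backward_pred, intra_threshold):
    """Pick the MB type whose prediction is closest to `mb`."""
    interp_pred = [[(f + b + 1) // 2 for f, b in zip(fr, br)]
                   for fr, br in zip(forward_pred, backward_pred)]
    costs = {
        "forward": block_sad(mb, forward_pred),
        "backward": block_sad(mb, backward_pred),
        "interpolative": block_sad(mb, interp_pred),
    }
    mode = min(costs, key=costs.get)
    # If even the best prediction is poor, code the MB in intramode.
    if costs[mode] > intra_threshold:
        return "intra"
    return mode
```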
Also, in P-pictures, depending on the value of the motion vector, MBs can be either of the type with a zero motion vector or of the type with a non-zero motion vector. These types, together with the required motion vector data, are coded with every MB as overhead data. The exceptions are skipped MBs, as will be explained below.
Macroblock Attributes due to Transformation and Quantization
As discussed previously, the QP parameter can be changed on an MB to MB basis. When this change takes place additional MB types are used to indicate that a new QP should be used. The new QP value itself is transmitted together with the MB.
After applying the DCT and quantization to the blocks within an MB, it may result that some of the blocks contain only zeros. These blocks do not require further data to be coded and are signalled by a so-called coded block pattern code. This code represents additional overhead.
Finally, whenever MBs contain no additional new information, they can also be skipped. To convey this information, an MB address is also transmitted together with every non-skipped MB.
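By way of illustration, a coded block pattern may be sketched in Python as a six-bit mask with one bit per block of the MB. The function name `coded_block_pattern` is hypothetical, and the bit assignment (first block in the most significant bit) is an assumption made for this sketch.

```python
def coded_block_pattern(blocks):
    """Return an integer with one bit per block: 1 if the block has any
    non-zero quantized coefficient, 0 if it contains only zeros."""
    cbp = 0
    for block in blocks:
        nonzero = any(level != 0 for row in block for level in row)
        cbp = (cbp << 1) | int(nonzero)
    return cbp

zero = [[0] * 8 for _ in range(8)]
busy = [[5] + [0] * 7] + [[0] * 8 for _ in range(7)]
# Four luminance blocks followed by two chrominance blocks.
pattern = coded_block_pattern([busy, zero, zero, zero, zero, busy])
print(format(pattern, "06b"))  # prints 100001
```

Only the blocks flagged with a 1 need coefficient data in the bit stream.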
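The manner in which a decoder can infer skipped MBs from the transmitted addresses may be sketched in Python as follows. The function name `skipped_addresses` and the representation of addresses as consecutive integers starting at zero are assumptions made for illustration only.

```python
def skipped_addresses(coded_addresses, total):
    """Given the addresses sent with the non-skipped MBs, return the
    addresses of the MBs that were skipped."""
    coded = set(coded_addresses)
    return [addr for addr in range(total) if addr not in coded]

# MBs 2 and 3 carry no address in the bit stream, so they were skipped.
print(skipped_addresses([0, 1, 4, 5], total=6))  # prints [2, 3]
```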
It should be noted then that MBs carry with them a series of attributes that need to be described by including overhead data with each coded MB. It is one object of this invention to preserve the identity of MBs across a multiplicity of scales such that the overhead is included only once, except perhaps for the refinement of some parameters such as the accuracy of the motion vectors.