1. Field of the Invention
The present invention relates to processing and storage of compressed visual data, and in particular the processing and storage of compressed visual data for bit rate reduction.
2. Background Art
It has become common practice to compress audio/visual data in order to reduce the capacity and bandwidth requirements for storage and transmission. One of the most popular audio/video compression techniques is MPEG. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the International Organization for Standardization (ISO) to work on compression. MPEG provides a number of different variations (MPEG-1, MPEG-2, etc.) to suit different bandwidth and quality constraints. MPEG-2, for example, is especially suited to the storage and transmission of broadcast quality television programs.
For the video data, MPEG provides a high degree of compression (up to 200:1) by encoding 8×8 blocks of pixels into a set of discrete cosine transform (DCT) coefficients, quantizing and encoding the coefficients, and using motion compensation techniques to encode most video frames as predictions from or between other frames. In particular, the encoded MPEG video stream consists of a series of groups of pictures (GOPs), and each GOP begins with an independently encoded (intra) I frame and may include one or more following P frames and B frames. Each I frame can be decoded without information from any preceding and/or following frame. Decoding of a P frame requires information from a preceding frame in the GOP. Decoding of a B frame requires information from both a preceding and a following frame in the GOP. To minimize decoder buffer requirements, the transmission order differs from the presentation order for some frames, so that all the information of the other frames required for decoding a B frame will arrive at the decoder before the B frame.
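The reordering described above can be illustrated with a minimal sketch. It assumes that every B frame depends on the nearest preceding and nearest following I or P frame, and the function name and the sample GOP pattern are hypothetical, chosen only for illustration.

```python
def display_to_transmission_order(frame_types):
    """Return display-order indices rearranged into a transmission order.

    Assumption: each B frame is predicted from the nearest preceding and
    nearest following I or P frame, so that following frame is sent first.
    """
    order = []
    pending_b = []  # B frames still waiting for their following reference
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # An I or P frame is sent ahead of the B frames that precede it
            # in display order, because those B frames need it for decoding.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames would need a reference from the next GOP; this
    # sketch simply appends them at the end.
    order.extend(pending_b)
    return order


gop = ["I", "B", "B", "P", "B", "B", "P"]  # hypothetical display order
transmission = display_to_transmission_order(gop)
print([f"{gop[i]}{i}" for i in transmission])
```

For the sample pattern, the P frame at display position 3 is sent before the B frames at positions 1 and 2, so both references are available when those B frames reach the decoder.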
In addition to the motion compensation techniques for video compression, the MPEG standard provides a generic framework for combining one or more elementary streams of digital video and audio, as well as system data, into single or multiple program transport streams (TS) which are suitable for storage or transmission. The system data includes information about synchronization, random access, management of buffers to prevent overflow and underflow, and time stamps for video frames and audio packetized elementary stream packets embedded in video and audio elementary streams as well as program description, conditional access and network related information carried in other independent elementary streams. The standard specifies the organization of the elementary streams and the transport streams, and imposes constraints to enable synchronized decoding from the audio and video decoding buffers under various conditions.
The MPEG-2 standard is documented in ISO/IEC International Standard (IS) 13818-1, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Systems,” ISO/IEC IS 13818-2, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video,” and ISO/IEC IS 13818-3, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Audio,” which are incorporated herein by reference. A concise introduction to MPEG is given in “A Guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC),” Tektronix Inc., 1997, incorporated herein by reference.
MPEG-2 provides several optional techniques that allow video coding to be performed in such a way that the coded MPEG-2 stream can be decoded at more than one quality simultaneously. In this context, the word “quality” refers collectively to features of a video signal such as spatial resolution, frame rate, and signal-to-noise ratio (SNR) with respect to the original uncompressed video signal. These optional techniques are known as MPEG-2 scalability techniques. In the absence of the optional coding for such a scalability technique, the coded MPEG-2 stream is said to be nonscalable. The MPEG-2 scalability techniques are varieties of layered or hierarchical coding techniques, because the scalable coded MPEG-2 stream includes a base layer that can be decoded to provide low quality video, and one or more enhancement layers that can be decoded to provide additional information that can be used to enhance the quality of the video information decoded from the base layer. Such a layered coding approach is an improvement over a simulcast approach in which a coded bit stream for a low quality video is transmitted simultaneously with an independently coded bit stream for high quality video. The use of video information decoded from the base layer for reconstructing the high quality video permits the scalable coded MPEG-2 stream to have a lower bit rate and data storage requirement than a comparable simulcast data stream.
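The layered idea can be illustrated with a simplified numerical sketch in the spirit of SNR scalability. It is not the syntax or procedure of the MPEG-2 standard: the coefficient values, the step sizes, and the helper names are all assumptions made for illustration. The base layer carries coarsely quantized coefficients, and the enhancement layer carries only the finely quantized error left by the base layer.

```python
def quantize(values, step):
    """Map each value to the nearest integer multiple of step (as a level)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Reconstruct approximate values from quantization levels."""
    return [level * step for level in levels]


coefficients = [100, -37, 12, 5]  # hypothetical DCT coefficients
BASE_STEP = 16                    # assumed coarse step for the base layer
ENHANCEMENT_STEP = 4              # assumed finer step for the enhancement layer

# Base layer: coarse quantization, decodable on its own at low quality.
base_levels = quantize(coefficients, BASE_STEP)
base_reconstruction = dequantize(base_levels, BASE_STEP)

# Enhancement layer: encodes only what the base layer failed to capture.
residual = [c - b for c, b in zip(coefficients, base_reconstruction)]
enhancement_levels = quantize(residual, ENHANCEMENT_STEP)

# A decoder with both layers adds the refinement to the base reconstruction.
refinement = dequantize(enhancement_levels, ENHANCEMENT_STEP)
enhanced_reconstruction = [b + r for b, r in zip(base_reconstruction, refinement)]

print("base only:         ", base_reconstruction)
print("base + enhancement:", enhanced_reconstruction)
```

Because the enhancement layer reuses the base-layer reconstruction rather than coding the coefficients again from scratch, the two layers together avoid the duplication inherent in a simulcast of two independent streams.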
The MPEG-2 scalability techniques are useful for addressing a variety of applications, some of which do not need the high quality video that can be decoded from a nonscalable coded MPEG stream. For example, applications such as video conferencing, video database browsing, and windowed video on computer workstations do not need the high quality provided by a nonscalable coded MPEG-2 stream. For applications where the high quality video is not needed, the ability to receive, store, and decode an MPEG-2 base-layer stream having a reduced bit rate or data storage requirement may provide a more efficient bandwidth versus quality tradeoff, and a more efficient complexity versus quality tradeoff. A scalable coded MPEG-2 stream provides compatibility for a variety of decoders and services. For example, a reduced complexity decoder for standard television could decode a scalable coded MPEG-2 stream produced for high definition television. Moreover, the base layer can be coded for enhanced error resilience and can provide video at reduced quality when the error rate is high enough to preclude decoding at high quality.
The MPEG scaling techniques are set out in sections 7.7 to 7.11 of the MPEG-2 standard video encoding chapter 13818-2. They are further explained in Barry G. Haskell et al., Digital Video: An Introduction to MPEG-2, Chapter 9, entitled “MPEG-2 Scalability Techniques,” pp. 183-229, Chapman & Hall, International Thomson Publishing, New York, N.Y., 1997, incorporated herein by reference. The MPEG scalability techniques include four basic techniques, and a hybrid technique that combines at least two of the four basic techniques. The four basic techniques are called data partitioning, signal-to-noise ratio (SNR) scalability, spatial scalability, and temporal scalability.
The conventional MPEG scalability techniques permit transmission of the coded video to be switched from a high quality, high bit rate stream to a low quality, low bit rate stream when transmission of the high quality, high bit rate stream is either precluded by network congestion or is not needed at the destination of the stream. However, the conventional MPEG scalability techniques do not permit the bit rate reduction to be freely selected. For many applications, the required bit rate is intermediate between that of the high quality, high bit rate stream and that of the low quality, low bit rate stream provided by the conventional MPEG scalability techniques. In some applications, the required bit rate will fluctuate between a high rate and a low rate. In any case, it is desired to create a valid MPEG data stream with the best video quality given the required bit rate.
Much research has been done addressing the problem of encoding a valid MPEG data stream given a required constant or variable bit rate. In general, to satisfy the MPEG rate control requirements, all of the data for each picture must be within the video buffer at the instant it is needed by the decoder. This requirement usually translates to upper and lower bounds on the number of bits allowed in each picture. For example, a number of bits are allocated to each picture based on the picture type, and the bits for each picture are allocated to 8×8 blocks in each picture based on a measure of local coding complexity in each picture. A quantization scale is selected for each 8×8 block to encode the 8×8 block with the number of bits allocated to the block. See Gonzales et al., U.S. Pat. No. 5,231,484 issued Jul. 27, 1993, on Motion Video Compression System with Adaptive Bit Allocation and Quantization, and Ramamurthy et al., U.S. Pat. No. 5,675,384, issued Oct. 7, 1997.
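The relationship between the buffer constraint and the bit allocation can be sketched as follows. This is a simplified constant-rate buffer model and a simple proportional allocation, not the method of either cited patent; the picture-type weights, buffer figures, and function names are assumptions made for illustration.

```python
def picture_size_bounds(fullness, buffer_size, bits_per_interval):
    """Bounds on the next picture's size under a simplified buffer model.

    fullness: bits in the decoder buffer just before the picture is removed.
    bits_per_interval: bits arriving before the following picture is removed.
    """
    # A picture larger than the current fullness has not fully arrived yet.
    upper = fullness
    # A picture that is too small leaves so much data behind that the
    # incoming bits would overflow the buffer.
    lower = max(0, fullness + bits_per_interval - buffer_size)
    return lower, upper


def allocate_picture_bits(gop_types, gop_bit_budget, type_weights):
    """Split a GOP bit budget among pictures in proportion to type weights."""
    total_weight = sum(type_weights[t] for t in gop_types)
    return [gop_bit_budget * type_weights[t] / total_weight for t in gop_types]


def allocate_block_bits(picture_bits, block_complexities):
    """Split a picture's bits among blocks in proportion to local complexity."""
    total = sum(block_complexities)
    if total == 0:
        return [picture_bits / len(block_complexities)] * len(block_complexities)
    return [picture_bits * c / total for c in block_complexities]


TYPE_WEIGHTS = {"I": 4, "P": 2, "B": 1}  # assumed relative weights
picture_targets = allocate_picture_bits(["I", "B", "B", "P"], 800_000, TYPE_WEIGHTS)

lower, upper = picture_size_bounds(
    fullness=700_000, buffer_size=1_000_000, bits_per_interval=400_000
)
# Keep the I picture's target inside the range the buffer allows.
i_picture_bits = min(max(picture_targets[0], lower), upper)

# Four hypothetical blocks with different complexity measures.
block_targets = allocate_block_bits(i_picture_bits, [1, 3, 4, 2])
print(picture_targets, (lower, upper), block_targets)
```

The sketch stops at the per-block targets; an encoder would then choose a quantization scale for each block so that the encoded block approaches its target.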
Unfortunately, when the bit rate was reduced to a small fraction of the standard bit rate, the bit rate control provided by allocating bits based on picture type, and selecting a quantization scale for each block, became rather imprecise. Sometimes the poor bit rate control was compensated for by liberal use of stuffing to make up for a difference between the allocated number of bits and the actual number of bits for each 8×8 block. For low bit rates, however, the stuffing represents a waste of information capacity resulting in a significant loss of picture quality. Efforts toward improving the bit rate control have focused on more sophisticated methods of estimating the complexity of image segments in comparison to average complexity in order to more precisely allocate bits to the image segments to obtain a rather constant visual picture quality and thus minimal degradation in picture quality. One solution was to perform two-pass encoding, which cannot be performed in real time. In a first pass, the video sequence is encoded with constant-bit-rate (CBR) encoding, while statistics concerning coding complexity are gathered. Next, the first pass data is processed to prepare control parameters for the second pass, which performs the actual VBR compression. See Westerink et al., “Two pass MPEG-2 variable-bit-rate encoding,” IBM J. Res. Develop., Vol. 43, No. 4, July 1999, pp. 471-488. Real-time, single-pass encoding has also been proposed which is said to adapt to the complexity of the image segments. However, substantial processing resources are required for good performance from a single-pass encoder. See Mohsenian et al., “Single-pass constant- and variable-bit-rate MPEG-2 video compression,” IBM J. Res. Develop., Vol. 43, No. 4, July 1999, pp. 489-509.
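The general shape of a two-pass scheme can be sketched as follows. This is not the algorithm of the cited Westerink et al. paper. It assumes a common complexity measure, the product of the bits a picture consumed in the first pass and the quantization scale used, and it assumes that second-pass targets are simply proportional to that measure. The statistics and function names are hypothetical.

```python
def first_pass_complexity(first_pass_stats):
    """Estimate per-picture complexity from first-pass (CBR) statistics.

    Each entry is (bits_produced, quantization_scale). The product is an
    assumed complexity measure: a picture that still needs many bits at a
    coarse quantization scale is hard to encode.
    """
    return [bits * qscale for bits, qscale in first_pass_stats]


def second_pass_targets(complexities, total_bit_budget):
    """Prepare second-pass (VBR) bit targets proportional to complexity."""
    total = sum(complexities)
    return [total_bit_budget * c / total for c in complexities]


# Hypothetical first-pass results: CBR gave every picture the same number
# of bits, so harder pictures ended up with a coarser quantization scale.
stats = [(100_000, 8), (100_000, 16), (100_000, 4)]
complexities = first_pass_complexity(stats)
targets = second_pass_targets(complexities, total_bit_budget=280_000)
print(targets)
```

The second pass spends the same overall budget but shifts bits toward the pictures the first pass found hardest, which is the motivation for gathering statistics before performing the actual compression.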