Technological advances in digital transmission networks, digital storage media, Very Large Scale Integration devices, and digital processing of video and audio signals are converging to make the transmission and storage of digital video economical in many applications. Because the storage and transmission of digital video signals are central to many applications, and because an uncompressed representation of a video signal typically requires a large amount of storage, the use of digital video compression techniques is important to this advancing art.
Several international standards for the compression of digital video signals have emerged over the past decade, with more currently under development. These standards apply to algorithms for the transmission and storage of compressed digital video in a variety of applications, including: video-telephony and teleconferencing; high-quality digital television transmission via coaxial networks, fiber-optic networks, terrestrial broadcast or direct satellite broadcast; and interactive multimedia products stored on CD-ROM, Digital Tape, Digital Video Disk, and disk drives.
Several of the compression standards involve algorithms based on a common core of compression techniques, including the CCITT (International Telegraph and Telephone Consultative Committee) Recommendation H.120, the CCITT Recommendation H.261, and the ISO/IEC MPEG-1 and MPEG-2 standards. The MPEG algorithms were developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The MPEG committee has been developing standards for the multiplexed, compressed representation of video and associated audio signals. The standards specify the syntax of the compressed bit stream and the method of decoding, but leave considerable latitude for novelty and variety in the algorithm employed in the encoder.
The MPEG-1 standard was developed for use in compressing progressive video. A progressive video sequence is a sequence in which each frame represents a scene as it is viewed at a discrete instant in time. By contrast, for interlaced video, a field (every other line on the screen) is captured periodically. The top and bottom fields on a screen are refreshed at alternating time instances, so that at any given time, data from two fields (a frame) can be seen.
The MPEG-2 standard can be used to compress interlaced video, progressive video, or a mixture of the two: the encoder specifies whether each frame is progressive or interlaced. The MPEG standards specify a bit stream in which the number of bits in the compressed representation of each picture is variable. This variation is due to the different types of picture processing, as well as the inherent variation with time of the spatio-temporal complexity of the scene being coded. This leads to the use of buffers to even out the fluctuations in bit rate. For a constant-bit-rate storage medium or transmission channel, for example, buffering allows the bit rate of the compressed pictures to vary within limits that depend on the size of the buffers, while outputting a constant bit rate to the storage device or transmission channel.
Considering the importance of buffering, the MPEG standards define a hypothetical decoder called the Video Buffering Verifier (VBV), diagrammed in FIG. 1, that verifies whether an encoded bit stream is decodable with specified limitations on the decoder buffer size and the input bit rate. The VBV has two modes of operation: constant bit rate (CBR) and variable bit rate (VBR). The two modes are described below.
For constant-bit-rate operation, the Decoder Buffer 101 is filled at a constant bit rate with compressed data 100 from the storage or transmission medium. Both the buffer size and the bit rate are parameters that are transmitted in the compressed bit stream. After an initial delay, which is also derived from information in the bit stream, a hypothetical decoder 103 instantaneously removes from the buffer all of the data associated with the first picture. Thereafter, at intervals equal to the picture period (the reciprocal of the picture rate of the sequence), the decoder removes all data associated with the earliest picture in the buffer.
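The constant-bit-rate behavior described above can be illustrated with a minimal sketch. The function name, its parameters, and the simplification of checking the buffer only at picture-removal instants are assumptions made for illustration; they are not taken from the MPEG standards.

```python
def check_cbr_vbv(picture_bits, bit_rate, buffer_size, picture_rate, initial_delay):
    """Return True if the hypothetical decoder buffer neither underflows
    nor overflows for the given picture sizes (all sizes in bits).

    Hypothetical helper: fullness is examined only at the instants at
    which a picture is removed, which is where it peaks in this model.
    """
    bits_per_interval = bit_rate / picture_rate  # bits arriving per picture period
    fullness = bit_rate * initial_delay          # bits received before first removal
    for bits in picture_bits:
        if fullness > buffer_size:
            return False  # overflow: more data arrived than the buffer holds
        if bits > fullness:
            return False  # underflow: picture not completely in the buffer
        fullness -= bits                # instantaneous removal of one picture
        fullness += bits_per_interval   # constant-rate fill until next removal
    return True
```

For example, with a 1 Mbit/s channel, 25 pictures per second, a 200,000-bit buffer and a 0.1 s initial delay, a run of very small pictures eventually overflows the model buffer, while a single picture larger than the initial fill underflows it.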
Variable-bit-rate operation is similar to the above, except that the compressed bit stream enters the buffer at a specified maximum bit rate until the buffer is full, at which point no more bits are input until the buffer at least partially empties. This translates to a bit rate entering the buffer that is effectively variable.
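A corresponding sketch for variable-bit-rate operation follows. The only change from the constant-bit-rate model is that the fill is clipped at the buffer size, so overflow cannot occur and only underflow has to be checked. The function name and the `initial_fullness` parameter are assumptions for illustration; the text above does not state how the buffer is initialized in this mode.

```python
def check_vbr_vbv(picture_bits, max_bit_rate, buffer_size, picture_rate,
                  initial_fullness):
    """Return True if no picture is removed before it has fully arrived.

    Hypothetical helper: input stops whenever the buffer is full, so the
    effective input rate varies between zero and max_bit_rate.
    """
    fill_per_interval = max_bit_rate / picture_rate
    fullness = min(initial_fullness, buffer_size)
    for bits in picture_bits:
        if bits > fullness:
            return False  # underflow
        # Fill at the maximum rate, but never beyond the buffer size.
        fullness = min(buffer_size, fullness - bits + fill_per_interval)
    return True
```

Under this model, a long run of small pictures is acceptable, because the input simply stalls while the buffer is full.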
In order for the bit stream to satisfy the MPEG rate-control requirements, it is necessary that all the data for each picture be available within the buffer at the instant it is needed by the decoder. This requirement translates to upper and lower bounds (UVBV and LVBV) on the number of bits allowed in each picture. The upper and lower bounds for a given picture depend on the number of bits used in all the pictures preceding it. It is the function of the encoder to produce bit streams that satisfy the VBV requirements. It is not expected that actual decoders will necessarily be configured or operate in the manner described above. The hypothetical decoder and its associated buffer are simply a means of placing computable limits on the size of compressed pictures.
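To make the dependence of the bounds on the preceding pictures concrete, the following sketch computes an upper and a lower bound for the next picture in constant-bit-rate operation. It is an illustration under stated assumptions, not the normative definition: the function name is hypothetical, the preceding pictures are assumed to have satisfied the buffer constraints themselves, the upper bound is taken to be the buffer fullness at the removal instant, and the lower bound is the smallest picture size that keeps the buffer from overflowing before the following removal.

```python
def vbv_bounds_after(previous_bits, bit_rate, buffer_size, picture_rate,
                     initial_delay):
    """Return (lower, upper) bounds, in bits, on the size of the next picture.

    Hypothetical helper for the constant-bit-rate case.
    """
    per_picture = bit_rate / picture_rate
    fullness = bit_rate * initial_delay
    # Replay the history: each earlier picture is removed, then the
    # buffer fills for one picture period.
    for bits in previous_bits:
        fullness = fullness - bits + per_picture
    upper = fullness  # a larger picture would not have fully arrived
    # A smaller picture would leave so much data behind that the next
    # fill interval pushes the buffer past its size.
    lower = max(0.0, fullness + per_picture - buffer_size)
    return lower, upper
```

With no history the lower bound in the example below is zero, but after three undersized pictures the buffer is nearly full and the next picture must use at least 30,000 bits.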
A rate control scheme can be found in U.S. Pat. No. 5,231,484 to Gonzales and Viscito, which describes a rate control mechanism that can be used for MPEG. A block diagram for this type of scheme is shown in FIG. 2. In this scheme, the input video signal Fk 200 is sent to a Complexity Estimator 201 and a Picture Coder 205. The Complexity Estimator sends a complexity estimate Ck (signal 202) to a Picture Bit Allocator 203. The Picture Bit Allocator sends the quantization scale Qk (signal 204) to the Picture Coder 205. The quantization scale is set depending on the instantaneous buffer fullness of a hypothetical decoder buffer which will be receiving the compressed video signals from the encoder and the complexity of the previously encoded pictures. The Picture Coder uses the quantization scale to encode Fk and produce an output bit stream CDk (signal 206).
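The data flow of FIG. 2 can be sketched as a simple loop. This is a toy illustration of the signal path only and is not the method of the cited patent: the function names, the rule that aims for a half-full decoder buffer, the model in which the bits produced equal the complexity divided by the quantization scale, and the quantization scale range of 1 to 31 are all assumptions.

```python
import math

MIN_Q, MAX_Q = 1, 31  # assumed range of the quantization scale


def choose_quant_scale(complexity, fullness, buffer_size, bits_per_interval):
    """Toy Picture Bit Allocator: map C_k and buffer fullness to Q_k."""
    # Aim to leave the hypothetical decoder buffer about half full after
    # this picture is removed and one picture period of data has arrived.
    target_bits = fullness + bits_per_interval - buffer_size / 2
    # Never target more bits than are in the buffer, and keep it positive.
    target_bits = max(1.0, min(fullness, target_bits))
    q = math.ceil(complexity / target_bits)
    return max(MIN_Q, min(MAX_Q, q))


def toy_picture_coder(complexity, q):
    """Toy Picture Coder: assumed model in which bits = complexity / Q."""
    return complexity / q


def run_rate_control(complexities, bit_rate, buffer_size, picture_rate,
                     initial_delay):
    """Return a list of (Q_k, bits_k) pairs, one per input picture."""
    per_picture = bit_rate / picture_rate
    fullness = bit_rate * initial_delay
    history = []
    for c_k in complexities:
        q_k = choose_quant_scale(c_k, fullness, buffer_size, per_picture)
        bits_k = toy_picture_coder(c_k, q_k)
        history.append((q_k, bits_k))
        fullness = fullness - bits_k + per_picture  # decoder buffer update
    return history
```

In this sketch a more complex picture, or a emptier decoder buffer, yields a coarser quantization scale. Because the scale is clamped to its maximum, a sufficiently complex picture can still exceed the target, so a practical encoder would need an additional safeguard against underflow.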