As is well known in the prior art, a motion picture is a temporal composite of a series of still images that are projected, one after another, so quickly that the mind does not perceive the discrete images but blends them into a coherent moving image. This is true whether the motion picture is transmitted and stored electrically as a video signal or optically on reels of acetate film.
When a motion picture is transmitted digitally, each still image, or "frame," is typically processed and transmitted individually by the video processing system. For example, FIG. 1 depicts a series of four frames of a longer sequence that depicts a person waving.
Each frame comprises a two-dimensional array of tessellated picture elements, or "pixels," which the mind perceives not as individual tiles, but as a mosaic. In a typical video processing system, a frame such as frame 102 in FIG. 1 could constitute an array of 512 by 512 pixels. Depending on the particulars of the system, each pixel can be either black or white, one of a number of shades of gray, or one of a number of colors. Typically, when each pixel can be one of 2^n colors, where n is a whole number, the color of each pixel is represented by n bits. Therefore, an 8-bit color video system comprising 262,144 pixels per frame nominally requires 2,097,152 bits of storage per frame.
When it is cumbersome or computationally complex to process an entire frame as a whole, the frame is often treated as an array of independent blocks, which each have a size that is more convenient for the video processing system to handle. FIG. 2 depicts frame 102 of FIG. 1, which is treated as a 32 by 32 array of blocks, in well-known fashion. Each block, therefore, comprises an array of 16 by 16 pixels.
A typical video processing system projects 24 frames per second and, therefore, the video image of FIG. 1 nominally requires 50,331,648 bits per second. When, therefore, such a video image is stored on a medium (e.g., a Digital Video Disk, semiconductor RAM, etc.) or transferred over a telecommunications channel (e.g., a Plain Old Telephone Service telephone line, an IS-95A CDMA wireless telecommunications channel, etc.), such an image can demand considerable bandwidth, even by today's standards.
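The storage and bandwidth arithmetic above can be verified with a short calculation. This is only an illustration of the figures stated in the text (512 by 512 pixels, 8 bits per pixel, 24 frames per second):

```python
# Uncompressed-video arithmetic for the system described above:
# a 512 x 512 frame, 8 bits per pixel, projected at 24 frames per second.

PIXELS_PER_FRAME = 512 * 512       # 262,144 pixels
BITS_PER_PIXEL = 8                 # 2^8 = 256 colors per pixel
FRAMES_PER_SECOND = 24

bits_per_frame = PIXELS_PER_FRAME * BITS_PER_PIXEL      # 2,097,152 bits
bits_per_second = bits_per_frame * FRAMES_PER_SECOND    # 50,331,648 bits

print(bits_per_frame, bits_per_second)
```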
To reduce the bandwidth required to transmit a video image, a technology called video compression has been developed. A typical form of video compression involves motion-compensated discrete cosine transform ("MCDCT") processing (e.g., MPEG). A characteristic of this type of processing is that the resulting bit rate of a compressed video signal varies widely over time as a function of the content of the video image. For example, one frame may require 2000 bits while the next requires only 200 bits. When the compressed video bit-stream is to be sent in real time over a bandwidth-limited telecommunications channel, such as a CDMA wireless telecommunications channel, a bit-rate control mechanism must be employed to match the variable rate of bits produced by the encoding system to the fixed capacity of the telecommunications channel. Traditionally, this is accomplished by buffering the telecommunications channel with a FIFO, whose depth is determined in accordance with well-known queueing theory techniques.
Excessive buffering, by definition, introduces excessive temporal delay into the telecommunications channel, which is antithetical to real-time transmission. Therefore, another mechanism for bit-rate control has been developed which avoids excessive buffering. Fundamentally, this mechanism sets an upper bound on the number of bits that can constitute each compressed frame to be transmitted over the telecommunications channel. The upper bound, which is known as the "bit budget," is determined, in well-known fashion, based on the bandwidth of the telecommunications channel, statistical data on the size of compressed frames unhindered by the bit budget, the acceptable amount of delay through the telecommunications channel, and queueing theory. Each frame is then compressed, and if necessary re-compressed, until the compressed frame comprises fewer bits than the bit budget.
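A first-order version of the bit-budget determination described above can be sketched as follows. The function name and the single-frame-of-delay default are hypothetical; as the text notes, a real determination would also incorporate statistics on unconstrained compressed-frame sizes and queueing theory:

```python
def bit_budget(channel_bps, frames_per_second, delay_frames=1):
    """Hypothetical first-order bit budget: the channel capacity
    available per frame, scaled by the number of frames of delay the
    application tolerates.  A real system refines this with statistics
    on compressed-frame sizes and queueing theory."""
    return (channel_bps // frames_per_second) * delay_frames

# e.g., a 64 kb/s channel at 24 frames per second, one frame of delay:
print(bit_budget(64000, 24))  # 2666 bits per frame
```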
FIG. 4 outlines the salient steps of a class of video compression methods in the prior art that incorporate a rate-control mechanism based on a bit budget. Before the method begins, a value for the bit budget is established.
As described above, each frame in a motion picture is processed individually, one after another, and therefore at step 401, the method gets one frame to be processed.
At step 403, each frame is transformed into coefficients, in well-known fashion using, for example, the two-dimensional discrete cosine transform ("DCT"). Sometimes the frame is transformed as a whole. More typically, however, it is computationally cumbersome to transform the entire frame as a whole and, therefore, the frame is treated as an array of blocks, which are transformed and processed individually.
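The block transform of step 403 can be illustrated with a direct (unoptimized) 2-D DCT-II. This is a minimal sketch for a small block; practical systems use fast factorizations rather than this O(N^4) form:

```python
import math

def dct2d(block):
    """Naive 2-D DCT-II of a square pixel block, as applied to each
    block at step 403.  O(N^4); shown only for illustration."""
    n = len(block)

    def alpha(k):
        # Orthonormal scaling factors.
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs

# A flat 4x4 block concentrates all of its energy in the DC coefficient:
flat = [[128] * 4 for _ in range(4)]
print(round(dct2d(flat)[0][0]))  # 512; every other coefficient is ~0
```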
At step 405, each of the transform coefficients is quantized; that is, it is mapped onto a discrete set of values that spans a useful range. The number of values, or levels, used to span this range determines the precision or resolution of the quantizer, and the size of the individual levels is known as the quantization step size. The purpose of quantizing the transform coefficients is to reduce the number of bits in the compressed image by omitting details that are less perceptible. The quantization step size affects both the fidelity of the compressed image to the original and the number of bits in the compressed image. In fact, the quantization step size is commonly used as a parameter to trade off the number of bits in the compressed image against fidelity, as a means of rate control. When the quantization step size is small, the compressed image generally comprises more bits and represents an image with reasonable fidelity to the original. In contrast, when the quantization step size is larger, the compressed image generally comprises fewer bits but represents an image with less fidelity to the original. Initially, the quantization step size is set to a default value.
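Uniform scalar quantization of this kind can be sketched as follows. The function names are illustrative; note how the larger step size drives more coefficients to zero, at the cost of coarser reconstruction:

```python
def quantize(coeffs, step):
    """Step 405: map each transform coefficient to the nearest multiple
    of the quantization step size, represented by its integer level."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Decoder-side reconstruction: multiply each level by the step size."""
    return [[l * step for l in row] for row in levels]

coeffs = [[512.0, 13.7], [-6.2, 1.1]]
print(quantize(coeffs, 8))   # [[64, 2], [-1, 0]]  -- small details vanish
print(quantize(coeffs, 16))  # [[32, 1], [0, 0]]   -- coarser, more zeros
```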
At step 407, each of the quantized coefficients is compressed with, for example, a lossless variable-length code, such as a Huffman code, in well-known fashion.
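A Huffman code of the kind named at step 407 can be built from the distribution of quantized levels. This is a compact sketch, not the codebook any particular standard prescribes; because zero is typically the most common level, it receives the shortest codeword:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (a lossless variable-length code of the
    kind named at step 407) from a list of quantized levels."""
    heap = [[count, i, {sym: ""}]
            for i, (sym, count) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {next(iter(heap[0][2])): "0"}
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # two least-frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

# Zero dominates the quantized levels, so it gets the shortest codeword:
levels = [0, 0, 0, 0, 0, 0, 1, 1, -1, 2]
code = huffman_code(levels)
print(code[0])  # a 1-bit codeword for the most common level
```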
At step 409, the total number of bits in all of the compressed coefficients is determined, in well-known fashion.
At step 411, the method determines if the total number of bits in all of the compressed quantized coefficients is within the bit budget.
When at step 411 the bit budget is not met, control passes to step 413 and the quantization step size is increased. When the quantization step size is increased, the fidelity of the compressed image suffers, but the re-compressed image will comprise fewer bits. From step 413, control passes to step 405 and the transform coefficients are re-quantized using the new quantization step size. In general, the loop through step 411 is performed until the compressed image satisfies the bit budget.
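The iterative loop of steps 405 through 413 can be sketched as follows. This is a minimal illustration, not the method of any particular system: the coefficients are a flat list rather than blocks, and the cost of 8 bits per nonzero level and 1 bit per zero is a hypothetical stand-in for the variable-length coding of step 407:

```python
def compress_frame(coeffs, bit_budget, initial_step=8):
    """Sketch of the rate-control loop of FIG. 4 (steps 405-413),
    assuming the transform coefficients were computed at step 403.
    The bit cost per level is a hypothetical stand-in for step 407."""
    step = initial_step
    while True:
        levels = [round(c / step) for c in coeffs]   # step 405
        bits = sum(8 if l else 1 for l in levels)    # steps 407-409
        if bits <= bit_budget:                       # step 411
            return levels, step, bits
        step *= 2                                    # step 413

coeffs = [500.0, 40.0, -25.0, 9.0, 4.0, -3.0, 1.0, 0.5]
levels, step, bits = compress_frame(coeffs, bit_budget=20)
print(step, bits)  # the step size doubles until the budget is met
```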
When at step 411 the bit budget is finally met, the compressed image is transmitted. Each time a compressed image is transmitted with a new quantization step size, the new quantization step size must be transmitted as well, so that the video decoder knows how to properly interpret the quantized coefficients in the compressed image.
When the compressed image and new quantization step size are transmitted over a lossless communications channel, the compression technique depicted in FIG. 4 is generally acceptable. In contrast, when the compressed image and quantization step size are transmitted over a lossy communications channel, such as a wireless telecommunications channel, it is possible that the quantization step size can be corrupted during transmission. When that occurs, all of the subsequently transmitted quantized coefficients will be interpreted incorrectly by the video decoder until a new quantization step size is transmitted and received correctly. The result can be a corrupted video signal that can remain corrupted for several frames or seconds.
Therefore, the need exists for a bit-rate control system that is well-suited for transmission over a lossy communications channel.
There is another disadvantage of the method depicted in FIG. 4. The iterative nature of the control loop through step 405 makes the rate at which the frames are processed dependent on the content of the frames themselves; therefore, only an educated guess can be made at how much computational power is needed to compress a given number of frames in a given time, or at how quickly a given number of frames can be compressed. Therefore, the need exists for a bit-rate control system whose computational requirements are more predictable.