A computer processes audio or video information as a series of numbers representing that information. The larger the range of possible values for the numbers, the higher the quality of the information. On the other hand, the larger the range of values, the higher the bitrate cost for the information. Table 1 shows ranges of values for several types of audio or video information of different quality levels, along with corresponding bitrate costs.
TABLE 1
Ranges of values and cost per value for different quality audio or video information

Information type and quality      Range of values    Cost
audio sequence, voice quality     0–255              8 bits (1 byte) per sample
audio sequence, CD quality        0–65,535           16 bits (2 bytes) per sample
video image, black and white      0–1                1 bit per pixel
video image, gray scale           0–255              8 bits (1 byte) per pixel
video image, “true” color         0–16,777,215       24 bits (3 bytes) per pixel
Aside from the range of values, the quantity of samples or pixels also affects the quality of the representation. A video frame with 320×240 pixels looks crisper than a lower-resolution 160×120 video frame. Video at 30 frames per second looks smoother than video at 7.5 frames per second. Again, however, the tradeoff for high quality is the cost of storing and transmitting the information. A 1-second video sequence with true color pixels, 320×240 frames, and 30 frames per second consumes 6,912,000 bytes, a bitrate of 55,296,000 bits per second. In comparison, a 1-second video sequence with gray scale pixels, 160×120 frames, and 7.5 frames per second consumes 144,000 bytes, a bitrate of 1,152,000 bits per second.
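The byte counts and bitrates above follow from multiplying frame width, frame height, bytes per pixel, and frame rate. The short sketch below reproduces that arithmetic; the function name and parameter names are illustrative only and do not come from any particular encoder.

```python
def raw_video_bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Uncompressed size, in bytes, of one second of video."""
    return width * height * bytes_per_pixel * frames_per_second


# True color (3 bytes per pixel), 320x240 frames, 30 frames per second.
true_color_bytes = raw_video_bytes_per_second(320, 240, 3, 30)

# Gray scale (1 byte per pixel), 160x120 frames, 7.5 frames per second.
gray_scale_bytes = raw_video_bytes_per_second(160, 120, 1, 7.5)

for label, size in (("true color", true_color_bytes), ("gray scale", gray_scale_bytes)):
    # Multiply by 8 to convert bytes per second to bits per second.
    print(f"{label}: {size:,.0f} bytes, {size * 8:,.0f} bits per second")
```

The two cases differ by a factor of 48: three times the bytes per pixel, four times the pixels per frame, and four times the frames per second.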
Uncompressed audio and video information has a high bitrate, so storing and transmitting the information is costly. Compression decreases the cost of storing and transmitting the information. Two categories of compression are lossless compression and lossy compression.
Lossless compression reduces the bitrate of information by removing redundancy from the information. For example, a series of ten identical pixels can be represented as the color of the pixels and the number ten. Lossless compression techniques reduce bitrate at no cost to quality, but can only reduce bitrate up to a certain point.
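The ten-pixel example corresponds to run-length encoding, one simple form of lossless compression. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and practical lossless coders typically combine several techniques.

```python
def run_length_encode(pixels):
    """Collapse each run of identical values into a (value, count) pair."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = [200] * 10                      # ten identical gray scale pixels
encoded = run_length_encode(row)
print(encoded)                        # [(200, 10)]
assert run_length_decode(encoded) == row   # nothing is lost
```

A row with no repeated values gains nothing from this scheme (each pixel becomes its own pair), which illustrates why lossless techniques can reduce bitrate only up to a certain point.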
In contrast, lossy compression techniques can reduce bitrate by nearly any amount, but quality suffers and the lost quality cannot be restored. To maximize perceptual quality, lossy compression techniques seek to preserve perceptually important information while removing information less important to perceptual quality. For example, an audio encoder removes portions of an audio signal that would not be heard by a human listener, or a video encoder blurs a video frame in a way that would not be noticeable to a human viewer. Conventional lossy compression techniques for video include quantization and frame dropping. In general, quantization changes the range of values used to represent pixels, while frame dropping eliminates frames or reduces frame rate.
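Quantization and frame dropping can each be illustrated in a few lines. The sketch below assumes a uniform scalar quantizer applied directly to 8-bit pixel values and a fixed frame-dropping pattern; both are simplifications chosen for illustration, and the function names and the step size of 16 are hypothetical.

```python
def quantize(value, step):
    """Map a value to a coarser level index (uniform quantizer)."""
    return value // step


def dequantize(level, step):
    """Reconstruct an approximate value at the midpoint of the level's bin."""
    return level * step + step // 2


def drop_frames(frames, keep_every):
    """Keep one frame out of every `keep_every` frames."""
    return frames[::keep_every]


# With a step of 16, the 256 possible 8-bit values collapse to 16 levels,
# so each pixel needs 4 bits instead of 8.
original = 203
level = quantize(original, 16)
restored = dequantize(level, 16)
print(original, level, restored)      # the restored value differs from the original

# Keeping every fourth frame reduces 30 frames per second to 7.5.
frames = list(range(8))
print(drop_frames(frames, 4))
```

The difference between the original and restored values is the quality that cannot be recovered after lossy compression.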
Filtering is a technique commonly used to remove or suppress “salt and pepper” static or other noise in information. Filtering can also be used in video compression. For more information, see U.S. Pat. No. 5,787,203 to Lee et al., “Method and System for Filtering Compressed Video Images,” issued Jul. 28, 1998, and Roosmalen et al., “Noise Reduction of Image Sequences as Preprocessing for MPEG2 Encoding,” Proceedings of Eusipco (1998).
Median filtering is one type of filtering. Applied to a video frame, median filtering replaces each pixel in the video frame with the median of the neighboring values in a kernel around the pixel. Other terms for the kernel include window, neighborhood, mask, filter, filter operator, or filter shape. In FIG. 1, a 4×4 block (110) of gray scale pixels is median filtered with a five-value cross-shaped kernel (120), producing a 4×4 block (130) of filtered output. The kernel (120) is shown filtering the upper, leftmost pixel [195] of the block (110), and two values in the neighborhood of the pixel but outside of the block (110) are not considered. The values in the kernel (120) are sorted [16, 16, 195] and the middle value [16] is taken as the value of the pixel in the block (130) of filtered output. If the neighborhood contains an even number of values, the average of the two middle values can be taken. There are other conventions for handling edge values (e.g., replicating edge values to fill a kernel) and other shapes and sizes for the kernel (120).
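The median filtering described above can be sketched as follows, assuming the convention in which kernel positions outside the block are simply not considered. The sample block is hypothetical: only the upper, leftmost value (195) and its two in-block neighbors (16 and 16) are taken from the description, and the remaining values are invented for illustration. The standard-library function statistics.median averages the two middle values when the count is even, which matches the convention mentioned above.

```python
from statistics import median

# Five-value cross-shaped kernel as (row offset, column offset) pairs.
CROSS_KERNEL = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def median_filter(block, kernel=CROSS_KERNEL):
    """Replace each pixel with the median of the in-block values under the kernel."""
    rows, cols = len(block), len(block[0])
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighborhood = [
                block[r + dr][c + dc]
                for dr, dc in kernel
                # Skip kernel positions that fall outside of the block.
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            filtered[r][c] = median(neighborhood)
    return filtered


# Hypothetical 4x4 gray scale block; only 195 and its two neighbors follow the text.
block = [
    [195, 16, 18, 20],
    [16, 17, 19, 21],
    [15, 18, 20, 22],
    [14, 17, 21, 23],
]

filtered = median_filter(block)
print(filtered[0][0])   # 16, the middle of the sorted values [16, 16, 195]
```

The isolated value 195 is suppressed because it never lands in the middle of a sorted neighborhood, which is why median filtering is effective against “salt and pepper” noise.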
Within a sequence of audio or video information, periods with rapid change (such as high motion video) or high detail have less redundancy to exploit than relatively constant, uniform periods. As a result, the information naturally compresses to a variable bitrate sequence.
In contrast, digital phone lines, videoconferencing connections, and many other transmission media offer constant bitrate for delivery of information. Although bandwidth fluctuates on the Internet, audio or video information sent over the Internet is typically compressed to a relatively constant bitrate that targets the average available bitrate for a connection.
To deliver video information at a relatively constant bitrate, conventional video encoders use bitrate adaptive quantization or bitrate adaptive frame dropping. Bitrate adaptive quantization and frame dropping cause a direct and immediate change in bitrate for a video frame. With bitrate adaptive quantization, quantization is increased so as to decrease bitrate, or quantization is decreased so as to increase bitrate. With bitrate adaptive frame dropping, video frames are dropped to immediately decrease bitrate.
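One way to picture bitrate adaptive quantization and frame dropping is as a feedback loop that compares the bits spent against a target. The sketch below is a deliberately simplified illustration, not the control logic of any particular encoder; the function names, the one-step adjustment, the assumed step range of 1 to 31, and the buffer-based drop rule are all assumptions made for this example.

```python
def adapt_quantization_step(step, bits_used, bits_target, min_step=1, max_step=31):
    """Nudge the quantization step toward the target bitrate.

    A larger step means coarser quantization and fewer bits;
    a smaller step means finer quantization and more bits.
    """
    if bits_used > bits_target:
        step += 1          # over budget: increase quantization
    elif bits_used < bits_target:
        step -= 1          # under budget: decrease quantization
    return max(min_step, min(max_step, step))


def should_drop_frame(buffer_fullness_bits, buffer_capacity_bits):
    """Drop the next frame when the (hypothetical) output buffer is over capacity."""
    return buffer_fullness_bits > buffer_capacity_bits


step = 10
# Hypothetical bits produced by three successive frames against a 10,000-bit target.
for bits_used in (12000, 11000, 8000):
    step = adapt_quantization_step(step, bits_used, bits_target=10000)
    print(step)
```

Because each adjustment takes effect on the very next frame, the change in bitrate is direct and immediate, and so is the change in visible quality.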
While conventional bitrate adaptive compression techniques control bitrate, the quality of the compressed information dramatically and noticeably changes when an adjustment occurs. Frame dropping causes a “stutter” effect, and increasing quantization often causes visible blocking or ringing artifacts. Thus, the perceptual quality of the compressed information is not as good as it could be for the bitrate.