Data compression methods are used to reduce the amount of data necessary to represent information. Compression is often used when data storage space, transmission bandwidth, or transmitter/receiver data rate is limited. Data is compressed to a smaller size for storage or transmission and then decompressed back to original size when needed.
Compression schemes can be classified as either “lossless” or “lossy.” In a lossless compression scheme, the data that is reconstructed at decompression is an exact match to the original data—no information is lost. In a lossy compression scheme, some information may be lost in the compression process. The goal of a lossy compression scheme is to choose the discarded information wisely, so that the data reconstructed at decompression is as close as possible to the original data, or at least so that the difference between the original and the reconstructed data is acceptable.
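The distinction can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as a stand-in for a lossless scheme and a toy bin-center quantizer as a stand-in for a lossy one; the function names and the step size of 16 are hypothetical and chosen only for illustration.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE); the output equals the input."""
    return zlib.decompress(zlib.compress(data))

def lossy_roundtrip(samples, step=16):
    """Toy lossy scheme: keep only the bin index of each sample,
    then reconstruct every sample as the center of its bin."""
    indices = [s // step for s in samples]            # what would be stored
    return [i * step + step // 2 for i in indices]    # what is reconstructed

samples = [0, 3, 17, 100, 101, 255]
restored_exact = list(lossless_roundtrip(bytes(samples)))
restored_approx = lossy_roundtrip(samples)

# Largest reconstruction error of the lossy path; bounded by half a bin.
max_error = max(abs(a - b) for a, b in zip(samples, restored_approx))
```

In this sketch the lossless path returns the original samples exactly, while the lossy path returns values that differ from the originals by at most half the step size.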
Video signals are a common type of data for use in compression systems. Raw video data tends to be large, so that working with raw, uncompressed video would require large amounts of storage space or transmission bandwidth. However, characteristics of typical video allow fairly aggressive compression. For instance, there is high correlation between adjacent pixels in a single video frame (the set of all picture elements that represent one complete image), since objects in video tend to be of fairly uniform color and texture. In addition, there is high correlation between pixels in the same position in adjacent video frames, since motion in video usually occurs slowly in relation to the video frame rate. These high correlations mean that video signals contain a large amount of redundant information, and these redundancies are typically exploited by compression schemes for video. In addition, most video applications do not require lossless compression—the quality constraint is simply that a human viewer perceive little or no degradation in quality after compression and decompression. The limitations and strengths of human visual perception can be taken into account when designing a lossy video compression scheme—information not perceptually significant is discarded first.
Compression schemes can also be classified as either “symmetric” or “asymmetric.” In a symmetric scheme, the compression and decompression processes are roughly equal in computational complexity. A symmetric scheme is appropriate when similar processing constraints are present at both the compression and decompression points, such as in video-conferencing applications where both compression and decompression must be done in real time. An asymmetric scheme is used when compression and decompression have different complexity constraints. Typically, the constraint on the decompression end is tighter, so as much of the computation as possible is performed by the compressor in order to lessen the computational burden on the decompressor. An asymmetric scheme is usually used for video that will be captured once and then distributed many times, such as video clips stored and made available to many users on a computer network.
Further information on typical video compression systems can be found in ITU-T Recommendation H.263 (approved February 1998); The Data Compression Book, 2nd Edition, by Mark Nelson and Jean-loup Gailly (1995); and Video Demystified, 3rd Edition, by Keith Jack (2001) (see especially chapter 3, on color spaces, the contents of which are incorporated herein by reference for all purposes).
FIG. 1 is a diagram of a typical asymmetric video compression system. Many existing video compression systems fit within this basic framework. The system consists of five main blocks—preprocessing, motion estimation, transform, quantization, and encoding—along with a feedback loop used to create decompressor reference data.
The purpose of the preprocessing block is to prepare the video data for compression. Preprocessing functions typically convert the input video data into a format that allows for easier or more aggressive compression.
One commonly used step of video preprocessing is subsampling. When video is subsampled, the size of the video frames (the number of pixels) is reduced. Subsampling is a simple way to create gains in video compression efficiency: reducing the video frame size by half in each dimension leaves one quarter of the pixels, so a 4:1 compression ratio is achieved before any other processing. However, subsampling can result in distracting artifacts when the video is restored to full resolution after decompression.
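A minimal sketch of subsampling by half in each dimension follows. It assumes 2x2 block averaging for the reduction and pixel replication for the restoration; the text does not prescribe either filter, and the function names are hypothetical.

```python
def subsample_2x(frame):
    """Halve width and height by averaging each 2x2 block of pixels.
    `frame` is a list of rows of integer intensities with even dimensions."""
    out = []
    for y in range(0, len(frame), 2):
        row = []
        for x in range(0, len(frame[0]), 2):
            total = (frame[y][x] + frame[y][x + 1]
                     + frame[y + 1][x] + frame[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out

def upsample_2x(frame):
    """Restore the original size by repeating each pixel in a 2x2 block."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
small = subsample_2x(frame)
restored = upsample_2x(small)
ratio = (len(frame) * len(frame[0])) / (len(small) * len(small[0]))
```

The uniform upper half of the example frame survives the round trip unchanged, while the finely textured lower half is flattened to a single gray level, which is the kind of artifact described above.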
Another commonly used step of video preprocessing is colorspace conversion. Existing raw video data is usually stored in an RGB color format (discussed below in more detail), since RGB is a convenient format for many existing displays. However, the RGB color format is not well suited to efficient compression, since the visually important video information is evenly distributed over the red, green, and blue color channels. For this reason, many video compression schemes include conversion to a different colorspace such as YUV (also discussed below). The YUV color format also contains three channels, but most of the visually important information is found in the Y channel, which contains pixel intensity information. The U and V channels contain all of the color information for the video data. Since the human eye is less sensitive to color errors than to intensity errors in typical video, the U and V channels can be compressed much more aggressively than the Y channel, with little degradation in decompressed video quality. For instance, the Y channel can be kept at full resolution while the U and V channels are each subsampled by a factor of 16. Taking that factor as a sixteen-fold reduction in sample count, the three original channels shrink to 1 + 2/16 = 1.125 channel-equivalents, a compression ratio of about 2.67:1 (the ratio cannot exceed 3:1 while the Y channel stays at full resolution). This is of the same order as the 4:1 ratio obtained by subsampling RGB by half in each dimension, but the quality of the resulting video is much higher because the most visually important information has been preserved.
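The conversion and the sample-count arithmetic can be sketched as follows. The coefficients are the commonly used BT.601-style analog YUV weights, which is an assumption, since the text does not name a specific conversion matrix. The helper `samples_after_chroma_subsampling` is hypothetical and reads "a factor of 16" as sixteen times fewer samples per chroma channel.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601-style weights (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # intensity (luma)
    u = 0.492 * (b - y)                     # blue color difference
    v = 0.877 * (r - y)                     # red color difference
    return y, u, v

def samples_after_chroma_subsampling(width, height, chroma_factor):
    """Total samples when Y stays at full resolution and U and V
    each keep 1/chroma_factor of their samples."""
    luma = width * height
    chroma = 2 * (width * height // chroma_factor)
    return luma + chroma

# A gray pixel carries no color information: U and V are (nearly) zero.
gray = rgb_to_yuv(128, 128, 128)

raw = 3 * 64 * 64                                    # three full channels
kept = samples_after_chroma_subsampling(64, 64, 16)  # Y full, U and V reduced
ratio = raw / kept
```

The gray-pixel case shows why the Y channel dominates: for colorless content all of the signal lands in Y, and the U and V channels are empty.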
The preprocessing block may also include other miscellaneous functions that depend on the specific design of the video compressor, such as object identification and denoising.
Prediction is used to exploit the redundancy between adjacent frames in typical video signals. Most asymmetric video compression systems contain a feedback loop including a “dummy” decompressor that mimics the state of the actual decompressor. The feedback loop provides the prediction block with copies of the previous video frame(s), and the prediction block then uses motion estimation to make a guess at what the next frame will look like. Then, rather than working with actual pixel values, the compressor will perform the remaining computations on the error between the actual frame and the predicted frame. Error values are generally smaller and sparser than pixel values, so the use of prediction reduces the amount of information that must be transmitted to the decompressor.
In addition to providing error data for further compression, the prediction block will also provide a parametric description of the estimated motion, which will be used at the decompressor to create the correct predicted frame.
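The prediction step can be sketched with full-search block matching, one common form of motion estimation. This is an illustrative assumption rather than the method of any particular system; the sum-of-absolute-differences cost, the search range, and all function names are hypothetical.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def estimate_motion(prev, cur, top, left, size, search):
    """Find the offset (dy, dx) into `prev` that best predicts the block
    of `cur` at (top, left), searching +/- `search` pixels."""
    target = get_block(cur, top, left, size)
    best, best_cost = (0, 0), sad(get_block(prev, top, left, size), target)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(prev) or x + size > len(prev[0]):
                continue  # candidate block would fall outside the frame
            cost = sad(get_block(prev, y, x, size), target)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

def residual(prev, cur, top, left, size, motion):
    """Error between the actual block and its motion-compensated prediction."""
    dy, dx = motion
    predicted = get_block(prev, top + dy, left + dx, size)
    actual = get_block(cur, top, left, size)
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]

def make_frame(top, left, size=6):
    """Black frame with one textured 2x2 object at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(2):
        for x in range(2):
            frame[top + y][left + x] = 200 + 10 * y + x
    return frame

prev_frame = make_frame(1, 1)   # object at row 1, column 1
cur_frame = make_frame(2, 3)    # object moved down 1 and right 2
motion, cost = estimate_motion(prev_frame, cur_frame, 2, 3, size=2, search=2)
error_with_motion = residual(prev_frame, cur_frame, 2, 3, 2, motion)
error_without_motion = residual(prev_frame, cur_frame, 2, 3, 2, (0, 0))
```

In this example the motion vector and an all-zero error block are enough to describe the moved object, whereas without motion compensation the error block carries the full pixel values.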
Most video compression schemes include a mathematical transformation of the video data. Like the colorspace transform described above, the purpose of the mathematical transform is to organize the video data into a form more suitable for effective compression.
Two common transforms in video compression are the discrete cosine transform (DCT) and the wavelet transform. Each of these transforms organizes the video data into an “average” component and a “detail” component. The average component contains basic shape information for video frames. The detail component contains edge information, which sharpens and clarifies the video frames.
Organizing the video data into average and detail components is beneficial for compression because this organization isolates most of the energy in the video frame into a few values. For natural video, the average component tends to contain only a few values that are very important to the accurate reconstruction of the video at the output. In contrast, the detail component will contain many values that have much less impact on the video quality. The few values in the average component can be transmitted with high accuracy, while the many values in the detail component can be compressed much more aggressively.
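The energy concentration described above can be checked numerically. The sketch below assumes an orthonormal one-dimensional DCT-II applied to a single row of eight smoothly varying pixels; practical codecs typically use a two-dimensional block transform, and the function names here are hypothetical.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        out.append(scale * total)
    return out

def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

row = [100, 102, 104, 106, 108, 110, 112, 114]   # smooth gradient
coeffs = dct_1d(row)
energy = [c * c for c in coeffs]

# coeffs[0] is the "average" component; the rest are "detail" components.
share_of_first_two = sum(energy[:2]) / sum(energy)
restored = idct_1d(coeffs)
```

For this smooth row, the first two coefficients hold more than 99.9% of the total energy, and the transform by itself loses nothing: the inverse transform recovers the original row to within floating-point rounding.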
While most transform techniques are applied to the error data as shown in FIG. 1, some systems apply the transform to incoming data and then perform motion estimation and all subsequent operations in the transform domain.
In most video compression schemes, quantization is used to increase data compression. In the quantization block, the accuracy of the video data is decreased by reducing the number of bits used to store the values. Effective use of data quantization is enhanced by the reorganization of the video data that was accomplished in the preprocessing and transform blocks; the data that is less visually important can be quantized more aggressively. Data quantization is the source of most of the information loss in a typical lossy video compression system.
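A minimal sketch of uniform quantization follows, assuming a fine step for the average component and a coarser step for the detail components. The step sizes and coefficient values are hypothetical and serve only to illustrate the trade-off.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    """Reconstruct approximate values from quantization indices."""
    return [q * step for q in levels]

average = [302.6]                                   # few, important values
detail = [-12.9, 0.4, -1.3, 0.2, -0.4, 0.1, -0.1]   # many, less important

average_levels = quantize(average, step=1)   # fine step: high accuracy
detail_levels = quantize(detail, step=8)     # coarse step: aggressive

average_restored = dequantize(average_levels, 1)
detail_restored = dequantize(detail_levels, 8)

# Worst-case error introduced in each component.
average_error = max(abs(a - b) for a, b in zip(average, average_restored))
detail_error = max(abs(a - b) for a, b in zip(detail, detail_restored))
```

The coarse step collapses most detail values to zero, which needs few bits to store, at the cost of an error of up to half a step per value. That error is not recoverable at the decompressor.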
The entropy encoding block in a video compressor further compresses the video data using lossless compression schemes. Common lossless compression methods for video applications include run-length encoding, Huffman encoding, arithmetic coding, and combinations of these.
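Of the methods listed, run-length encoding is the simplest to sketch. The example below assumes a plain list of (value, run length) pairs as the encoded form; real codecs use more compact bitstream layouts, and the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

quantized = [-2, 0, 0, 0, 0, 0, 0]    # typical output of coarse quantization
encoded = rle_encode(quantized)
decoded = rle_decode(encoded)
```

Because quantized detail data tends to contain long runs of zeros, seven values shrink to two pairs here, and decoding restores the sequence exactly.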
FIG. 2 shows a typical decompressor corresponding to the compressor in FIG. 1. The decompressor simply reverses the operations of the compressor. First, the entropy coding, quantization, and transform are all reversed to recover the motion and error data. The motion data is applied to the previous frame, producing a prediction of the upcoming frame. Then, the error data is applied to the predicted frame to produce the output video frame. Finally, any post-processing tasks such as colorspace conversion and upsampling are completed to convert the video into the proper format for output or display.
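The last two reconstruction steps can be sketched for a single block. This is a simplified illustration under stated assumptions: entropy decoding and the inverse transform are omitted, the error data is assumed to arrive as quantization indices in the pixel domain, and the function name, step size, and 8-bit clamping range are hypothetical.

```python
def decode_block(prev_frame, top, left, motion, error_levels, step):
    """Rebuild one block as motion-compensated prediction plus dequantized error.

    `motion` is the (dy, dx) offset into the previous frame, and
    `error_levels` holds the quantization indices of the error block."""
    dy, dx = motion
    size = len(error_levels)
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            predicted = prev_frame[top + dy + r][left + dx + c]
            error = error_levels[r][c] * step          # reverse quantization
            row.append(max(0, min(255, predicted + error)))  # clamp to 8 bits
        block.append(row)
    return block

prev_frame = [
    [0, 0, 0, 0],
    [0, 200, 201, 0],
    [0, 210, 211, 0],
    [0, 0, 0, 0],
]
# The block at (2, 2) is predicted from the previous frame at (1, 1).
output_block = decode_block(prev_frame, top=2, left=2, motion=(-1, -1),
                            error_levels=[[1, 0], [0, -1]], step=4)
```

Even this reduced version touches every output pixel once per frame, which gives a sense of the per-pixel work that remains on the decompressor side.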
The primary disadvantage of the prior art approach for wireless applications is its computational complexity. Even when an asymmetric design is used, the decompressor is typically too computationally demanding to produce acceptable video quality in real time on wireless devices that are heavily constrained in processing power and battery life.
There is thus a need for a compression/decompression method that is computationally light enough to run even on low-performance mobile devices. Prior art video compression designs are based on the assumption that the compression gain and bandwidth savings obtained from complex computations such as mathematical transforms and motion estimation are worth the computational cost. However, in many wireless environments this assumption does not hold, since the cost of reversing the transform and applying the motion data, even in an asymmetric system, exceeds what the decompressing device can afford.
Prior art systems often attempt to produce decompressed video that is as close as possible to the original source video. However, showing well-reconstructed video on a limited display means that much of the data that is retained is not visually useful, since limitations of the display create more visual information loss than does the compression/decompression.