It takes an enormous amount of bandwidth to process, store, or transmit high-resolution, full-color digital video data at acceptable frame rates. For example, assuming a video source with a resolution of 320 by 240 pixels (quarter-VGA resolution) and 16-bit pixel data, one frame of color video data takes up 1,228.8 Kbits (320 × 240 × 16). Assuming the video source provides 12 frames per second at this resolution, systems that display or store the video data must have a bandwidth of at least 14.746 Mbits per second. Common transmission systems (telephone lines and 28.8 Kbit per second modems) and storage media (hard disk drives with typical real-world storage rates of no more than 10 Mbits per second) do not have this capability. Consequently, there is a need for systems that compress high-resolution video data so that it can be stored and transmitted more easily.
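The bandwidth figures above follow directly from the frame geometry; a short worked calculation (constants taken from the example in the text) confirms them:

```python
# Worked version of the bandwidth arithmetic in the example above.
WIDTH, HEIGHT = 320, 240       # quarter-VGA frame
BITS_PER_PIXEL = 16            # 16-bit color pixel data
FRAMES_PER_SECOND = 12

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL       # 1,228,800 bits
bits_per_second = bits_per_frame * FRAMES_PER_SECOND   # 14,745,600 bits

print(bits_per_frame / 1000)   # 1228.8 Kbits per frame
print(bits_per_second / 1e6)   # 14.7456 Mbits per second
```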
There are two classes of video compression systems. Systems that perform temporal compression shrink the amount of video data by detecting similarities between corresponding pixels in successive video frames and encoding the redundant information so that it takes up less space in transmission or storage. In contrast, systems that perform only spatial compression reduce the amount of data needed to represent a single frame of video by detecting regions within a frame with similar pixel data and compressing the video data corresponding to those regions. Because the present application is directed to temporal compression, the remainder of this section focuses predominantly on problems with prior art temporal compression systems.
Many prior temporal video compression systems are built on the assumption that all frame updates are to be encoded at the same compression/quality level. Typically, this quality level is specified by a user. For example, a user might specify that the data stream from a video camera always be compressed to a high quality level so that fine details of each video frame are preserved. Upon receiving a new frame of video data, the temporal compression system determines which parts of the new frame are sufficiently different from the corresponding parts of the previous frame. The temporal compressor then compresses the video data from each part of the frame that is to be updated to the required quality level.
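The fixed-quality update scheme described above can be sketched as follows. This is a simplified illustration, not the prior art system itself: the block size, difference threshold, and the `block_differs` and `spatially_compress` helpers are all assumptions standing in for codec-specific logic.

```python
BLOCK = 16          # compare the frame in 16-pixel blocks (assumed size)
THRESHOLD = 8       # per-block difference threshold (assumed value)

def block_differs(prev_block, new_block):
    """A block is 'sufficiently different' if any pixel moved past the threshold."""
    return any(abs(a - b) > THRESHOLD for a, b in zip(prev_block, new_block))

def spatially_compress(block, quality):
    """Stand-in for a real spatial compressor; here it just coarsely quantizes."""
    step = max(1, 256 // quality)
    return [p // step for p in block]

def temporal_update(prev_frame, new_frame, quality):
    """Return (block offset, compressed data) for every block that changed.

    Note the quality level is fixed for the whole update -- the assumption
    that causes the two problems discussed in the following paragraph.
    """
    updates = []
    for i in range(0, len(new_frame), BLOCK):
        prev_b, new_b = prev_frame[i:i + BLOCK], new_frame[i:i + BLOCK]
        if block_differs(prev_b, new_b):
            updates.append((i, spatially_compress(new_b, quality)))
    return updates
```

Unchanged blocks produce no output at all, which is the source of the temporal savings.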
This approach to temporal compression can cause two problems. First, if there is a large degree of change from one frame to the next (e.g., where the entire frame needs updating), there might not be enough bandwidth to transmit the data compressed at the predetermined level for all of the parts of the frame that need updating. Second, if there are only small differences between successive frames, or if the transmission channel has a large bandwidth, the available transmission bandwidth might not be fully utilized for the update; the update could have been transmitted at a higher quality level (i.e., a lower compression ratio) than the predetermined one. Therefore, there is a need for a temporal compression system that can adaptively adjust its quality settings so that each frame update fits within the available transmission bandwidth and is also transmitted at the best possible quality.
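One simple way to meet the need identified above is to search quality levels from best to worst and keep the first one whose compressed update fits the channel budget. The sketch below is only illustrative; `update_bits_at` is a hypothetical callback for the compressed size of the current update at a given quality, and the quality ladder is an assumed set of values.

```python
def fit_update_to_bandwidth(update_bits_at, budget_bits,
                            qualities=(100, 75, 50, 25, 10)):
    """Pick the highest quality whose compressed update fits the budget.

    update_bits_at(q): hypothetical callback returning the compressed size
    (in bits) of the frame update at quality q; qualities runs best-first.
    """
    for q in qualities:
        if update_bits_at(q) <= budget_bits:
            return q
    return qualities[-1]   # nothing fits: fall back to the coarsest quality
```

A large inter-frame change drives the chosen quality down until the update fits; a small change (or a wide channel) lets the update go out at the top of the ladder.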
In addition to determining which parts of a frame are to be updated based on temporal comparisons between successive frames, it is well known to spatially compress the updating information so that a frame update takes up less bandwidth (thereby allowing more of a frame to be updated, or allowing a higher quality update) than a purely temporally compressed update. Prior art spatial compression techniques can be used for this purpose, but they are not satisfactory for a number of reasons, which are described in the co-pending application and briefly summarized below.
One problem with the prior compression systems (especially systems that make use of run-length encoding, or RLE) is that for random images (i.e., images with little or no correlation between neighboring pixel elements), they are likely to produce a "compressed" video stream that is actually longer than the original video stream. This is the case because the lack of similarity between neighboring elements causes the RLE systems to generate uniformly long codes for all of the elements. This can occur in RLE systems even for real-world data. Even if the prior art spatial compression systems did not have this problem, they would still be too slow to compress and then decode highly detailed video data at acceptable frame rates. This is because the encoded data produced by the prior art compression systems is structured so that it can only be decoded bit-by-bit or (in the RLE case) by time-consuming pattern matching.
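The expansion described above can be seen with a toy byte-level run-length encoder (a simplified sketch, not any particular prior art codec): long runs collapse to a single (count, value) pair, but data with no neighboring correlation degenerates into one pair per element and doubles in size.

```python
def rle_encode(data):
    """Naive run-length encoder: emits (count, value) pairs, runs capped at 255."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out.extend((j - i, data[i]))   # one (count, value) pair per run
        i = j
    return out

# A run of identical values compresses well...
print(len(rle_encode([7] * 100)))     # 2: 100 elements -> one pair
# ...but uncorrelated data expands: every element becomes its own pair.
print(len(rle_encode([1, 2, 3, 4])))  # 8: 4 elements -> 8 output values
```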
In contrast, the co-pending application describes a color video spatial compression system that is guaranteed to produce compressed data streams that are no bigger than the initial video stream for all possible images. Also, the spatial compression technique described in the co-pending application decodes the compressed video in multi-bit chunks and, as a result, is fast enough to provide the necessary decompression bandwidth for high quality images.
Therefore, there is a need for a temporal compression system that can make use of the spatial compression techniques described in the co-pending application to compress the updating information for each video frame.