Spatial and Temporal Compression of Video Images
The compression of video image data, with as little loss of image quality as possible while minimizing the amount of data required to represent the image, is important for numerous applications. In particular, a high degree of compression is important for real time transmission of video images over small bandwidth communication channels. For instance, video image compression is especially useful in video conferencing applications using low bandwidth telephone connections and low bandwidth Internet connections.
Image data compression techniques using discrete cosine transforms, wavelet transforms, and the like are well known. These are "spatial compression" techniques in that they identify redundant information with respect to the spatial characteristics of individual images. "Temporal compression" techniques are based on detecting similarities and differences between successive video image frames, and encoding the differences.
For instance, the MPEG2 video compression standard utilizes both spatial and temporal compression techniques.
In the prior art, temporal compression has been based on two primary techniques. In the first technique, pixel locations in a video frame are compared with the same pixel locations in a prior video frame to generate a differential video frame. A full video frame is encoded either every N frames, or whenever the differences between a frame and the last full video frame exceed a defined threshold. Other video frames are encoded as differential frames. The second technique is similar to the first, except that an attempt is made to identify the most similar region of a prior video frame for each region of a current video frame. This second technique provides better results than the first technique when portions of a video image are moving from one position to another, because portions of the prior video frame accurately represent the current video frame when they are translated by an offset position. The differential video frames are represented by sub-frame position changes and differences between each sub-frame and the best matching sub-frame in the earlier video frame, which are then encoded.
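The two prior art techniques described above can be illustrated with a minimal sketch. It is not taken from any particular codec: the function names, the sum-of-absolute-differences cost, the search range, and the default values of `n` and `threshold` are all assumptions chosen for illustration, and frames are modeled as plain lists of rows of integer pixel values.

```python
def frame_difference(current, prior):
    """First technique: subtract each pixel of the prior frame from the
    pixel at the same location in the current frame."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prior)]


def sum_abs(diff):
    """Total absolute difference over a differential frame."""
    return sum(abs(v) for row in diff for v in row)


def choose_frame_types(frames, n=4, threshold=50):
    """Label each frame 'full' or 'diff'.

    A full frame is emitted every n frames, or whenever the total absolute
    difference from the last full frame exceeds threshold (both values are
    illustrative assumptions).
    """
    types = []
    last_full = None
    since_full = 0  # frames emitted since (and including) the last full frame
    for frame in frames:
        if (last_full is None or since_full >= n
                or sum_abs(frame_difference(frame, last_full)) > threshold):
            types.append("full")
            last_full = frame
            since_full = 1
        else:
            types.append("diff")
            since_full += 1
    return types


def block_cost(current, prior, cy, cx, py, px, size):
    """Sum of absolute differences between a block of the current frame at
    (cy, cx) and a block of the prior frame at (py, px)."""
    return sum(abs(current[cy + i][cx + j] - prior[py + i][px + j])
               for i in range(size) for j in range(size))


def best_match(current, prior, cy, cx, size, search=2):
    """Second technique: find the offset (dy, dx) within +/- search pixels at
    which the prior frame best matches the current block. Returns
    (dy, dx, cost); the first lowest-cost offset found is kept."""
    height, width = len(prior), len(prior[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = cy + dy, cx + dx
            # Skip candidate blocks that fall outside the prior frame.
            if 0 <= py <= height - size and 0 <= px <= width - size:
                cost = block_cost(current, prior, cy, cx, py, px, size)
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best
```

When a small bright region moves between frames, `frame_difference` reports nonzero values at both the old and the new location, whereas `best_match` can report a position offset with a zero residual, which is why the second technique does better on translated image portions.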
A common theme in the prior art temporal compression techniques is that a set of differential image data is generated by comparing a current video frame with a prior video frame, and then the resulting differential image is compressed and encoded using spatial compression and encoding techniques. Another aspect of the prior art is that these techniques commonly replace blocks in a current video frame with blocks in a prior frame, causing the resulting image to exhibit "block artifacts" despite the use of differential data to adjust the prior frame blocks to the current frame.
In addition to the block artifact problem, another problem noticed by the present inventors with the prior art temporal compression techniques is that they deal poorly with changes in lighting from one video frame to the next. For various reasons, the lighting levels in many video images are constantly changing, even if only a little. As a result of these constant lighting variations, large portions of the video data frames undergo small changes from frame to frame, requiring all those changes to be encoded, while in fact the only change in the information content of the video image has been a small change in background lighting.
From another viewpoint, the problem noticed by the inventors is that when the primary changes between video frames have low or very low spatial frequency, the prior art temporal compression techniques fare poorly.
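The lighting problem can be made concrete with a small sketch. It illustrates only the problem, not the technique of the present invention: the synthetic test frame, the uniform brightness offset, and the helper names are assumptions made for this example.

```python
def brighten(frame, delta):
    """Model a small uniform change in background lighting."""
    return [[v + delta for v in row] for row in frame]


def changed_pixels(current, prior):
    """Count pixel locations whose value differs between two frames."""
    return sum(1
               for crow, prow in zip(current, prior)
               for c, p in zip(crow, prow)
               if c != p)


def split_mean(diff):
    """Split a differential frame into its mean value (its lowest spatial
    frequency component) and the residual left after removing that mean."""
    count = sum(len(row) for row in diff)
    mean = sum(v for row in diff for v in row) / count
    residual = [[v - mean for v in row] for row in diff]
    return mean, residual


# An arbitrary 8x8 textured frame and the same frame under slightly
# brighter lighting (both are illustrative assumptions).
frame = [[(3 * y + 5 * x) % 17 for x in range(8)] for y in range(8)]
lit = brighten(frame, 2)

# The per-pixel differential frame used by the first prior art technique.
diff = [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(lit, frame)]
mean, residual = split_mean(diff)
```

In this example every pixel location changes, so a per-pixel differential frame carries a nonzero value at each of the 64 locations, yet the entire change is captured by the single value `mean`, with a residual of zero everywhere.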
A primary goal of the present invention is to provide a technique that much more efficiently encodes sequences of video frames, and in particular provides significantly enhanced temporal data compression by using a technique that minimizes the amount of data required to represent changes having low spatial frequency, such as small changes in background lighting. Another goal of the present invention is to also provide a technique that efficiently encodes changes in position of a small portion of a video image, while at the same time efficiently encoding changes having low spatial frequency.