Compression of a digital signal reduces the bandwidth needed for signal storage or transmission. For example, a high definition television ("HDTV") signal can require as much as 1 billion bits per second. By reducing the amount of data by as much as a factor of fifty (e.g., to 20 million bits), present day compression techniques facilitate compact storage and real-time transmission of complex signals. Today, compression techniques are commonly applied to transmit video signals through satellite transmission and cable, and they enable storage of video on compact disk or in computer memory.
Well known compression techniques include "JPEG," "MPEG," "MPEG-2," "MPEG-4," "H.261," and "H.263." A primary goal of most of these compression techniques is to take an input stream of full length video or audio, determine redundancies that exist in the signal, and encode those redundancies such that the input signal is compressed to be shorter in length. Compression can be used to eliminate spatial and temporal redundancies; as examples, pixel values in a region of an image frame may be converted into information indicating that the region can be reproduced based upon another part of the same image frame or of a previous image frame, respectively.
Compression algorithms generally rely on block-based (e.g., tile-based or object-based) encoding, which is introduced with reference to FIG. 1. FIG. 1 shows two image frames 11 and 13. The second of these image frames 13, is divided into a number of square tiles 15, and it is desired to compress the second frame so that relatively less data is used for image representation. In typical image compression, pixels for each tile 15 will be separately compressed to remove either spatial redundancies within the same frame (the second frame 13) or temporal redundancies between frames (e.g., by comparison to the first frame 11). In this example, it is to be assumed that the second frame will be compressed only to remove temporal redundancies between frames, but similar principles can be applied to reduce spatial redundancies within the same frame.
To this effect, a digital processing device compares pixels in each tile in the second frame with image pixels found near the same tile location 21 within the first image frame 11. This comparison is indicated by a reference tile 17 and an arrow 19 which points to the same tile location 21 in the first image frame. The digital processing device compares pixels from the reference tile 17 with pixel subsets of a fixed "search window" 23 to determine a "closest match." The "closest match" in FIG. 1 is indicated by a bold square 25, which is illustrated as slightly offset from position of the tile 21. With the "closest match" having been found, the digital processing device calculates a motion vector 27 and a set of pixel value differences called "residuals." The "search window" for each tile defines a maximum set of boundaries beyond which searching will not be performed for the "closest match."
FIG. 2 is used to illustrate searching according to this criteria. In particular, a portion of the first image frame 11 from FIG. 1 is observed as including many pixels, exemplified by the reference numeral 35. A "pixel" is the smallest element of a picture consisting only of a single color and intensity. Although an image frame will consist of hundreds of tiles in both of X and Y directions, FIG. 2 shows five columns 31 and three rows 33 of tiles, where each tile is illustrated to have eight rows and eight columns of pixels. Searching for the "closest match" of a data block, such as the block represented by the numeral 17 in FIG. 1, can be conventionally performed within a fixed search window 23 about an expected location 21 of the closest match. Each square subset of 64 contiguous pixels is sequentially compared to the data block (seen in FIG. 1 and designated by the reference numeral 17) and the "closest match" is the particular subset which differs least from the data block. A motion vector identifies the location of the "closest match" with respect to the expected location, and the associated residuals contain pixel by pixel differences also arranged in a square tile.
The motion vector and residuals are then encoded in a compact manner, usually through "run-length coding," "quantization" and "Huffman coding." The digital processing device repeats this process for each tile 15 until the entire second image frame 13 is compressed. During later decompression, for example, at a network, television station, editing facility, or at an end-viewer's computer or television, the second frame is completely recalculated from an already-decompressed first frame by reconstructing each tile using motion vectors and residuals. The various standards mentioned above generally operate in this manner, although some new standards call for subdividing images into variable size objects instead of tiles (the principles are, however, similar).
The known search methods work well for their intended purposes, but can be quite time intensive. For example, as much as seventy percent (70%) of resources used by a digital processing device to compress an image is applied to searching for the "closest matches" and associated motion vectors. Practically speaking, it is extremely difficult to compress image sequences in real-time. Taking the HDTV example indicated above, compressing many millions of bits per second is difficult even for today's multi-megahertz computers.
Some solutions have attempted to cut down on a full search within a search window. For example, one proposal calls for sampling the search window, e.g., comparison is made using subsets of data incremented by four pixel columns or four pixel rows each iteration in order to find the closest match. Another proposal calls for sampling the fixed size search window in order to guess the approximate location of the "closest match" and then to focus search upon that approximate location. In this latter proposal, three search points within the window are selected and comparisons using these points are made with the tile under consideration; the point yielding the smallest residuals is used to pick three new points, and after several iterations the "closest match" is selected. While generally useful in reducing the amount of searching required, these solutions potentially still require extensive searching within a potentially large search window.
A need exists for a system that can more quickly compress digital signals, particularly video signals. Ideally, such a system would operate in a manner compatible with existing object-based and block-based standards. Further still, such a system ideally should be implemented in software, so as to improve the speed at which existing machines process video and facilitate applications of real-time compression. The present invention satisfies these needs and provides further, related advantages.