In many application environments, an image sequence is encoded (or compressed) to reduce the total amount of data needed to represent the image sequence. The compressed data may then be stored or transmitted more efficiently than the original uncompressed image sequence data. The image sequence may be any sequence of images, including a sequence of video image frames and a sequence of still images. Multiple view image sequences are sequences of images corresponding to different views of a scene; the images may be captured by a single camera positioned at different viewpoints, or the images may be captured by multiple cameras positioned at different locations relative to the scene to capture the scene from different viewpoints.
Image compression methods typically fall into one or more of three main image compression classes: spectral redundancy reduction, spatial redundancy reduction, and temporal redundancy reduction. Spectral redundancy reduction methods typically reduce the amount of image data by discarding spectral data that are not strongly perceived by the human eye. Spatial redundancy reduction methods reduce higher spatial frequency components in the original image data. For example, transform coding is a common spatial redundancy reduction method that involves representing an image by a set of transform coefficients. The transform coefficients are quantized individually to reduce the amount of data that is needed to represent the image. A representation of the original image is generated by applying an inverse transform to the quantized transform coefficients. Temporal redundancy reduction methods compress a sequence of images by taking advantage of similarities between successive images. Temporal redundancy may be reduced, for example, by transmitting only those movements or changes in a given image that permit accurate reconstruction of the given image from another image (e.g., a previously received video image frame).
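By way of illustration only, the transform coding steps described above (forward transform, individual quantization of the coefficients, and inverse transform) may be sketched as follows. The sketch assumes a one-dimensional orthonormal discrete cosine transform applied to a single row of pixel values and a uniform quantization step of 10; the transform choice, the step size, the sample values, and all function names are illustrative assumptions and are not taken from any particular compression standard.

```python
import math

def dct(samples):
    """Orthonormal 1-D DCT-II (an assumed transform, chosen for illustration)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct(): rebuilds the samples from the transform coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples

def quantize(coeffs, step):
    """Quantize each coefficient individually to an integer level."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]

# Hypothetical row of eight pixel intensities.
row = [52, 55, 61, 66, 70, 61, 64, 73]
step = 10.0  # assumed uniform quantization step

levels = quantize(dct(row), step)       # compact integer representation
approx = idct(dequantize(levels, step)) # lossy reconstruction of the row
```

In this sketch a larger step produces smaller integer levels (and therefore less data) at the cost of a coarser reconstruction; without quantization the inverse transform returns the original row up to floating-point rounding.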
Various standards of image sequence compression have been developed, often based on block-matching methods. Block-matching methods initially divide a target image (or frame in the case of video image data) to be compressed into an array of blocks (or tiles). Motion data and motion-compensated difference data are generated for each block based on a set of data in a reference image (e.g., in a prior video frame) that is similar to the block. In a typical approach, the target image is completely divided into contiguous blocks and sets of pixels in the reference image that best match each block are identified. The target image is reconstructed by accessing and manipulating portions of the reference image. The motion data represent an amount of movement that repositions a suitable part of the reference image to reconstruct a given block of the target image, and the motion-compensated difference data represent intensity adjustments that are made to individual pixels within the set of data from the reference image to accurately reproduce the given block of the target image.
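By way of illustration only, the reconstruction of a target block from motion data and motion-compensated difference data may be sketched as follows. The sketch assumes images stored as lists of rows of integer intensities and a motion vector expressed as a (row, column) offset into the reference image; the function names and the sample values are hypothetical.

```python
def predict_block(reference, top, left, motion, size):
    """Fetch the part of the reference image selected by the motion vector."""
    dy, dx = motion
    return [[reference[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def difference(target_block, predicted):
    """Per-pixel intensity adjustments (motion-compensated difference data)."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_block, predicted)]

def reconstruct(predicted, diff):
    """Apply the intensity adjustments to the repositioned reference data."""
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(predicted, diff)]

# Hypothetical 4x4 reference image.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

# Hypothetical 2x2 target block located at row 1, column 2 of the target image.
# It resembles the reference data one column to the left, with one pixel brighter by 2.
target_block = [[60, 72], [100, 110]]
motion = (0, -1)

predicted = predict_block(reference, top=1, left=2, motion=motion, size=2)
diff = difference(target_block, predicted)   # encoder side
rebuilt = reconstruct(predicted, diff)       # decoder side
```

Only the motion vector and the (mostly zero) difference values need to be conveyed for this block, since the decoder is assumed to hold the reference image already.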
Various methods for computing motion vectors between blocks of a target image and corresponding blocks of a reference image have been proposed. In a typical block-matching approach, a current block is compared with all the blocks of like size in a search window superimposed on the reference image. Typically, image blocks of the target image and the reference image are compared by calculating an error function value for each possible match. The motion vector with the smallest error function value is selected as the best matching motion vector for a given target image block. Exemplary block-matching error functions are the sum of the absolute values of the differences of the pixels between matched blocks and the sum of the squares of the differences. Motion estimation typically requires a significant portion of the computational resources needed to implement any given image sequence compression method.
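By way of illustration only, the exhaustive block-matching search described above may be sketched as follows, using the sum of absolute differences and the sum of squared differences as error functions. The sketch assumes images stored as lists of rows of integer intensities, a square search window of plus or minus `search_range` pixels, and candidates that fall outside the reference image being skipped; these choices and all names are illustrative assumptions.

```python
def sad(a, b):
    """Sum of the absolute values of the pixel differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of the squares of the pixel differences between two blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def full_search(target, reference, top, left, size, search_range, error=sad):
    """Return the (dy, dx) motion vector with the smallest error, and that error."""
    block = get_block(target, top, left, size)
    height, width = len(reference), len(reference[0])
    best_vector, best_error = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            # Skip candidates that extend past the reference image borders.
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue
            e = error(block, get_block(reference, r, c, size))
            if best_error is None or e < best_error:
                best_vector, best_error = (dy, dx), e
    return best_vector, best_error

# Hypothetical 6x6 reference image with distinct pixel values.
reference = [[(r * 6 + c) ** 2 for c in range(6)] for r in range(6)]
# Hypothetical target image: the reference content shifted one column to the right.
target = [[reference[r][c - 1] if c >= 1 else 0 for c in range(6)] for r in range(6)]

vector_sad, error_sad = full_search(target, reference, top=2, left=2,
                                    size=2, search_range=2, error=sad)
vector_ssd, error_ssd = full_search(target, reference, top=2, left=2,
                                    size=2, search_range=2, error=ssd)
```

For a search range of R pixels in each direction, this sketch evaluates up to (2R + 1) squared candidate blocks for every target block, each requiring a comparison of every pixel in the block, which is consistent with the observation that motion estimation consumes a significant portion of the computational resources.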