Many applications require the measurement of relative spatial distortions between a two-dimensional image and a collection of two-dimensional images, where at least one image in the collection is expected to be similar to the first two-dimensional image. The collection of images will often be formed of faithful representations of original data, which may be called “reference” images. The image that the collection is measured against often is, or includes, a degraded representation of at least one of the reference images; this image to be measured against may be called the “target” image.
The degradation of the target image with respect to the reference images can take many forms. As examples, the target image may be: a printed and scanned version of a reference image; a feature which has moved in a portion of a video sequence; obtained from a digital photograph captured under different conditions from a reference image; or a different image with a version of a reference image embedded in the different image as a watermark.
The terms “reference” and “target” can be interchanged for some applications, and there may not be differences in quality across all images. Sometimes the set of images may be taken from a single image, or from a series of images.
Degradation in quality can arise due to spatial distortion between the images. Estimating the distortion can provide for accommodation or correction of the distortion. The basis of many spatial distortion estimation methods is shift estimation, in which an estimate is made of the translation required of one image to best align that image with the other. Shift estimation can equally be performed between reference and target tiles extracted from the reference and target images respectively.
Several techniques have been used for such shift estimation, including correlation, phase correlation, least-squares fitting, and gradient-based shift estimation.
The correlation operation, denoted by ⊗, is a method of comparing two tiles that calculates the summed products of corresponding pixels in the two tiles at multiple shift positions:
$$h = f \otimes g \iff h(x, y) = \sum_{i,j} f(i, j) \cdot g(x + i,\, y + j)$$
If there exists a position of maximum matching overlap between the two tiles, the value of the correlation will be a maximum at this position.
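The summation above can be evaluated directly, as in the following minimal sketch. It assumes NumPy, equally sized tiles, and indices that wrap around at the tile borders (a circular correlation); the function name `correlate_direct` and the test data are illustrative only.

```python
import numpy as np

def correlate_direct(f, g):
    """h(x, y) = sum over (i, j) of f(i, j) * g(x + i, y + j), with wrap-around."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    rows, cols = f.shape
    h = np.zeros((rows, cols))
    for x in range(rows):
        for y in range(cols):
            # np.roll(g, (-x, -y))[i, j] equals g[(i + x) % rows, (j + y) % cols]
            h[x, y] = np.sum(f * np.roll(g, shift=(-x, -y), axis=(0, 1)))
    return h

rng = np.random.default_rng(0)
f = rng.random((8, 8))
g = np.roll(f, shift=(2, 3), axis=(0, 1))   # g is f circularly shifted by (2, 3)
h = correlate_direct(f, g)
peak = np.unravel_index(np.argmax(h), h.shape)
print(peak)   # position of maximum matching overlap
```

The cost of this direct form grows with the square of the number of pixels in a tile, which is why a transform-based evaluation is usually preferred.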
Correlation can be performed efficiently using the Fourier Transform, where $\mathcal{F}(f)$ is the Fourier transform of $f$, $\mathcal{F}^{-1}(f)$ is the inverse Fourier transform, and $\mathcal{F}(g)^*$ is the conjugate of the Fourier transform of $g$:
$$f \otimes g = \mathcal{F}^{-1}\left(\mathcal{F}(f) \cdot \mathcal{F}(g)^*\right)$$
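A minimal sketch of Fourier-based correlation follows, assuming NumPy and real-valued tiles of equal size. Which factor carries the conjugate only sets the sign convention of the result: the sketch conjugates the transform of f by default, so that the peak position matches the summation form of the correlation given earlier, while conjugating the transform of g instead mirrors the result about the origin. The function name and its `conjugate_f` parameter are hypothetical.

```python
import numpy as np

def correlate_fft(f, g, conjugate_f=True):
    """Circular correlation of two equally sized real tiles via the FFT."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    # The choice of conjugated factor determines the sign of the peak position.
    product = np.conj(F) * G if conjugate_f else F * np.conj(G)
    return np.real(np.fft.ifft2(product))

rng = np.random.default_rng(0)
f = rng.random((8, 8))
g = np.roll(f, shift=(2, 3), axis=(0, 1))   # g is f circularly shifted by (2, 3)

h = correlate_fft(f, g)
peak = np.unravel_index(np.argmax(h), h.shape)

h_mirrored = correlate_fft(f, g, conjugate_f=False)
peak_mirrored = np.unravel_index(np.argmax(h_mirrored), h_mirrored.shape)
print(peak, peak_mirrored)
```

For an N×N tile this replaces an O(N⁴) summation with O(N² log N) transforms.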
Variations on correlation using the Fourier Transform, such as phase correlation or selective weighting of frequencies in the product of the Fourier transforms, can be used to achieve a sharp, easily detectable correlation peak.
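Phase correlation can be sketched as follows, assuming NumPy and equally sized tiles. The product of the transforms is normalised to unit magnitude at every frequency, so that only phase information contributes to the peak. The small constant `eps`, which guards against division by zero, and the function name are assumptions of this sketch.

```python
import numpy as np

def phase_correlate(f, g, eps=1e-12):
    """Correlation in which every frequency is weighted to unit magnitude."""
    cross = np.conj(np.fft.fft2(f)) * np.fft.fft2(g)
    cross /= np.abs(cross) + eps            # keep the phase, discard the magnitude
    return np.real(np.fft.ifft2(cross))

rng = np.random.default_rng(1)
f = rng.random((16, 16))
g = np.roll(f, shift=(2, 3), axis=(0, 1))   # g is f circularly shifted by (2, 3)
h = phase_correlate(f, g)
peak = np.unravel_index(np.argmax(h), h.shape)
print(peak, h[peak])
```

For a pure circular shift the result approximates a single impulse at the shift position, which is the sharp peak referred to above.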
A pre-processing step for the reference and target tiles is sometimes used with correlation-based shift-estimation methods. In some correlation operations, the edges of a tile can form a very strong feature, as they can appear very similar to a strong line segment. This may result in a “false” correlation peak being formed at the origin by a match of the edges of one tile with the edges of the other, or anywhere else by a match of the edges of one tile with a line feature in the other. To mitigate this effect, the two tiles can be pre-processed using a combination of padding, band-pass filtering, and a half-Hann window, or similar, on the edge pixels. Such techniques are known in the art.
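The windowing part of such pre-processing might look like the following sketch, which assumes NumPy. It applies a half-Hann ramp to a band of edge pixels and leaves the interior of the tile unchanged; padding and band-pass filtering are omitted. The names `edge_taper` and `window_tile` and the `width` parameter are illustrative, and `width` is assumed to be at most half the tile size.

```python
import numpy as np

def edge_taper(n, width):
    """1-D window of length n: 1.0 in the middle, half-Hann ramps at both ends."""
    ramp = np.hanning(2 * width)[:width]    # rising half of a Hann window
    w = np.ones(n)
    w[:width] = ramp
    w[n - width:] = ramp[::-1]
    return w

def window_tile(tile, width):
    """Attenuate the edge pixels of a tile so its borders do not act as line features."""
    tile = np.asarray(tile, dtype=float)
    wy = edge_taper(tile.shape[0], width)
    wx = edge_taper(tile.shape[1], width)
    return tile * np.outer(wy, wx)

tile = np.ones((16, 16))
windowed = window_tile(tile, width=4)
print(windowed[0, 8], windowed[8, 8])       # border pixel versus interior pixel
```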
It is also necessary to choose a tile containing appropriate detail for shift estimation purposes. The tile must contain sufficient high-frequency or mid-frequency information with orientation of at least two substantially different angles. If no high-frequency information is present in the tile, the correlation peak will be broad and subject to distortion by any lighting gradient in the input tiles, and is likely to lead to inaccurate shift estimation. If the frequency content exists substantially in a single direction, then shift estimation will be inaccurate in a direction orthogonal to the tile features. If a tile contains periodic features, such as a grid of lines or a dotted line, then there may be ambiguity in a matching shift, potentially resulting in incorrect shift estimation.
If the two tiles are shifted relative to each other, and have been chosen to have appropriate detail for shift estimation, then a correlation peak will appear at a displacement from the origin directly corresponding to the shift between the two tiles.
The strength of the correlation peak, measured as its height for real-valued correlations, and the corresponding modulus for complex-valued correlations, can provide a measure of confidence in the shift. A strong peak generally indicates that the two tiles contain the same or similar data, possibly shifted relative to one another. A weak peak can indicate that the two tiles do not contain related data, or that the data in one of the tiles has been degraded too far.
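Reading the shift and its confidence from a correlation result can be sketched as below, assuming NumPy and a circular correlation in which peak indices beyond half the tile size represent negative shifts. The function name `peak_shift` is hypothetical, and the sketch returns only a whole-pixel shift.

```python
import numpy as np

def peak_shift(h):
    """Return ((dy, dx), strength) for the largest-modulus value in h."""
    mag = np.abs(h)                          # height if real, modulus if complex
    idx = np.unravel_index(np.argmax(mag), mag.shape)
    # Indices past the half-way point wrap around to negative displacements.
    shift = tuple(int(i) - n if i > n // 2 else int(i)
                  for i, n in zip(idx, mag.shape))
    return shift, float(mag[idx])

h_real = np.zeros((8, 8))
h_real[6, 1] = 5.0
print(peak_shift(h_real))                    # ((-2, 1), 5.0)

h_complex = np.zeros((8, 8), dtype=complex)
h_complex[1, 7] = 3 + 4j
print(peak_shift(h_complex))                 # ((1, -1), 5.0)
```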
Applications where the target image contains data similar to an unknown one of a collection of reference images are typically only interested in determining which one reference image is represented in the target image, and the results of a shift estimation for that one image. However, determining which reference image is represented in the target image can be expensive. One approach is to perform correlation-based shift estimation between each reference image and the target image. A sufficiently high correlation peak indicates that a version of the reference image, potentially degraded, was found in the target image. The location of that peak provides the shift estimation.
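The exhaustive approach described above can be sketched as a loop over the reference images, assuming NumPy and equally sized images. Phase correlation is used here as one possible correlation variant, and the acceptance `threshold` of 0.5 is an arbitrary illustrative value rather than a recommended setting. All function names are hypothetical.

```python
import numpy as np

def correlation_peak(reference, target):
    """Return (peak strength, peak position) of a phase correlation."""
    cross = np.conj(np.fft.fft2(reference)) * np.fft.fft2(target)
    cross /= np.abs(cross) + 1e-12
    h = np.real(np.fft.ifft2(cross))
    idx = np.unravel_index(np.argmax(h), h.shape)
    return float(h[idx]), tuple(int(i) for i in idx)

def find_reference(target, references, threshold=0.5):
    """Correlate the target with every reference; keep the strongest accepted peak."""
    best = None
    for k, ref in enumerate(references):     # one full correlation per reference
        strength, shift = correlation_peak(ref, target)
        if strength >= threshold and (best is None or strength > best[1]):
            best = (k, strength, shift)
    return best                              # None if no peak is high enough

rng = np.random.default_rng(2)
references = [rng.random((16, 16)) for _ in range(3)]
target = np.roll(references[1], shift=(3, 5), axis=(0, 1))
print(find_reference(target, references))
```

The loop makes the cost explicit: the number of correlations grows linearly with the size of the collection.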
Other applications have a target image containing data, potentially degraded, from all reference images, and require a shift estimation for each reference image. One such application is an output checker, which is a system for ensuring that all pages emitted by a printer are free of defects. An output checker operates by comparing a printed pattern to a collection of ideal, or expected, patterns.
One way to attempt to speed up multiple correlations from a collection of reference images, where the reference images contain some similarity, is to decompose the target image and each reference image into a low-dimensional representation. A two-dimensional image of size n×m can be treated as a point in an n×m dimensional space. If the reference images are sufficiently similar, they can be approximated as points in a lower-dimensional space. A prior art technique for performing correlations using a low-dimensional representation of images creates a set of eigenimages from the reference images (an eigenimage is an n×m dimensional orthonormal vector), and decomposes both target and reference images into a weighted sum of eigenimages plus the mean image of the reference images. Shift estimation is calculated using a correlation algorithm modified to use the eigenimage decomposition. Decomposing an image into a number of eigenimages requires projecting the original image onto each eigenimage. For some applications, this is a relatively expensive operation.
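The decomposition step alone can be sketched as follows, assuming NumPy and assuming that the eigenimages are obtained from a singular value decomposition of the mean-subtracted reference images, which is one common construction and not necessarily that of the prior art technique. The modified correlation algorithm is not shown. All function names are hypothetical.

```python
import numpy as np

def build_eigenimages(references, k):
    """Return the mean image (flattened) and k orthonormal eigenimages."""
    stack = np.stack([np.asarray(r, dtype=float).ravel() for r in references])
    mean = stack.mean(axis=0)
    # Rows of vt are orthonormal vectors of length n*m, ordered by significance.
    _, _, vt = np.linalg.svd(stack - mean, full_matrices=False)
    return mean, vt[:k]

def decompose(image, mean, eigenimages):
    """Project an image onto each eigenimage: one dot product per eigenimage."""
    return eigenimages @ (np.asarray(image, dtype=float).ravel() - mean)

def reconstruct(weights, mean, eigenimages, shape):
    """Weighted sum of eigenimages plus the mean image."""
    return (mean + weights @ eigenimages).reshape(shape)

rng = np.random.default_rng(4)
references = [rng.random((6, 6)) for _ in range(4)]
mean, eigenimages = build_eigenimages(references, k=3)
weights = decompose(references[0], mean, eigenimages)
approx = reconstruct(weights, mean, eigenimages, (6, 6))
print(weights.shape, np.max(np.abs(approx - references[0])))
```

Each projection in `decompose` is a dot product over all n×m pixels, which illustrates why the decomposition itself can be costly.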
Shift estimation by correlation is also a relatively expensive operation in some applications. For applications where speed is important, performing multiple correlations to determine which reference image is represented in the target image, or to measure shifts from multiple reference images where all reference images are represented in the target image, may be prohibitively expensive.