Many applications require the measurement of relative spatial distortions between a pair of similar two-dimensional images. Each pair will often consist of a faithful representation of an original, which may be called a “reference” image, and a degraded representation of this original, which may be called the “target” image.
The degradation of the target image with respect to the reference image can take many forms. As examples, the target image may be a printed and scanned version of the reference image; it may be a feature which has moved in a portion of a video sequence; or it may be obtained from a digital photograph captured under different conditions from the reference image.
The terms “reference” and “target” can be interchanged for some applications, and there may not be differences in quality between the two images. Sometimes they may be taken from a single image, or from a series of images.
Degradation in quality can arise due to spatial distortion between the two images. Estimating the distortion can provide for accommodation or correction of the distortion. The basis of many spatial distortion estimation methods is shift estimation, in which an estimate is made of the translation required of one image to best align it with the other.
Where spatial distortion is not just a simple translation, shift estimation can be used to determine translation between pairs of reference and target tiles (also referred to as patches) extracted from the reference and target images respectively. A collection of shift estimates from multiple locations in the images can be used to determine more complicated spatial distortions such as rotation, scaling, arbitrary affine distortion, perspective distortion, and non-linear distortion, such as barrel distortion.
The measurement of spatial distortion between tiles supports many applications. Such a measurement can be used for the direct measurement of alignment quality in printing and/or scanning devices; for assisting in the adjustment of the printing or scanning devices; or as a pre-processing step to allow accurate alignment of two images, one directly over the other, to allow direct comparison of a printed copy with its original. This comparison can allow evaluation of the quality of printing, and also verification that the printed output is free from defects. Such measurement can also provide information about the physical state of an imaging system, where a rotation angle, or a scale factor, represents a proxy for another physical variable in the system.
Several techniques have been used for such shift estimation, including correlation, phase correlation, least-squares fitting, and gradient-based shift estimation (GBSE).
The correlation operation, denoted by ⊗, is a method of comparing two tiles f and g, which calculates the summed products of corresponding pixels in the two tiles at multiple shift positions:
$$h = f \otimes g \;\Leftrightarrow\; h(x, y) = \sum_{i,j} f(i, j) \cdot g(x + i,\, y + j)$$
If there exists a position of maximum matching overlap between the two tiles, the value of the correlation will be a maximum at this position.
Correlation can be performed efficiently using the Fourier transform, where ℱ(f) is the Fourier transform of f, ℱ⁻¹(f) is the inverse Fourier transform, and ℱ(g)* is the conjugate of the Fourier transform of g:
$$f \otimes g = \mathcal{F}^{-1}\left(\mathcal{F}(f) \cdot \mathcal{F}(g)^{*}\right)$$
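A minimal sketch of Fourier-domain correlation in Python with NumPy follows. It assumes real-valued tiles of equal size and treats the correlation as circular, which is what the discrete Fourier transform computes. The function names `correlate_fft` and `peak_to_shift` are hypothetical. The product used is ℱ(f)·ℱ(g)*, as in the expression above; the sign of the recovered offset depends on which transform is conjugated.

```python
import numpy as np

def correlate_fft(f, g):
    """Circular cross-correlation via the FFT, using F(f) * conj(F(g))."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    return np.real(np.fft.ifft2(F * np.conj(G)))

def peak_to_shift(h):
    """Convert the argmax of a correlation surface into a signed (row, col) offset."""
    peak = np.unravel_index(np.argmax(h), h.shape)
    # Indices past the midpoint wrap around to negative offsets.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, h.shape))

# Illustrative data only: a random tile and a circularly shifted copy of it.
rng = np.random.default_rng(0)
g = rng.standard_normal((64, 64))
f = np.roll(g, shift=(3, -5), axis=(0, 1))
shift = peak_to_shift(correlate_fft(f, g))  # offset of f relative to g
```

With this choice of conjugate, the peak lies at the displacement of f relative to g, so the synthetic shift of (3, -5) is recovered directly. Real tiles are not circularly shifted, so padding or windowing would normally be needed as well.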
To achieve a sharp, easily detectable correlation peak, phase correlation can be used, in which each term in the correlation product is replaced by a complex value with unit modulus. Phase correlation is denoted by ⊗̂:
$$f \mathbin{\hat{\otimes}} g = \mathcal{F}^{-1}\left(\frac{\left(\mathcal{F}(f) \cdot \mathcal{F}(g)^{*}\right)(u, v)}{\left|\left(\mathcal{F}(f) \cdot \mathcal{F}(g)^{*}\right)(u, v)\right|}\right)$$
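Phase correlation can be sketched in the same way. The example below is illustrative only: the name `phase_correlate` and the small constant `eps`, which guards against division by zero where the spectrum vanishes, are assumptions and not part of the method as described.

```python
import numpy as np

def phase_correlate(f, g, eps=1e-12):
    """Phase correlation: normalise each term of F(f) * conj(F(g)) to unit modulus."""
    cross = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    cross = cross / np.maximum(np.abs(cross), eps)  # keep phase, discard magnitude
    return np.real(np.fft.ifft2(cross))

# Illustrative data only: a random tile and a circularly shifted copy of it.
rng = np.random.default_rng(1)
g = rng.standard_normal((64, 64))
f = np.roll(g, shift=(2, 7), axis=(0, 1))
surface = phase_correlate(f, g)
peak = tuple(int(p) for p in np.unravel_index(np.argmax(surface), surface.shape))
```

For an exact circular shift the normalised spectrum is a pure phase ramp, so the surface is close to a single unit-height impulse at the shift position. This is the sharp peak that motivates the technique.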
A pre-processing step for the reference and target tiles is necessary to assist with phase correlation as a shift-estimation method. In a phase correlation operation, the edges of a tile can form a very strong feature, as they can appear very similar to a strong line segment. This may result in a “false” correlation peak being formed at the origin by a match of the edges of one tile with the edges of the other, or elsewhere by a match of the edges of one tile with a line feature in the other. To mitigate this effect, the two tiles can be pre-processed using a combination of padding, high-pass filtering, and a half-Hann window, or similar, on the edge pixels. Such techniques are known in the art. Applying a window, such as a half-Hann window, to pixels near the tile edges is sometimes referred to in this disclosure as hedging.
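One possible form of hedging is sketched below. It covers only the windowing step; padding and high-pass filtering are omitted. The names `hedge_window` and `hedge_tile`, the default border width of 8 pixels, and the half-sample offset in the ramp are all assumptions made for illustration. The border must be no wider than half the tile.

```python
import numpy as np

def hedge_window(size, border):
    """1-D window: a half-Hann ramp over `border` pixels at each end, 1.0 elsewhere."""
    w = np.ones(size)
    # Rising half of a Hann window, sampled at pixel centres.
    ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(border) + 0.5) / border))
    w[:border] = ramp
    w[size - border:] = ramp[::-1]
    return w

def hedge_tile(tile, border=8):
    """Taper the edge pixels of a 2-D tile with a separable half-Hann window."""
    wy = hedge_window(tile.shape[0], border)
    wx = hedge_window(tile.shape[1], border)
    return tile * np.outer(wy, wx)

# Illustrative use on a constant tile: the interior is untouched, the edges fade out.
hedged = hedge_tile(np.ones((32, 32)), border=8)
```

The taper removes the abrupt step at the tile boundary, so the tile edges no longer behave like strong line segments in the correlation.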
It is also necessary to choose a tile containing appropriate detail for shift estimation purposes. The tile must contain sufficient high-frequency or mid-frequency information oriented at two or more substantially different angles. If no high-frequency information is present in the tile, the correlation peak will be broad and subject to distortion by any lighting gradient in the input tiles, which is likely to lead to inaccurate shift estimation. If the frequency content exists substantially in a single direction, then shift estimation will be inaccurate in a direction orthogonal to the tile features. If a tile contains periodic features, such as a grid of lines or a dotted line, then there may be ambiguity in a matching shift, which may also result in incorrect shift estimation.
If the two tiles are shifted relative to each other, and have been chosen to have appropriate detail for shift estimation, then a correlation peak will appear at a displacement from the origin directly corresponding to the shift between the two tiles.
By measuring tiles of corresponding portions of the reference and the target image in several places, a more complicated global transform than just shift can be measured.
For example, by measuring relative shift at three non-collinear positions, an affine transform can be derived relating the two images. By measuring relative shift at many points, a least-squares fit to some global transformation may be derived, or a complete “warp map” may be estimated for one or both images, relating every pixel in one image to a pixel position in the other.
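The derivation of an affine transform from shift measurements can be sketched as a linear least-squares problem. The example assumes each measurement consists of a tile centre (x, y) and a measured shift (dx, dy); the function name `fit_affine` and the sample coordinates are hypothetical. With exactly three non-collinear points the fit is exact, and with more points it becomes a least-squares estimate.

```python
import numpy as np

def fit_affine(points, shifts):
    """Fit a 2x3 affine matrix M such that [x', y'] = M @ [x, y, 1].

    points: (N, 2) tile centres (x, y); shifts: (N, 2) measured (dx, dy).
    """
    points = np.asarray(points, dtype=float)
    targets = points + np.asarray(shifts, dtype=float)
    design = np.hstack([points, np.ones((len(points), 1))])  # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs.T

# Illustrative data only: shifts synthesised from a 1 degree rotation plus a translation.
theta = np.radians(1.0)
true_map = np.array([[np.cos(theta), -np.sin(theta), 2.0],
                     [np.sin(theta),  np.cos(theta), -1.5]])
centres = np.array([[100.0, 100.0], [900.0, 150.0], [400.0, 1200.0]])
measured = (centres @ true_map[:, :2].T + true_map[:, 2]) - centres
estimated = fit_affine(centres, measured)
```

In practice the measured shifts carry noise, so using more than three positions and examining the residuals of the fit would usually be preferable.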
In many applications, it is common for there to be a small relative rotation between the two tiles being measured. For example, the rotation may be caused by a rotation of a page on a scanner platen, or the rotation of paper inside the feed path during printing, or by a small misalignment of a camera.
The introduction of a small relative rotation between two tiles can have a marked and deleterious effect on shift estimation. For example, for a 128×128 tile, the measured error of phase correlation rises from below 0.02 pixels at 0° to 0.15 pixels at 1°, and to almost a whole pixel at 2°.
Aside from the effect of a small rotation on individual tiles, a small rotation can cause the local shift between two images to change markedly across the whole image. For example, a 600 dpi scan of a piece of Letter-sized paper of 8.5×11 inches has dimensions of approximately 5,100×6,600 pixels. A 1° relative rotation between the reference and the scanned target image will result in a local shift of more than 100 pixels difference between the top and the bottom of the page. Such a shift will substantially reduce the common overlap between two tiles of size 128×128 and hence greatly reduce shift estimation accuracy for these tiles.
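The page figure can be checked with a short calculation. Under a pure rotation by angle θ, two points a distance d apart differ in displacement by 2·d·sin(θ/2). The helper name `shift_difference` is hypothetical.

```python
import math

def shift_difference(extent_px, angle_deg):
    """Difference in local shift between two points `extent_px` apart under a rotation."""
    return 2.0 * extent_px * math.sin(math.radians(angle_deg) / 2.0)

page_height = 11 * 600                            # 11 inches at 600 dpi = 6,600 pixels
page_shift = shift_difference(page_height, 1.0)   # a little over 115 pixels
```

A shift difference of this size is close to the full width of a 128×128 tile, which is why tiles taken at nominally corresponding positions may share very little content.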
The two negative effects of rotation on shift estimation, namely decreased accuracy and larger shifts, combine to greatly reduce shift estimation accuracy, to the point where shift estimation is likely to fail completely.
Such effects require that several, potentially inaccurate, measurements be taken at corresponding positions of the reference and target images before a global transform can be derived. Because of the potentially unknown shift between the reference and target tiles, it is also necessary to extract tiles much larger than would be needed to estimate shift in exactly corresponding tiles, in order to ensure sufficient overlap between relatively shifted tiles.
A spatially varying blur has been proposed to reduce the effect of rotation and scaling on correlation. However, such a blur will remove almost all high frequencies in a reference tile except at the very centre of the tile. As it is the high frequencies in a tile which provide the sharpness of a correlation peak, this operation can also greatly reduce shift estimation accuracy.
Correction measures have been proposed for the GBSE method which afford some robustness against rotation and scaling. However, such measures are only effective where the largest shift in tiles being measured is less than a single pixel. Under a rotation of 2°, the difference in shift from the top to the bottom of a tile is approximately 2 pixels for a 64×64 tile, and approximately 4.5 pixels for a 128×128 tile.
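The tile figures can be reproduced with a short worked calculation, sketched here under the assumption of a pure rotation by angle θ with rotation matrix R, plus a translation t. The symbols R, t, p and v are introduced for illustration only. A point p is displaced by (R − I)p + t, so two points separated by a vector v differ in displacement by (R − I)v, whose magnitude is:

$$\left|(R - I)\,v\right| = 2\sin\!\left(\tfrac{\theta}{2}\right)\left|v\right|$$

For θ = 2°, the factor 2 sin(1°) is approximately 0.0349, giving:

$$64 \times 0.0349 \approx 2.2 \text{ pixels}, \qquad 128 \times 0.0349 \approx 4.5 \text{ pixels}$$

Both values exceed the single-pixel limit within which the GBSE correction measures are effective.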
Two simple techniques can reduce the effect of rotation on shift estimation methods. Using a smaller tile can reduce the effect of rotation on shift estimation, but will also reduce the maximum shift that can be estimated. Down-sampling both tiles before measurement can also reduce the effect of rotation on shift estimation, but will also reduce accuracy, as measured shifts will be only a fraction of the true shift.
Note that relative image scaling has a similar effect to that of rotation on shift estimation. However, relative scaling can be discounted in many applications, because the scaling factor of optical and print devices is known before processing commences and so can be corrected automatically, if it is necessary to do so.
A least-squares fit can model and measure rotation, scaling and/or shift for pairs of tiles, but such methods usually involve an iterative search, and can be very slow.
Fourier techniques involving log-polar transforms and multiple correlations can also be used to estimate rotation, scaling and translation over a very wide range, but they are also very slow.
While it is possible to achieve some robustness to rotation with the above methods, this robustness is generally at the expense of measurement accuracy, and these methods do not provide a way to directly estimate rotation angle or scaling using only a single measurement.