There is a wide variety of signal processing applications in which the affine transformation between a suspect signal and a reference signal needs to be computed accurately and efficiently. This is particularly the case for signal detection and recognition applications for images, and it applies to other types of signals as well. In the case of signal detection and signal recognition, the objective for the computing device is to determine whether a particular reference signal is present in a suspect signal. This objective is more difficult when the reference signal is present, yet is distorted by a transform of the coordinate space. In image processing, such transformations are caused by manipulation of the reference signal through image editing (magnification, shrinking, rotation, digital sampling and re-sampling, format conversions, etc.). When the reference images or the objects they represent are captured via a camera from a different reference point relative to their original state, the result is a suspect image, which contains the reference signal, yet in a transformed state. Unless there is a means to determine and compensate for the affine transformation of the reference signal, it is more difficult to accurately detect, recognize or match the reference signal with its counterpart in the suspect image.
This signal processing problem is important to a variety of fields. Some examples include machine vision, medical imagery analysis, object and signal recognition, biometric signal analysis and matching (e.g., facial, voice, iris/retinal, fingerprint matching), surveillance applications, etc. In these applications, the objective may be to detect or match an input suspect signal with one particular reference signal, or match it with many different reference signals (such as in database searching, in which a query includes a suspect signal, a probe or template, that is matched against a reference database of signals). Various types of images and sounds can be identified using signal recognition and detection techniques. These include recognition based on signal attributes that are inherent in signals, as well as recognition based on signals specifically embedded in another signal to provide an auxiliary data carrying capacity, as in the case of machine readable codes like bar codes and digital watermarks.
In recent years, computing devices have become increasingly equipped with sensors of various kinds, including image and audio sensors. For these devices to interact with the world around them, they need to be able to recognize and identify the signals they capture through these sensors.
Advances in electronics have extended these sensory functions beyond special purpose devices like machine vision equipment, surveillance and exploration equipment, and medical imaging tools, to consumer electronics devices, like personal computers and mobile telephone handsets. The signals captured in these devices are often distorted by transformations. If these transformations can be approximated by affine transformations, or at least by locally affine transformations, then it may be possible to determine the affine transformation (including a local affine transform in a portion of the signal) that most closely matches the suspect signal with a reference signal.
The affine transformation that aligns a reference signal with its counterpart in a suspect signal can be expressed as y=Ax+b, where x and y are vectors representing the reference signal and the transformed version of the reference signal, A is a linear transform matrix, and b is the translation. The affine transformation generally comprises a linear transformation (rotation, scaling or shear) and a translation (i.e., shift). For two-dimensional signals, the linear transformation matrix is a two by two (2×2) matrix of parameters that define rotation, scale and shear. The translation component is a two by one (2×1) matrix of parameters that define the horizontal and vertical shift. The translation is related to the phase shift as described in more detail below. Thus, the process of aligning two signals can include approximating both the linear transform and the translation. The linear transform is sometimes approximated using signal correlation operations, which often employ Fourier transforms and inverse Fourier transforms. The translation component is approximated by determining the phase shift (e.g., using signal correlation) in a Fourier representation.
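The two components of y=Ax+b, and the link between translation and Fourier phase, can be illustrated with a short NumPy sketch. It assumes points are stored as an (n, 2) array and that translation is a circular shift of a periodic, discretely sampled image; under those assumptions the DFT shift theorem says that a shift by t multiplies each Fourier coefficient by a pure phase factor. The names `affine_map` and `translation_phase` are illustrative, not part of the described method.

```python
import numpy as np

def affine_map(points, A, b):
    """Map an (n, 2) array of reference coordinates x to y = A x + b."""
    pts = np.asarray(points, dtype=float)
    return pts @ np.asarray(A, dtype=float).T + np.asarray(b, dtype=float)

def translation_phase(shape, t):
    """Phase factors that a circular shift by t = (t0, t1) applies to a 2-D DFT."""
    k0, k1 = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    return np.exp(-2j * np.pi * (k0 * t[0] / shape[0] + k1 * t[1] / shape[1]))

# Linear part: rotation by 90 degrees combined with a scale of 2; translation (3, 4).
A = np.array([[0.0, -2.0], [2.0, 0.0]])
print(affine_map([[1.0, 0.0]], A, [3.0, 4.0]))  # [[3. 6.]]

# Translation shows up only as a phase change in the Fourier domain.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))  # shifted[m, n] = img[m - 3, n - 5]
shift_matches = np.allclose(np.fft.fft2(shifted),
                            np.fft.fft2(img) * translation_phase(img.shape, (3, 5)))
print(shift_matches)  # True
```

Because the magnitudes of the Fourier coefficients are unchanged by the shift, the linear part can be sought in the magnitude domain while the translation is recovered from phase.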
An example of a type of transform encountered in digital image capture is a perspective transform. This type of transform is typical when a user captures an image of an object with a camera of a mobile device because the plane of the camera is often tilted relative to an image on the object's surface. For example, the image on a box or document undergoes perspective distortion when captured with a camera that is tilted relative to the surface of the box or document. Of course, the object surface is not always planar, as it may be curved (e.g., bottles, cans, jars, etc.), and it may be flexible or deformable, in which case portions of the surface are flexed in various directions. Nevertheless, the object surface may be approximated as several patches of nearly planar surfaces stitched together. The geometric deformation of the image on a patch may be an affine or perspective transform.
To illustrate mathematically, the perspective transform of original coordinates (x, y) of an image to transformed coordinates (u, v, z) is represented by the following expression:
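$$\begin{bmatrix} u \\ v \\ z \end{bmatrix} = \begin{bmatrix} A & B & C \\ D & E & F \\ G & H & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$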
The transformed coordinates (u′, v′) of the distorted image may be expressed as:
$$u' = \frac{u}{z} = \frac{Ax + By + C}{Gx + Hy + 1}; \qquad v' = \frac{v}{z} = \frac{Dx + Ey + F}{Gx + Hy + 1}$$
The perspective transform has 8 unknown parameters. The linear transform parameters are A, B, D, and E in the above expression. Translation parameters are C and F, and trapezoidal parameters are G and H. These latter parameters, G and H, are also referred to as the perspective parameters.
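A minimal sketch of the eight-parameter mapping above, assuming plain floating-point coordinates; `perspective_map` and `perspective_matrix` are hypothetical helper names. Setting G = H = 0 makes z = 1, so the mapping collapses to the affine case y=Ax+b with linear part (A, B; D, E) and translation (C, F).

```python
import numpy as np

def perspective_map(x, y, params):
    """Map (x, y) to (u', v') using the parameters (A, B, C, D, E, F, G, H)."""
    A, B, C, D, E, F, G, H = params
    z = G * x + H * y + 1.0            # homogeneous scale; equals 1 when G = H = 0
    return (A * x + B * y + C) / z, (D * x + E * y + F) / z

def perspective_matrix(params):
    """The same parameters arranged as the 3x3 homogeneous matrix."""
    A, B, C, D, E, F, G, H = params
    return np.array([[A, B, C], [D, E, F], [G, H, 1.0]])

affine_params = (1.0, 0.2, 5.0, -0.1, 0.9, -3.0, 0.0, 0.0)   # G = H = 0: purely affine
print(perspective_map(2.0, 3.0, affine_params))              # approximately (7.6, -0.5)

tilted = (1.0, 0.2, 5.0, -0.1, 0.9, -3.0, 0.01, 0.0)          # hypothetical small G
u, v, z = perspective_matrix(tilted) @ np.array([2.0, 3.0, 1.0])
print(np.allclose(perspective_map(2.0, 3.0, tilted), (u / z, v / z)))  # True
```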
FIG. 16 is a diagram illustrating the effect of a perspective transform. The task of mitigating the distortive effect of a perspective transform may be managed by sub-dividing the distorted image into blocks. If the distortion vector is small, and the image block of interest is also small, then the perspective transform may be approximated with affine transform parameters. The area enclosed by solid lines (400a) on the left side of FIG. 16 is a rectangular object covered by an image. The object (400a) is rectangular, yet through image capture with a camera at a slight tilt, it is distorted by a perspective transform, resulting in distorted image 400b on the right. The dashed lines illustrate the result of sub-dividing the image into blocks. For block 402, the perspective distortion is closely approximated by differential scale. For block 404, the perspective distortion is closely approximated by shear. The perspective distortion is more closely approximated by an affine transform as the image block size decreases. The trade-off, however, is that as the block size decreases, there is less image information available to ascertain the affine transform relative to the un-distorted image. Digital image sampling and other sources of noise in the image capture process introduce further modifications of the image that complicate the design of image signal processing to recover and mitigate the impact of the perspective transform.
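The statement that the affine approximation improves as the block shrinks can be checked numerically. The sketch below uses assumed parameter values (a mild tilt with G = 1e-3, H = 5e-4, and a block centered at (100, 100)). It linearizes the perspective map at the block center with its Jacobian, which gives the best affine fit in the first-order sense, and measures the worst error at the block corners. The helper names are hypothetical.

```python
import numpy as np

def perspective(P, p):
    """Apply a 3x3 perspective matrix P to a 2-D point p and dehomogenize."""
    u, v, z = P @ np.array([p[0], p[1], 1.0])
    return np.array([u / z, v / z])

def local_affine(P, center):
    """First-order (affine) approximation J p + b of the perspective map near `center`."""
    (A, B, _), (D, E, _), (G, H, _) = P
    u, v, z = P @ np.array([center[0], center[1], 1.0])
    # Jacobian of (u/z, v/z) with respect to (x, y), by the quotient rule.
    J = np.array([[A * z - u * G, B * z - u * H],
                  [D * z - v * G, E * z - v * H]]) / z**2
    b = np.array([u / z, v / z]) - J @ np.asarray(center, dtype=float)
    return J, b

def max_block_error(P, center, half_size):
    """Largest corner distance between the true perspective map and its local affine fit."""
    J, b = local_affine(P, center)
    cx, cy = center
    corners = [(cx + sx * half_size, cy + sy * half_size) for sx in (-1, 1) for sy in (-1, 1)]
    return max(np.linalg.norm(perspective(P, c) - (J @ np.array(c, dtype=float) + b))
               for c in corners)

# Hypothetical mild tilt: small perspective parameters G and H.
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1e-3, 5e-4, 1.0]])
for h in (4, 16, 64):
    print(h, max_block_error(P, (100.0, 100.0), h))
```

For a map of this form the corner error grows roughly with the square of the block half-size, which is the quantitative side of the trade-off described above: smaller blocks fit an affine model better but carry less image information.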
When signal transforms are computed in digital computing environments of general purpose processing units or special purpose digital logic circuits, a number of challenges arise. Some of these challenges include the errors caused by representing signals in discrete digital logic. Quantization error is introduced not only when analog signals are sampled by sensors, but also when these signals are re-sampled as they are transformed into different coordinate spaces (e.g., by Fourier and inverse Fourier transforms). Additional errors arise from the limited precision of the circuitry used to store the discrete values of the signal and the associated transform parameters. Another challenge is that signal recognition and signal alignment typically involve transforms and inverse transforms, which, in addition to introducing errors, are computationally expensive to implement in hardware, require additional memory, and introduce memory bandwidth constraints, because the number of read/write operations to memory grows as each value in the discrete signal is transformed, re-sampled, or approximated from neighboring sample values.
In view of these challenges, there is a need for methods to determine transforms between signals that are accurate, yet efficient to implement in digital computing environments. This includes more effective ways to estimate linear transforms as well as to determine translation or phase shift.
This document details methods of computing a transformation between a discrete reference signal and an image signal using various techniques. One method provides a set of feature locations representing the discrete reference signal, and provides a seed set of initial transform parameters. The feature locations and transform parameters are represented as digital, electronic signals in an electronic memory. Using the seed set, the method finds geometric transform candidates that minimize error when the candidate transforms are used to align the feature locations of the discrete reference signal with corresponding feature locations in the suspect signal. This includes computing a measure of correlation corresponding to the geometric transform candidates. The method evaluates the geometric transform candidates for each of the seeds to identify a subset of the candidates representing refined estimates of geometric transform candidates.
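The "minimize error" step can be read as an ordinary least-squares problem: given corresponding feature locations x_i (reference) and y_i (suspect), find the 2×2 matrix A that minimizes Σ‖A x_i − y_i‖². A minimal NumPy sketch follows. It assumes the correspondences are already known and uses `numpy.linalg.lstsq`, which is one standard way to solve such a problem rather than a solver prescribed here; `fit_linear_transform` and the simulated data are hypothetical.

```python
import numpy as np

def fit_linear_transform(ref_pts, sus_pts):
    """Least-squares 2x2 matrix A minimizing sum_i ||A x_i - y_i||^2.

    ref_pts and sus_pts are (n, 2) arrays of corresponding feature locations.
    """
    # With x_i^T and y_i^T stacked as rows, the model is Y = X A^T.
    At, *_ = np.linalg.lstsq(ref_pts, sus_pts, rcond=None)
    A = At.T
    residual = ref_pts @ A.T - sus_pts
    rms_error = float(np.sqrt(np.mean(np.sum(residual**2, axis=1))))
    return A, rms_error

# Simulated correspondences (hypothetical): 40 features, small location noise.
rng = np.random.default_rng(0)
ref = rng.uniform(-30, 30, (40, 2))
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
sus = ref @ A_true.T + rng.normal(0, 0.05, ref.shape)
A_est, err = fit_linear_transform(ref, sus)
print(A_est, err)
```

The returned RMS error is one natural quantity to compare across candidates; the correlation measure mentioned above is a separate score.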
This document also describes various implementations of these methods. For example, one implementation is an electronic device implemented in digital logic components in an application specific integrated circuit. The device comprises a memory for storing a suspect signal representation. It includes a correlation module for receiving a seed set of geometric transform candidates and determining a correlation metric for each candidate as a measure of correlation between a reference signal and the suspect signal representation when the linear transform candidate is applied.
The device also includes a coordinate update module for determining feature locations within the suspect signal representation of a feature that corresponds to a feature of the reference signal at a location determined by applying the linear candidate transform. This module determines locations of components of a reference signal in the suspect signal and provides input to a geometric transform calculator to determine the transform between a reference signal and the suspect signal.
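One simple reading of the coordinate update is a local peak search: map a reference feature into the suspect representation, then look in a small window around that (generally non-integer) location for the strongest sample. The sketch assumes the suspect representation is a 2-D magnitude array (for example, a Fourier magnitude) and that the strongest sample within an assumed radius of 2 is the corresponding feature; `update_coordinate` is a hypothetical helper.

```python
import numpy as np

def update_coordinate(magnitude, loc, radius=2):
    """Return (row, col) of the strongest sample within `radius` of a non-integer location."""
    r0, c0 = int(round(loc[0])), int(round(loc[1]))
    rows, cols = magnitude.shape
    # Clip the search window to the array bounds.
    r_lo, r_hi = max(r0 - radius, 0), min(r0 + radius + 1, rows)
    c_lo, c_hi = max(c0 - radius, 0), min(c0 + radius + 1, cols)
    window = magnitude[r_lo:r_hi, c_lo:c_hi]
    dr, dc = np.unravel_index(np.argmax(window), window.shape)
    return (r_lo + int(dr), c_lo + int(dc))

mag = np.zeros((64, 64))
mag[20, 33] = 5.0     # feature near the predicted location
mag[25, 40] = 9.0     # stronger feature, but outside the search window
print(update_coordinate(mag, (21.3, 32.4)))   # (20, 33)
```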
The device includes a geometric transform calculator for determining an updated linear transform for each of the candidates that provides a least squares fit between reference signal feature locations and the corresponding feature locations in the suspect signal determined by the coordinate update module. It uses correlation metrics to identify the most promising linear transform candidates. For example, it iterates through the process of updating the transform so long as the correlation metric shows signs of improvement in the transform's ability to align the reference and suspect signals.
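The refine-while-improving loop can be sketched as follows. This is an illustration under stated assumptions, not the device's actual logic: features are 2-D points, the coordinate update takes the nearest suspect feature, the correlation metric is the fraction of reference features that have a suspect feature within an assumed radius, and iteration stops when that metric decreases or the least-squares fit stops changing. `refine` and `rotation_scale` are hypothetical names.

```python
import numpy as np

def rotation_scale(deg, s):
    """2x2 matrix for a rotation by `deg` degrees combined with uniform scale `s`."""
    t = np.deg2rad(deg)
    return s * np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def refine(ref_pts, sus_pts, A_seed, radius=2.0, max_iter=20):
    """Refine a seed 2x2 transform while the correlation metric does not decrease."""
    A = np.asarray(A_seed, dtype=float)
    best_A, best_score = A, -1.0
    for _ in range(max_iter):
        mapped = ref_pts @ A.T
        # Coordinate update: nearest suspect feature to each transformed reference feature.
        dist = np.linalg.norm(mapped[:, None, :] - sus_pts[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        matched = dist[np.arange(len(ref_pts)), nearest] < radius
        # Correlation metric (assumed): fraction of reference features with a nearby match.
        score = float(matched.mean())
        if score < best_score:
            break  # metric got worse: keep the previous candidate
        best_A, best_score = A, score
        if matched.sum() < 2:
            break  # too few correspondences for a 2x2 least-squares fit
        # Least-squares update from the matched correspondences only.
        At, *_ = np.linalg.lstsq(ref_pts[matched], sus_pts[nearest[matched]], rcond=None)
        if np.allclose(At.T, A, atol=1e-9):
            break  # converged
        A = At.T
    return best_A, best_score

# Hypothetical example: a 9x9 grid of reference features, true transform 10 degrees, x1.1.
g = np.arange(-4, 5) * 8.0
ref = np.array([(a, b) for a in g for b in g])
rng = np.random.default_rng(1)
A_true = rotation_scale(10, 1.1)
sus = ref @ A_true.T + rng.normal(0, 0.05, ref.shape)
A_est, score = refine(ref, sus, rotation_scale(9, 1.08))
print(score)  # 1.0
```

Running `refine` once per seed and ranking the results by score corresponds to evaluating the seed set. For the symmetric grid used here, a candidate rotated by a further 90° maps the grid onto itself and would score equally well.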
Some embodiments employ a method of computing an estimate of phase of a transformed signal. This phase estimation method provides a set of feature locations representing a discrete reference signal, receives a suspect signal, and applies a transform to the reference signal to provide a set of transformed locations. It samples phase from the suspect signal at discrete sample locations in a neighborhood around the transformed locations. To these sampled phases, the method applies a point spread function to provide an estimate of phase of the suspect signal at locations corresponding to the transformed locations.
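The idea of estimating phase at a non-integer location from neighboring samples can be illustrated in one dimension. The sketch assumes a complex tone analyzed by an unwindowed DFT, so the point spread function is the Dirichlet kernel (the DFT of a rectangular window); each sampled bin is weighted by the conjugate of that kernel before summing, which cancels the kernel's own phase. The choice of kernel, the neighborhood size `half_width`, and the function names are assumptions for illustration; the text does not specify a particular point spread function.

```python
import numpy as np

def dirichlet_psf(delta, n):
    """Response of an n-point DFT (rectangular window) to a tone `delta` bins away."""
    return np.exp(2j * np.pi * delta * np.arange(n) / n).sum()

def estimate_phase(spectrum, f0, half_width=2):
    """Estimate the phase of a tone at non-integer bin f0 from nearby DFT samples.

    Each sampled bin k is weighted by the conjugate of the PSF at offset f0 - k,
    which removes the PSF's phase before the samples are combined.
    """
    n = len(spectrum)
    k0 = int(np.floor(f0))
    acc = 0j
    for k in range(k0 - half_width + 1, k0 + half_width + 1):
        acc += spectrum[k % n] * np.conj(dirichlet_psf(f0 - k, n))
    return float(np.angle(acc))

n, f0, phi = 64, 10.37, 0.8
x = np.exp(1j * (2 * np.pi * f0 * np.arange(n) / n + phi))   # tone between bins 10 and 11
X = np.fft.fft(x)
print(estimate_phase(X, f0))   # ≈ 0.8
print(np.angle(X[10]))         # nearest-bin phase, biased by the PSF
```

In this example the phase read directly from the nearest bin is off by roughly 1.1 radians, because of the kernel's phase at a 0.37-bin offset.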
Phase estimation is implemented, for example, in a digital logic circuit comprising a memory for storing phase of a suspect signal and a transform module for transforming coordinates of a reference signal into transformed coordinate locations. The circuit also comprises a point spread function module for reading selected phase of the suspect signal from the memory at locations around a transformed coordinate location and applying a point spread function to the selected phase to provide an estimated phase.
Various embodiments employ phase estimation technology in the correlation metric and coordinate update process. For example, complex frequency components are estimated at non-integer locations employing a point spread function. These components enable more accurate measurement of correlation for a candidate geometric transform. Additionally, they enable more accurate location of coordinates for the coordinate update process.
Various embodiments apply the geometric transform to extract digital data from an image in which the reference signal is encoded. The geometric transform compensates for geometric distortion and allows for recovery of digital data message elements embedded at embedding locations. Some embodiments employ a signal confidence metric based on the reference signal to weight message estimates extracted from the embedding locations.
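A confidence-weighted combination of message estimates might look like the sketch below. It assumes each message bit is carried at several embedding locations, that each location yields a signed soft estimate, and that a per-location confidence (for example, derived from local reference-signal strength) is available; the data layout and `decode_bits` are hypothetical.

```python
import numpy as np

def decode_bits(estimates, confidences, bit_index, n_bits):
    """Confidence-weighted soft decoding of redundantly embedded message bits.

    estimates[j]   : signed raw estimate read at embedding location j (+ favors bit 1)
    confidences[j] : weight for location j (assumed to come from the reference signal)
    bit_index[j]   : which message bit location j carries
    """
    score = np.zeros(n_bits)
    np.add.at(score, bit_index, np.asarray(confidences) * np.asarray(estimates))
    return (score > 0).astype(int)

est = np.array([0.9, -0.8, -0.7, -0.6, 0.5, 0.4])
conf = np.array([1.0, 0.1, 0.1, 0.9, 0.2, 0.2])   # low confidence where estimates disagree
idx = np.array([0, 0, 0, 1, 1, 1])
print(decode_bits(est, conf, idx, 2))         # weighted:   [1 0]
print(decode_bits(est, np.ones(6), idx, 2))   # unweighted: [0 1]
```

In this toy case the two low-confidence locations per bit outvote the reliable one unless the confidences are applied.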
Various embodiments employ techniques to speed the recovery of the geometric transform and reduce computational complexity of that process. One such technique employs a subset of the reference signal to identify geometric transform candidates for further refinement. Another technique, which may be employed alone, or in combination, winnows geometric transform candidates by their correlation metrics.
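Winnowing can be sketched by scoring every candidate against only a subset of the reference features and keeping the top few for full refinement. Assumptions: features are 2-D points, the subset is every fourth reference feature, the score is the fraction of subset features that land within an assumed radius of a suspect feature, and the candidates are pure rotations; `subset_score` and `winnow` are hypothetical names.

```python
import numpy as np

def rotation(deg):
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def subset_score(A, ref_subset, sus_pts, radius=1.0):
    """Fraction of the reference subset that lands near some suspect feature under A."""
    mapped = ref_subset @ A.T
    dist = np.linalg.norm(mapped[:, None, :] - sus_pts[None, :, :], axis=2)
    return float((dist.min(axis=1) < radius).mean())

def winnow(candidates, ref_subset, sus_pts, keep=3):
    """Indices of the `keep` best-scoring candidates, best first, plus all scores."""
    scores = [subset_score(A, ref_subset, sus_pts) for A in candidates]
    order = np.argsort(scores)[::-1]   # highest score first
    return [int(i) for i in order[:keep]], scores

g = np.arange(-4, 5) * 8.0
ref = np.array([(a, b) for a in g for b in g])
sus = ref @ rotation(10).T                              # suspect: reference rotated 10 degrees
candidates = [rotation(d) for d in range(0, 21, 2)]     # hypothetical candidates, 0..20 degrees
kept, scores = winnow(candidates, ref[::4], sus, keep=3)
print(kept)   # surviving candidate indices, best first
```

Only the surviving candidates would then be passed to the more expensive full-reference refinement.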
Some embodiments further employ a method of computing an estimate of a translation offset between a reference and suspect signal. This method operates on a set of phase estimates of a suspect signal. For each element in an array of translation offsets, the method provides a set of expected phases of the reference signal at the translation offset. It computes a phase deviation metric for each pair of an expected phase and the corresponding phase estimate at the translation offset, and computes a sum of the phase deviation metrics at the translation offset. This approach provides a phase deviation surface corresponding to the array of translation offsets. The method determines a peak in the phase deviation metrics for the array of translation offsets (e.g., in the phase deviation surface), where a location of the peak provides the estimate of the translation offset.
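The phase deviation search lends itself to a compact, vectorized sketch. It assumes the reference signal is a set of integer 2-D frequency components on an n×n grid with known phases, that translation is a circular shift so the expected phase at offset t follows the DFT shift theorem, and that the per-component metric is cos(estimated − expected), so that better agreement gives a larger value and the peak is a maximum. A wrapped absolute phase difference, minimized instead, would be an equally plausible reading of "phase deviation metric". Function names and the simulated data are hypothetical.

```python
import numpy as np

def phase_agreement_surface(freqs, ref_phases, est_phases, n):
    """Sum over components of cos(est - expected) for every integer offset (t0, t1)."""
    t0, t1 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Shift theorem: a circular shift by (t0, t1) subtracts 2*pi*(k0*t0 + k1*t1)/n from phase.
    expected = ref_phases - 2 * np.pi * (t0[..., None] * freqs[:, 0]
                                         + t1[..., None] * freqs[:, 1]) / n
    return np.cos(est_phases - expected).sum(axis=-1)

def estimate_translation(freqs, ref_phases, est_phases, n):
    """Offset at the peak of the surface."""
    surface = phase_agreement_surface(freqs, ref_phases, est_phases, n)
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    return tuple(int(i) for i in peak)

# Simulated data (hypothetical): 40 reference components, true shift (5, 12), 0.1 rad noise.
rng = np.random.default_rng(0)
n = 64
freqs = rng.integers(-31, 32, size=(40, 2))
ref_phases = rng.uniform(-np.pi, np.pi, 40)
t_true = np.array([5, 12])
est_phases = ref_phases - 2 * np.pi * (freqs @ t_true) / n + rng.normal(0, 0.1, 40)
print(estimate_translation(freqs, ref_phases, est_phases, n))
```

Because every term is a cosine, phase wrap-around needs no special handling, and the surface can be evaluated for all offsets with one broadcasted expression.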
This phase deviation method is implemented, for example, in a phase deviation circuit. The phase deviation circuit comprises a memory for storing a set of phase estimates of a suspect signal and known phases of a reference signal. It also comprises a phase deviation module for computing, for an array of translation offsets, a phase deviation metric for each of the set of known phases of the reference signal and the corresponding phase estimates of the suspect signal, and for computing a sum of the phase deviation metrics at each translation offset. The circuit comprises a peak determination module for determining a peak in the phase deviation metrics for the array of translation offsets. The location of the peak provides the estimate of the translation offset between the reference and suspect signals.
The above-summarized methods are implemented in whole or in part as instructions (e.g., software or firmware for execution on one or more programmable processors), circuits, or a combination of circuits and instructions executed on programmable processors.
Further features will become apparent with reference to the following detailed description and accompanying drawings.