For a variety of image processing applications, it is necessary to determine the geometric distortion of an image and compensate for it. Technical fields where this is important include image and object recognition. Another application is decoding machine readable data encoded into an optical data carrier within an image. This data carrier may be an overt optical code such as a two-dimensional (2D) barcode or an imperceptible signal incorporated into the image. In the latter case, the data carrier is incorporated into an image to meet image quality, data carrying capacity and signal robustness criteria. Digital watermarking is an example of enhancing an image to embed auxiliary data.
Compensating for geometric distortion is necessary in these applications to extend the range over which recognition and data decoding provide reliable results (referred to as the operational envelope).
One approach for determining geometric distortion employs signal structure within the image. The geometric transform is determined by deriving geometric transform parameters of the signal structure in a distorted image. The signal structure may be pre-determined and inserted within images. Alternatively, it may be derived from an arbitrary image and the derived image structure (e.g., a feature vector of spatial features) stored in a database as a reference signal for later use in matching stored reference structures with corresponding structure derived from a suspect image. In some applications, the signal structure is a hybrid of an auxiliary image structure and inherent signal structure already within a target image. Regardless of how the signal structure forms part of the image, the objective of the image processing method is to ascertain the geometric transform of that signal structure efficiently and accurately. The method must be efficient because processing resources, battery power, memory and memory bandwidth are constrained for practical applications in mobile devices and automated data capture devices, such as fixed and handheld optical code scanners. Moreover, even in cloud-side applications where processing resources are more plentiful, image recognition and data extraction need to be efficient and have a broader operational envelope to handle noisy and distorted imagery.
A suspect image may not contain expected image structure, and as such, processing expended trying to detect it is a waste of processing resources. Thus, it is advantageous that the image processing method not waste resources on futile operations. The method should enable the host system to converge rapidly to a reliable recognition result or reject image blocks that are unlikely to lead to a reliable result.
Moreover, many applications require real time or low latency performance, as the image processing task must operate on a real time, incoming stream of image blocks, and there are strict constraints on the time and hardware resources allocated to each block. Examples where these constraints are prevalent include a battery powered mobile device and an automatic data capture device (e.g., barcode scanner) operating on an input stream of frames captured by its digital camera.
One driver of low latency operation is to provide an acceptable user experience. The geometric distortion must be detected within a limited period of time as the user is capturing image frames of an object so that responsive actions may be triggered (e.g., fetching of object information and augmenting a virtual reality display of a live video stream). Another driver is the limit of the hardware to retain and analyze frames from a live input stream of frames being captured of an object. A limited number of frames may be buffered and analyzed before the buffers and processing logic are assigned to new frames being captured by a camera.
Images incur geometric distortion in a variety of ways. The technology of this disclosure is concerned with determining and compensating for geometric distortion that occurs to an image relative to its original state. In its original state, its structure is known, either because it has been generated to incorporate a particular structure or the structure has been derived from its inherent features. These properties may be spatial or transform domain features (e.g., spatial frequency or autocorrelation domain) like peaks (local maxima or minima), corners, edges, etc. From this initial state, the image is geometrically distorted when it is rendered to a display or marked on a substrate (e.g., paper or plastic of a product package or label). The image is further distorted, for example, when the object to which it is applied or displayed is distorted. Displayed images are distorted to fit a particular display device. When a package substrate material, such as a plastic or paper based substrate, is formed into a package, the image is distorted into the shape of the object. During use of the object, the image is further distorted (e.g., non-rigid objects are readily deformable during normal use, including when being imaged). Then, when the image is captured digitally, by an imager in a mobile device (e.g., smartphone, tablet) or automatic data capture equipment (e.g., fixed or handheld barcode scanner), it is distorted further. In light of these various sources of geometric distortion and image noise, it is challenging to determine the geometric transform of a suspect image relative to its original state.
FIGS. 1-4 illustrate aspects of the geometric distortion problem with a simplified depiction of an image scanner capturing an image of a package 10. The plane shown as line 12 from this side view corresponds to the glass surface of a flatbed scanner. To introduce baseline concepts of camera and package tilt in one dimension, we depict it as virtual scanner glass 12 in FIGS. 1-4. Actual geometric distortion tends to be more complex, with tilt and camera angle in different directions, finite focal length(s) of the camera, etc.
FIG. 1 depicts the case where the camera angle is zero degrees, the package is flat, and the camera is assumed to have infinite focal length. Through a camera lens 14, the camera in the scanner captures an image shown at line 16. In this case, the captured image has its X coordinates multiplied by 1, reflecting that no geometric distortion is introduced. For this example, we illustrate distortion in one axis, X, of the spatial coordinate space. Similar distortion occurs in other axes.
In FIG. 2, the package 10 is tilted on the virtual scanner glass by an angle Δ. In this case, the captured image has its X coordinates multiplied by cos Δ due to the tilt of the package.
In FIG. 3, the package 10 has no tilt but the camera angle is α. In this case, the captured image has X coordinates multiplied by cos α. In some capture devices, the camera angle relative to the scanner surface is known, such as in flatbed scanners. In other devices, it is not. If the camera angle is known, image pre-processing can potentially compensate for it by dividing the image coordinates by cos α. However, this pre-processing may introduce additional noise into the image if the compensation is even slightly incorrect.
FIG. 4 illustrates the case where the camera angle is α, and the package is tilted by angle Δ. In this case, the captured image has X coordinates multiplied by cos(α+Δ). With a correction for the camera angle, the distortion is:
$$\text{distortion} = \frac{\cos(\alpha + \Delta)}{\cos(\alpha)}$$
The optimal value of this function is 1. Otherwise, the image is compressed or stretched in a direction due to differential scale and shear effects. FIG. 5 is a plot of the distortion for a fixed package angle Δ=10. As the camera angle increases, the distortion increases and becomes increasingly difficult to correct accurately. Further, in practice, additional geometric distortion, such as perspective distortion, is present, which is more challenging to compensate for in applications of image recognition and decoding machine readable data encoded in the distorted image.
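The ratio of cos(α+Δ) to cos(α) described above can be evaluated numerically to see how the residual distortion departs from 1 as the camera angle grows. The following is a minimal sketch, not part of the disclosed method. The function name `tilt_distortion` is hypothetical, and the sketch assumes that angles, including the fixed package angle of 10, are expressed in degrees.

```python
import math


def tilt_distortion(camera_angle_deg: float, package_angle_deg: float) -> float:
    """Residual X-axis scale after correcting for a known camera angle.

    Assumption: both angles are given in degrees. A return value of 1.0
    means no residual distortion along the X axis.
    """
    alpha = math.radians(camera_angle_deg)
    delta = math.radians(package_angle_deg)
    return math.cos(alpha + delta) / math.cos(alpha)


if __name__ == "__main__":
    # Sweep the camera angle for a fixed package tilt (assumed 10 degrees).
    for camera_angle in (0, 15, 30, 45, 60):
        ratio = tilt_distortion(camera_angle, 10.0)
        print(f"camera angle {camera_angle:2d} deg -> distortion {ratio:.4f}")
```

For a fixed positive package tilt, the printed ratio moves further from 1 as the camera angle increases, which is the trend the text attributes to FIG. 5.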
In previous work, we have developed techniques for determining geometric transform parameters using log polar and least squares methods. Please see, in particular, U.S. Pat. Nos. 6,614,914, 7,152,021, 9,182,778, and U.S. patent application Ser. No. 14/724,729 (entitled DIFFERENTIAL MODULATION FOR ROBUST SIGNALING AND SYNCHRONIZATION) (now published as US Application Publication No. 20160217547), which describe various methods for determining geometric transformations of images. International Patent Application WO 2017/011801, entitled Signal Processors and Methods for Estimating Geometric Transformations of Images for Digital Data Extraction, provides additional disclosure, expanding on the technology in U.S. Pat. No. 9,182,778. In particular, WO 2017/011801 provides additional disclosure relating to the challenge of perspective distortion, including techniques for approximating perspective distortion with affine transform parameters. U.S. Pat. Nos. 6,614,914, 7,152,021, 9,182,778, US Publication 20160217547, and WO 2017/011801 are hereby incorporated by reference. See also Ser. No. 14/842,575, entitled HARDWARE-ADAPTABLE WATERMARK SYSTEMS (now published as US Application Publication No. 20170004597), for more on implementation in various hardware configurations, which is hereby incorporated by reference.
While it is possible to approximate a perspective transform with an affine transform, an affine transform is not a perfect approximation. The focal length in scanner cameras is not infinity. To illustrate the point, a general perspective transformation can be described by the following homography matrix:
$$H = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{pmatrix}$$
The affine part of this matrix corresponds to parameters a11, a12, a21, a22, and the purely perspective part of the matrix corresponds to parameters a31, a32. The translation part corresponds to parameters a13 and a23. Recovery of the affine parameters may approximate a perspective distortion, but this approximation is not always sufficient and some amount of correction for the perspective part is sometimes necessary.
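To make the roles of these parameters concrete, the following minimal sketch applies a homography to a point and compares the result with the affine-only approximation obtained by setting a31 and a32 to zero. The function names `apply_homography` and `affine_approximation` and the numeric values are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through a 3x3 homography using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0.0:
        raise ValueError("point maps to infinity under this homography")
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def affine_approximation(h: Matrix3) -> Tuple[Tuple[float, ...], ...]:
    """Copy of h with the purely perspective terms a31 and a32 set to zero."""
    return (tuple(h[0]), tuple(h[1]), (0.0, 0.0, h[2][2]))


if __name__ == "__main__":
    # Illustrative values only: mild shear, a translation, slight perspective.
    h = ((1.0, 0.1, 5.0), (-0.1, 1.0, -3.0), (0.001, 0.002, 1.0))
    for point in ((0.0, 0.0), (10.0, 20.0), (100.0, 200.0)):
        full = apply_homography(h, *point)
        approx = apply_homography(affine_approximation(h), *point)
        print(point, "perspective:", full, "affine only:", approx)
```

With nonzero a31 and a32, the divisor w varies with position, so the gap between the two mappings grows with distance from the origin. This is one way to see why an affine fit alone is not always sufficient.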
In one approach, a direct least squares method is used to recover affine parameters and additional corrections are applied to correct the rest of the parameters (pure perspective and translation).
If designed properly, these various methods can provide an effective way to estimate geometric transform parameters. However, they tend to consume significant computational resources or may not sufficiently address certain forms of distortion, such as perspective. In this document, we describe methods that extend the operational envelope with improved efficiency and accuracy.
One aspect of the invention is a method of determining a geometric transform of an image. The method comprises:
obtaining a suspect image;
transforming the suspect image into an image feature space;
for plural geometric transform candidates, determining new geometric transform candidates by acts of:
a) obtaining transformed coordinates of reference signal components, the transformed coordinates having been geometrically transformed by a geometric transform candidate;
b) for the reference signal components, determining updated coordinates by locating an image feature in a neighborhood in the suspect image around the transformed coordinates of a reference signal component, the image feature corresponding to a potential reference signal component in the suspect image;
c) determining a new geometric transform that provides a least squares mapping between coordinates of the reference signal components and the updated coordinates; the new geometric transform parameters being computed by dot product operations on the coordinates of the reference signal components and the updated coordinates; and
d) from the dot product operations, obtaining a least squares error metric for the new geometric transform candidate;
for the new geometric transform candidates, comparing the least squares error metric for the new geometric transform candidate to a threshold;
discarding new geometric transform candidates having a least squares error metric exceeding the threshold; and
refining new geometric transform candidates having a least squares error metric that does not exceed the threshold.
This method is implemented in instructions executed on one or more programmable processor units, or in alternative digital logic circuitry, as detailed further below.
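Acts c) and d) and the subsequent threshold test can be illustrated with a minimal sketch. It assumes, for simplicity, that the transform is a 2x2 linear transform with no translation, and that the updated coordinates from act b) have already been found (the neighborhood search is not shown). All names, including `fit_linear_transform` and `screen_candidates`, are hypothetical. The normal equations of the least squares problem are solved in closed form, so every quantity, including the error metric, is built from dot products of coordinate vectors.

```python
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float, float]  # (a11, a12, a21, a22)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_linear_transform(ref_x, ref_y, upd_x, upd_y) -> Tuple[Params, float]:
    """Least squares 2x2 transform mapping reference to updated coordinates.

    Returns the parameters and the sum of squared residuals.
    """
    # Dot products among the reference coordinates (normal-equation matrix).
    sxx, sxy, syy = dot(ref_x, ref_x), dot(ref_x, ref_y), dot(ref_y, ref_y)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("reference coordinates are degenerate")
    # Dot products between updated and reference coordinates.
    uxx, uxy = dot(upd_x, ref_x), dot(upd_x, ref_y)
    vxx, vxy = dot(upd_y, ref_x), dot(upd_y, ref_y)
    a11 = (uxx * syy - uxy * sxy) / det
    a12 = (uxy * sxx - uxx * sxy) / det
    a21 = (vxx * syy - vxy * sxy) / det
    a22 = (vxy * sxx - vxx * sxy) / det
    # At the least squares optimum the residual energy reduces to dot products.
    err = (dot(upd_x, upd_x) - a11 * uxx - a12 * uxy) + (
        dot(upd_y, upd_y) - a21 * vxx - a22 * vxy
    )
    return (a11, a12, a21, a22), max(err, 0.0)  # clamp rounding noise


def screen_candidates(ref_x, ref_y, updated_sets, threshold: float) -> List[Tuple[Params, float]]:
    """Refit each candidate and keep those whose error does not exceed threshold."""
    kept = []
    for upd_x, upd_y in updated_sets:
        params, err = fit_linear_transform(ref_x, ref_y, upd_x, upd_y)
        if err <= threshold:
            kept.append((params, err))
    return kept
```

In this sketch, candidates returned by `screen_candidates` would proceed to refinement and the others would be discarded. The choice of threshold is application dependent and is not specified here.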
Another aspect of the invention is an image processing device comprising:
a memory in which is stored a suspect image in an image feature space;
a first buffer;
a second buffer;
a processor system comprising a first processing unit and a vector processing unit;
the first processing unit configured to load reference signal coordinates of reference signal components into the first buffer, to obtain transformed reference signal coordinates for a geometric transform candidate, and for each of the transformed reference signal coordinates, locate updated coordinates of a potential reference signal component in the suspect image in neighborhoods around the transformed reference signal coordinates, and configured to load the updated coordinates into the second buffer;
the vector processing unit configured to obtain a vector of the reference signal coordinates from the first buffer and obtain a corresponding vector of updated coordinates from the second buffer and execute dot product operations on the vectors to determine a new geometric transform that provides a least squares mapping between the reference signal coordinates and updated coordinates, the vector processing unit further configured to compute additional dot products used as input to compute a least squares error metric;
the processing system configured to compute the least squares metric from output of the additional dot products for plural geometric transform candidates processed by the vector processing unit to determine corresponding new geometric transforms, configured to compare the least squares metrics to a threshold, and configured to select new geometric transform candidates to refine based on comparing the least squares metrics to a threshold.
These methods, systems and circuitry provide reliable and computationally efficient recovery of geometric transforms of data carrying signals embedded in images on physical objects. As such, they improve the data carrying capacity and robustness of the data carrying signals, and the aesthetic quality of the images with these data carrying signals. Aesthetic quality of imagery is enhanced because the inventive technology enables detection of weaker data carrying signals and data signals that are blended into host imagery and other information bearing content on objects, like product packaging and labels.
Further inventive features will become apparent in the following detailed description and accompanying drawings.