For a variety of image processing applications, it is necessary to determine the geometric distortion of an image and compensate for it. Technical fields where this is important include image and object recognition. Another application is decoding machine readable data encoded into an optical data carrier within an image. This data carrier may be an overt optical code such as a two-dimensional (2D) barcode or an imperceptible signal incorporated into the image. In the latter case, the data carrier is incorporated into an image to meet image quality, data carrying capacity and signal robustness criteria. Digital watermarking is an example of enhancing an image to embed auxiliary data.
Compensating for geometric distortion is necessary in these applications to extend the range over which recognition and data decoding provide reliable results (referred to as the operational envelope).
One approach for determining geometric distortion employs signal structure within the image. The geometric transform is determined by deriving geometric transform parameters of the signal structure in a distorted image. The signal structure may be pre-determined and inserted within images. Alternatively, it may be derived from an arbitrary image, with the derived image structure (e.g., a feature vector of spatial features) stored in a database as a reference signal for later use in matching stored reference structures with corresponding structure derived from a suspect image. In some applications, the signal structure is a hybrid of an auxiliary image structure and inherent signal structure already within a target image. Regardless of how the signal structure forms part of the image, the objective of the image processing method is to ascertain the geometric transform of that signal structure efficiently and accurately. The method must be efficient because processing resources, battery power, memory and memory bandwidth are constrained for practical applications in mobile devices and automated data capture devices, such as fixed and handheld optical code scanners. Moreover, even in cloud-side applications where processing resources are more plentiful, image recognition and data extraction need to be efficient and have a broader operational envelope to handle noisy and distorted imagery.
A suspect image may not contain expected image structure, and as such, processing expended trying to detect it is a waste of processing resources. Thus, it is advantageous that the image processing method not waste resources on futile operations. The method should enable the host system to converge rapidly to a reliable recognition result or reject image blocks that are unlikely to lead to a reliable result.
Moreover, many applications require real time or low latency performance, as the image processing task must operate on a real time, incoming stream of image blocks, and there are strict limits on the time and hardware resources allocated to each block. Examples where these constraints are prevalent include a battery powered mobile device and an automatic data capture device (e.g., barcode scanner) operating on an input stream of frames captured by its digital camera.
One driver of low latency operation is to provide an acceptable user experience. The geometric distortion must be detected within a limited period of time as the user is capturing image frames of an object so that responsive actions may be triggered (e.g., fetching of object information and augmenting a virtual reality display of a live video stream). Another driver is the limit of the hardware to retain and analyze frames from a live input stream of frames being captured of an object. A limited number of frames may be buffered and analyzed before the buffers and processing logic are assigned to new frames being captured by a camera.
Images incur geometric distortion in a variety of ways. The technology of this disclosure is concerned with determining and compensating for geometric distortion that occurs to an image relative to its original state. In its original state, its structure is known, either because it has been generated to incorporate a particular structure or the structure has been derived from its inherent features. These properties may be spatial or transform domain features (e.g., spatial frequency or autocorrelation domain) like peaks (local maxima or minima), corners, edges, etc. From this initial state, the image is geometrically distorted when it is rendered to a display or marked on a substrate (e.g., paper or plastic of a product package or label). The image is further distorted, for example, when the object to which it is applied or displayed is distorted. Displayed images are distorted to fit a particular display device. When a package substrate material, such as a plastic or paper based substrate, is formed into a package, the image is distorted into the shape of the object. During use of the object, the image is further distorted (e.g., non-rigid objects are readily deformable during normal use, including when being imaged). Then, when the image is captured digitally, by an imager in a mobile device (e.g., smartphone, tablet) or automatic data capture equipment (e.g., fixed or handheld barcode scanner), it is distorted further. In light of these various sources of geometric distortion and image noise, it is challenging to determine the geometric transform of a suspect image relative to its original state.
FIGS. 1-4 illustrate aspects of the geometric distortion problem with a simplified depiction of an image scanner capturing an image of a package 10. The plane shown as line 12 from this side view corresponds to the glass surface of a flatbed scanner. To introduce baseline concepts of camera and package tilt in one dimension, we depict it as virtual scanner glass 12 in FIGS. 1-4. Actual geometric distortion tends to be more complex, with tilt and camera angle in different directions, finite focal length(s) of the camera, etc.
FIG. 1 depicts the case where the camera angle is zero degrees, the package is flat, and the camera is assumed to have infinite focal length. Through a camera lens 14, the camera in the scanner captures an image shown at line 16. In this case, the captured image has its X coordinates multiplied by 1, reflecting that no geometric distortion is introduced. For this example, we illustrate distortion in one axis, X, of the spatial coordinate space. Similar distortion occurs in other axes.
In FIG. 2, the package 10 is tilted on the virtual scanner glass by an angle Δ. In this case, the captured image has its X coordinates multiplied by cos Δ due to the tilt of the package.
In FIG. 3, the package 10 has no tilt but the camera angle is α. In this case, the captured image has X coordinates multiplied by cos α. In some capture devices, the camera angle relative to the scanner surface is known, such as in flatbed scanners. In other devices, it is not. If the camera angle is known, image pre-processing can potentially compensate for it by dividing the image coordinates by cos α. However, this pre-processing may introduce additional noise into the image, even if the assumed camera angle is only slightly incorrect.
FIG. 4 illustrates the case where the camera angle is α, and the package is tilted by angle Δ. In this case, the captured image has X coordinates multiplied by cos(α+Δ). With a correction for the camera angle, the distortion is:
\[ \text{distortion} = \frac{\cos(\alpha + \Delta)}{\cos(\alpha)} \]
The optimal value of this function is 1. Otherwise, the image is compressed or stretched in a direction due to differential scale and shear effects. FIG. 5 is a plot of the distortion for a fixed package angle Δ=10°. As the camera angle increases, the distortion increases and becomes increasingly difficult to correct accurately. Further, in practice, additional geometric distortion, such as perspective distortion, is present, which is more challenging to compensate for in applications of image recognition and decoding machine readable data encoded in the distorted image.
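The behavior of this ratio can be checked numerically. The following is a minimal sketch, not part of the disclosed method: the function name `tilt_distortion` and the sampled camera angles are illustrative assumptions, and angles are assumed to be in degrees.

```python
import math


def tilt_distortion(camera_angle_deg, package_angle_deg):
    """Return cos(alpha + delta) / cos(alpha) for angles given in degrees.

    alpha is the camera angle and delta is the package tilt. A value of 1
    means no residual scaling along X after correcting for the camera angle.
    """
    alpha = math.radians(camera_angle_deg)
    delta = math.radians(package_angle_deg)
    return math.cos(alpha + delta) / math.cos(alpha)


if __name__ == "__main__":
    # Fixed package tilt of 10 degrees (assumed unit), varying camera angle.
    for camera_angle in (0, 15, 30, 45, 60):
        ratio = tilt_distortion(camera_angle, 10.0)
        print(f"camera angle {camera_angle:2d} deg -> ratio {ratio:.3f}")
```

For a fixed positive package tilt, the ratio moves further from 1 as the camera angle grows, which is the trend the text describes.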
In previous work, we have developed techniques for determining geometric transform parameters using log polar and least squares methods. Please see, in particular, U.S. Pat. Nos. 6,614,914, 7,152,021, 9,182,778, and U.S. patent application Ser. No. 14/724,729 (entitled DIFFERENTIAL MODULATION FOR ROBUST SIGNALING AND SYNCHRONIZATION) (now published as US Application Publication No. 20160217547), which describe various methods for determining geometric transformations of images. International Patent Application WO 2017/011801, entitled Signal Processors and Methods for Estimating Geometric Transformations of Images for Digital Data Extraction, provides additional disclosure, expanding on the technology in U.S. Pat. No. 9,182,778. In particular, WO 2017/011801 provides additional disclosure relating to the challenge of perspective distortion, including techniques for approximating perspective distortion with affine transform parameters. U.S. Pat. Nos. 6,614,914, 7,152,021, 9,182,778, US Publication 20160217547, and WO 2017/011801 are hereby incorporated by reference. See also Ser. No. 14/842,575, entitled HARDWARE-ADAPTABLE WATERMARK SYSTEMS (now published as US Application Publication No. 20170004597), which is hereby incorporated by reference, for more on implementation in various hardware configurations.
While it is possible to approximate a perspective transform with an affine transform, an affine transform is not a perfect approximation. The focal length in scanner cameras is not infinity. To illustrate the point, a general perspective transformation can be described by the following homography matrix:
\[ H = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{pmatrix} \]
The affine part of this matrix corresponds to parameters a11, a12, a21, a22, and the purely perspective part of the matrix corresponds to parameters a31, a32. The translation part corresponds to parameters a13 and a23. Recovery of the affine parameters may approximate a perspective distortion, but this approximation is not always sufficient and some amount of correction for the perspective part is sometimes necessary.
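To make the limits of the affine approximation concrete, the sketch below applies a homography to a point and compares the result with the same matrix after its perspective terms a31 and a32 are zeroed. The helper names and the numeric matrix entries are hypothetical values chosen for illustration only; they are not taken from the disclosure.

```python
def apply_homography(h, x, y):
    """Map point (x, y) through a 3x3 homography given as nested lists."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def affine_only(h):
    """Copy of h with the purely perspective terms a31, a32 set to zero."""
    return [list(h[0]), list(h[1]), [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # Hypothetical homography with a small perspective component.
    H = [[1.0, 0.1, 5.0],
         [-0.1, 1.0, -3.0],
         [0.001, 0.0005, 1.0]]
    for point in ((0.0, 0.0), (10.0, 10.0), (100.0, 100.0)):
        full = apply_homography(H, *point)
        approx = apply_homography(affine_only(H), *point)
        error = ((full[0] - approx[0]) ** 2 + (full[1] - approx[1]) ** 2) ** 0.5
        print(point, "affine approximation error:", round(error, 3))
```

With these example values the two mappings agree at the origin and diverge as the point moves away from it, which is why some correction for the perspective part can be needed over a larger image block.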
In one approach, a direct least squares method is used to recover affine parameters and additional corrections are applied to correct the rest of the parameters (pure perspective and translation).
If designed properly, these various methods can provide an effective way to estimate geometric transform parameters. However, they can tend to consume significant computational resources or not sufficiently address certain forms of distortion, such as perspective. In this document, we describe methods that extend the operational envelope with improved efficiency and accuracy.
Our image processing methods determine a geometric transform of a suspect image by efficiently evaluating a large number of geometric transform candidates in environments with limited processing resources. Processing resources are conserved by using complementary methods for determining a geometric transform of an embedded signal. One method excels at higher geometric distortion, and specifically, distortion caused by greater tilt angle of a camera. Another method excels at lower geometric distortion, particularly for weaker signals. Together, the methods provide a more reliable detector of an embedded data signal in an image across a larger range of distortion while making efficient use of limited processing resources in mobile devices.
One aspect of the invention is a method of reading an embedded digital payload in an image. This method operates on a suspect image, e.g., an image block obtained from frames of images captured by the camera of a mobile device such as a handheld optical code reader or smartphone. The method transforms the suspect image into an image feature space. In this feature space, it seeks to determine the geometric transform of an embedded signal.
The method applies first and second complementary processes to determine geometric transform candidates that are most likely to compensate for geometric distortion of the image and enable extraction of a digital payload from the embedded signal.
In particular, in one embodiment, a first complementary process executes a fitting process that produces first refined geometric transform candidates having detection metrics for the embedded signal that satisfy predetermined criteria. The fitting process finds geometric transform parameters that map components of an embedded signal to corresponding components detected in the received image. One example of a fitting process is a least squares fit, or least squares estimation. The fitting process is configured to evaluate larger geometric distortion in a parameter space, such as larger distortion due to higher camera tilt angles. A second complementary process evaluates lower geometric distortion in the parameter space, such as lower camera tilt angles. One example of a complementary process is one that correlates components of the embedded signal with components of a pre-processed image, in a coordinate space comprised of a range of candidate geometric parameters that correspond to the lower tilt angles. This coordinate space may be selected to address a more limited subset of geometric parameters, like rotation and scale, yet evaluate the image data with higher precision or resolution to improve payload extraction from weak signals (e.g., embedded signals that have been embedded with less energy, or for which the signal energy has been degraded in the process of printing, using or scanning an object).
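As a rough illustration of a least squares fit of the kind named above, the sketch below estimates a 2x2 linear transform that maps reference component coordinates to the coordinates at which they were detected. It is a generic textbook formulation under stated assumptions, not the specific fitting process of the embodiments: the function name, the point-pair representation, and the sample coordinates are all hypothetical.

```python
def fit_linear_transform(ref_pts, obs_pts):
    """Least squares 2x2 matrix A minimizing sum ||A p - q||^2.

    ref_pts holds reference coordinates p = (px, py); obs_pts holds the
    corresponding detected coordinates q = (qx, qy). Solves A = C S^-1,
    where S = sum(p p^T) and C = sum(q p^T).
    """
    if len(ref_pts) != len(obs_pts) or len(ref_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    sxx = sum(px * px for px, py in ref_pts)
    sxy = sum(px * py for px, py in ref_pts)
    syy = sum(py * py for px, py in ref_pts)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("reference points are degenerate (collinear with origin)")
    pairs = list(zip(ref_pts, obs_pts))
    cxx = sum(qx * px for (px, py), (qx, qy) in pairs)
    cxy = sum(qx * py for (px, py), (qx, qy) in pairs)
    cyx = sum(qy * px for (px, py), (qx, qy) in pairs)
    cyy = sum(qy * py for (px, py), (qx, qy) in pairs)
    # Multiply C by the closed-form inverse of the symmetric 2x2 matrix S.
    a11 = (cxx * syy - cxy * sxy) / det
    a12 = (cxy * sxx - cxx * sxy) / det
    a21 = (cyx * syy - cyy * sxy) / det
    a22 = (cyy * sxx - cyx * sxy) / det
    return [[a11, a12], [a21, a22]]


if __name__ == "__main__":
    # Hypothetical reference components and their distorted positions.
    true_a = [[1.1, 0.2], [-0.1, 0.9]]
    ref = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
    obs = [(true_a[0][0] * x + true_a[0][1] * y,
            true_a[1][0] * x + true_a[1][1] * y) for x, y in ref]
    print(fit_linear_transform(ref, obs))
```

In a detector, the residual of such a fit, or a correlation computed after applying the fitted transform, could serve as one input to a detection metric; which metric is used is left open here.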
The method selects a refined candidate geometric transform from the first and second refined geometric transform candidates of the complementary processes based on detection metrics, and extracts a digital payload from the embedded signal using the selected geometric transform.
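The selection step can be sketched as follows. This is a simplified illustration under assumptions: the `Candidate` record, the single scalar `metric` field, and the `min_metric` threshold are hypothetical stand-ins for whatever detection metrics and criteria an implementation uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one refined geometric transform candidate."""
    params: Tuple[float, float, float, float]  # a11, a12, a21, a22
    metric: float                              # higher means stronger detection
    source: str                                # which process produced it


def select_candidate(first: Sequence[Candidate],
                     second: Sequence[Candidate],
                     min_metric: float) -> Optional[Candidate]:
    """Pick the best candidate across both processes, or None to reject.

    Candidates below min_metric are discarded so that a block unlikely to
    yield a reliable payload is rejected without further processing.
    """
    pool = [c for c in list(first) + list(second) if c.metric >= min_metric]
    if not pool:
        return None
    return max(pool, key=lambda c: c.metric)


if __name__ == "__main__":
    fit_results = [Candidate((0.8, 0.1, -0.1, 1.0), 0.62, "fitting")]
    corr_results = [Candidate((1.0, 0.0, 0.0, 1.0), 0.71, "correlation")]
    print(select_candidate(fit_results, corr_results, min_metric=0.5))
```

Returning `None` when no candidate meets the threshold reflects the goal stated earlier of rejecting image blocks that are unlikely to lead to a reliable result.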
Alternative aspects of the invention are embedded signal readers and modules comprising instructions stored in a memory that are executed to determine geometric transforms of the embedded signal. In some variants, complementary geometric transform modules execute in series on a processor unit, while in others, they execute in parallel on multiple processing units, such as GPUs or CPUs. Further, the modules themselves can sometimes be configured to subdivide geometric transform candidates into groups that are evaluated in parallel, e.g., using SIMD or like parallel data processing capability.
These methods, systems and circuitry provide reliable and computationally efficient recovery of geometric transforms of data carrying signals embedded in images on physical objects. As such, they improve the data carrying capacity and robustness of the data carrying signals, and the aesthetic quality of the images with these data carrying signals. Aesthetic quality of imagery is enhanced because the inventive technology enables detection of weaker data carrying signals and data signals that are blended into host imagery and other information bearing content on objects, like product packaging and labels.
Further inventive features will become apparent in the following detailed description and accompanying drawings.