Advances in image processing technologies have created many new and useful ways to identify tangible or otherwise visible objects (e.g., newspapers, magazine pages, posters, product packaging, consumer products, labels, event tickets, credit cards, paper currency, electronic device housings, displayed video or still imagery, etc.) in support of a wide variety of applications relating to advertising, augmented reality, content identification, copyright control, digital rights management, e-commerce, gaming, image-based search, social tagging, security, etc. In a typical scenario, an object identification process is initiated by first capturing an image of a surface of the object. The captured image is then subjected to one or more feature detection processes to discern between a signal representing a specific information-bearing feature (e.g., that can be used to identify the object) within the captured image and other features or information visually conveyed by the object. As used herein, a “feature” generally represents a robust image characteristic of the type that can be inserted in the visual information already conveyed by the object (e.g., via use of a one- or two-dimensional barcode, digital watermark, dataglyph, etc.), or an image characteristic that can be otherwise identified within the visual information already conveyed by the object (e.g., via use of known fingerprinting techniques), or a combination of both.
In some cases, the accuracy or reliability of a feature detection process depends on the pose of the surface of the object being imaged relative to the device used to capture the image of the object. In this context, “pose” can refer to one or more attributes such as distance between the object being imaged and the device capturing the image (also referred to as “scale”) and tilt of the object being imaged relative to an image plane of the device capturing the image (also referred to as “shear” or “differential scale”). Hence, proper alignment of the object surface being imaged with a camera-equipped electronic device is important to ensure that a feature is accurately and reliably detected.
Some conventional feature detection processes have been developed that are invariant to one or more of the aforementioned pose attributes (e.g., Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), etc.), but these processes can be undesirably time-consuming or require an excessive amount of processing resources. Other conventional feature detection processes iteratively perform one or more operations, with each operation using one or more parameters optimized for a specific pose or range of poses, until an acceptable result is obtained. While such iterative processes can be less computationally expensive than pose-invariant processes, they are still undesirably slow for certain applications (e.g., involving rapid scanning of multiple objects over a short period of time).
Accordingly, it would be desirable to perform feature detection processes in a manner that is both faster than conventional iterative processes and less demanding of computational resources than typical pose-invariant detection processes.
Still other approaches seek to establish the pose of an object surface by using various 3D modeling techniques. For example, depth sensing cameras can be used to assess the distance from the camera to each of multiple points on the object surface, to thereby determine the pose of the object relative to the camera. One such camera is Microsoft's Kinect system (which is understood to employ technology developed by PrimeSense that is detailed, e.g., in published patent applications 20100020078 and 20100118123). The Kinect system projects a speckle pattern of infrared laser light into a scene, and captures imagery showing the pattern as projected onto the surface(s) in the scene. This imagery is analyzed to discern deformation of the projected pattern and deduce, from such deformation (e.g., by techniques such as depth from stereo and depth from focus), the shape of the scene surface(s). Such approaches, however, are sometimes expensive and/or technically complicated.
In accordance with one aspect of the present technology, a diverging, conical beam of light is projected from a source, such as an LED, onto an object surface. If the object surface squarely faces the axis of the conical beam, the beam of light will project a circle on the object surface. If the surface is inclined, the beam of light will result in an ellipse-shaped illumination pattern. The major axis of the ellipse indicates the direction of the surface tilt; the length ratio between the major and minor ellipse axes indicates the amount of surface tilt. The size of the ellipse indicates the distance to the surface.
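The relationship between the ellipse and the surface pose can be illustrated with a minimal sketch. It is not taken from the present disclosure; it assumes an ideal right circular cone of known half-angle, and it assumes the ellipse semi-axes are expressed in physical units measured in the plane of the surface (i.e., any distortion introduced by the camera's own viewpoint has already been accounted for). Under those assumptions, standard conic-section geometry gives an eccentricity of sin(tilt)/cos(half-angle). The function names `ellipse_from_pose` and `pose_from_ellipse` are hypothetical.

```python
import math


def ellipse_from_pose(distance, tilt_rad, half_angle_rad):
    """Forward model (assumed ideal cone): semi-axes (a, b) of the lit ellipse.

    distance is measured along the beam axis from the source to the surface;
    tilt_rad is the angle between the surface normal and the beam axis.
    """
    t = math.tan(half_angle_rad)
    k = math.tan(tilt_rad)
    s = 1.0 - (t * k) ** 2
    if s <= 0.0:
        raise ValueError("tilt too steep: the beam no longer forms an ellipse")
    semi_minor = distance * t / math.sqrt(s)
    semi_major = distance * t / (s * math.cos(tilt_rad))
    return semi_major, semi_minor


def pose_from_ellipse(semi_major, semi_minor, half_angle_rad):
    """Inverse model: recover (tilt_rad, distance) from the ellipse axes."""
    # Axis ratio -> eccentricity of the ellipse.
    ecc = math.sqrt(max(0.0, 1.0 - (semi_minor / semi_major) ** 2))
    # For a cone cut by a plane, eccentricity = sin(tilt) / cos(half_angle).
    tilt = math.asin(min(1.0, ecc * math.cos(half_angle_rad)))
    t = math.tan(half_angle_rad)
    k = math.tan(tilt)
    # Invert the semi-minor-axis expression of the forward model.
    distance = semi_minor * math.sqrt(1.0 - (t * k) ** 2) / t
    return tilt, distance


if __name__ == "__main__":
    half_angle = math.radians(10.0)
    a, b = ellipse_from_pose(100.0, math.radians(30.0), half_angle)
    tilt, dist = pose_from_ellipse(a, b, half_angle)
    print(math.degrees(tilt), dist)
```

The sketch recovers only the amount of tilt and the distance; the direction of tilt would be read from the orientation of the major axis, as described above. For a narrow beam the relation reduces to the familiar approximation that the minor-to-major axis ratio equals the cosine of the tilt angle.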
Such a conical beam can be produced inexpensively by an LED behind a lens and/or a round hole. Such a pattern can be projected continuously. Alternatively, the pattern can be projected for just a brief instant (e.g., a tenth, thirtieth, or a hundredth of a second), and may be sensed by a camera whose exposure is synchronized to the illumination interval. The illumination may be of narrow spectrum (e.g., 660 nm, as by a red LED), or broader spectrum. The spectrum may fall in the visible range, or may be infrared or ultraviolet.
Such a pose detection arrangement may be integrated, for example, into a point-of-sale (POS) scanner of a supermarket checkout station. For instance, an infrared conical beam may be continuously projected from the scanner. An associated imaging sensor captures imagery of the resulting pattern. Such imagery is then sent to an image processor, which analyzes it to discern the parameters of the ellipse and reports the indicated direction and amount of object surface tilt, as well as the ellipse size. The image processor can then use these parameters to counter-distort imagery captured by the POS camera (i.e., the same camera, or another, e.g., a visible light camera), to virtually re-orient the depiction of the object surface so that it appears to squarely face the camera, at a certain scale. Feature detection (e.g., barcode recognition, watermark decoding, fingerprint-based object recognition, etc.) is then performed using this counter-distorted imagery.
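One way the image processor's two steps might be realized is sketched below. This is an illustrative assumption rather than the method of the present disclosure: the ellipse parameters are estimated from second-order image moments of the illuminated pixels, and the counter-distortion is approximated as a 2x2 affine transform (a weak-perspective simplification that ignores true perspective effects). The function names `ellipse_from_pixels` and `counter_distortion_matrix`, and the `target_radius` parameter that sets the output scale, are hypothetical.

```python
import math


def ellipse_from_pixels(pixels):
    """Estimate (cx, cy, semi_major, semi_minor, angle) from lit pixel coords.

    Uses second-order central moments. For a uniformly filled ellipse, each
    covariance eigenvalue equals (semi_axis ** 2) / 4.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    mean = (mxx + myy) / 2.0
    spread = math.hypot((mxx - myy) / 2.0, mxy)
    semi_major = 2.0 * math.sqrt(mean + spread)
    semi_minor = 2.0 * math.sqrt(max(mean - spread, 0.0))
    angle = 0.5 * math.atan2(2.0 * mxy, mxx - myy)  # major-axis direction
    return cx, cy, semi_major, semi_minor, angle


def counter_distortion_matrix(semi_major, semi_minor, angle, target_radius):
    """2x2 matrix that maps the ellipse onto a circle of target_radius.

    M = g * R(angle) * diag(1, stretch) * R(-angle): stretch along the minor
    axis to undo foreshortening, then scale uniformly to the target size.
    """
    c, s = math.cos(angle), math.sin(angle)
    stretch = semi_major / semi_minor
    g = target_radius / semi_major
    m00 = g * (c * c + stretch * s * s)
    m01 = g * (c * s * (1.0 - stretch))
    m11 = g * (s * s + stretch * c * c)
    return ((m00, m01), (m01, m11))


def apply_matrix(m, point):
    """Apply a 2x2 matrix to an (x, y) offset from the ellipse centre."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
```

In use, the matrix would be applied to pixel offsets measured from the ellipse centre when resampling the captured frame, so that the object surface appears frontal and at a fixed scale before the feature detector runs. A full perspective correction (a homography) would be a more faithful model where the affine approximation proves too coarse.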
It will be recognized that this simple arrangement provides pose information useful in performing feature recognition, without the costs and complexity of prior art 3D modeling approaches.
More generally, imagery captured from an object can be processed based on contextual data that at least partially characterizes a condition of the object when the imagery was captured. The contextual data can be obtained directly by a sensor or can be derived by pre-processing the captured imagery. When the contextual data is known, a feature detection process can then be performed. By processing captured imagery based on contextual data, features can be quickly and accurately detected (e.g., as compared to conventional iterative feature detection processes) without relying on computationally expensive pose-invariant processes.
The foregoing and other features and advantages of the present technology will be more readily apparent from the following detailed description, which proceeds by reference to the accompanying drawings.