Image fingerprinting (aka image signature technology) commonly involves deriving a set of 2D feature points from imagery, and then searching a set of reference image feature points for a closest match, to thereby identify a corresponding reference image. The SIFT, SURF, and ORB algorithms are commonly employed. (See the section entitled Feature Extraction below.)
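By way of illustration only, the search for a closest-matching reference fingerprint can be sketched as follows. This is a minimal sketch, not the implementation of any particular algorithm: it assumes that feature extraction has already produced binary descriptors (as ORB does), represents each descriptor as a toy 16-bit integer, and compares descriptors by Hamming distance. The names `REFERENCES`, `QUERY`, `match_score` and `identify`, and the thresholds `max_dist` and `min_matches`, are hypothetical.

```python
from typing import Dict, List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_score(query: List[int], reference: List[int], max_dist: int = 2) -> int:
    """Count query descriptors whose nearest reference descriptor lies
    within max_dist bits (max_dist is an assumed, illustrative threshold)."""
    if not reference:
        return 0
    score = 0
    for q in query:
        if min(hamming(q, r) for r in reference) <= max_dist:
            score += 1
    return score


def identify(query: List[int],
             references: Dict[str, List[int]],
             min_matches: int = 3) -> Optional[str]:
    """Return the name of the best-matching reference image, or None if
    no reference gathers at least min_matches descriptor matches."""
    best_name, best_score = None, 0
    for name, descriptors in references.items():
        score = match_score(query, descriptors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_matches else None


# Toy reference fingerprints (hypothetical data, 16-bit descriptors).
REFERENCES: Dict[str, List[int]] = {
    "cereal_front": [0b1010101010101010, 0b1111000011110000,
                     0b0000111100001111, 0b1100110011001100],
    "tea_box": [0b0000000011111111, 0b1111111100000000,
                0b0011001100110011, 0b0101010101010101],
}

# Query descriptors: three "cereal_front" descriptors, each with one bit flipped.
QUERY: List[int] = [0b1010101010101011, 0b1111000011110001, 0b0000111100001110]

if __name__ == "__main__":
    print(identify(QUERY, REFERENCES))
```

With the toy data above, the query is attributed to the "cereal_front" reference. A practical system would use much longer descriptors, an approximate nearest-neighbor index, and a geometric consistency check on the matched points.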
Such arrangements work well for identifying 2D images (and 2D objects). But they break down when trying to identify 3D objects.
Consider a box of breakfast cereal. If the box is imaged in a plan frontal view, existing image fingerprinting can suffice to identify the front panel image, and thereby identify the object. But if the camera view is oblique, as in FIG. 1, conventional fingerprinting starts having difficulty.
The FIG. 1 image depicts both the left and front panels of a cereal box. The former appears tilted away from the camera to the left, foreshortening its left-most edge. The latter appears oppositely tilted to the right, foreshortening its right-most edge. Even if a reference fingerprint (e.g., SIFT) were available for the entire package (e.g., in a flat configuration, before the box was glued into its 3D configuration), this reference fingerprint would be difficult to match with the FIG. 1 image, given its projective distortions in two opposing directions.
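The opposing distortions just described can be illustrated numerically. The following is a minimal sketch under stated assumptions: each panel is modeled as a flat unit square, and the three 3x3 matrices (`AFFINE`, `LEFT_PANEL`, `FRONT_PANEL`) are hypothetical values chosen only for illustration, not measurements of FIG. 1. The sketch shows that an affine map keeps a panel's top and bottom edges parallel and its side edges equal in length, whereas the two projective maps foreshorten opposite edges.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Sequence[Sequence[float]]

# Corners of a flat unit-square panel:
# bottom-left, bottom-right, top-right, top-left.
PANEL: List[Point] = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical transforms (assumed values, for illustration only).
AFFINE: Matrix = [[1.0, 0.5, 2.0], [0.0, 1.5, 1.0], [0.0, 0.0, 1.0]]
LEFT_PANEL: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-0.4, 0.0, 1.0]]
FRONT_PANEL: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.4, 0.0, 1.0]]


def warp(h: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (with perspective divide)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def warp_panel(h: Matrix) -> List[Point]:
    """Warp all four panel corners, preserving their order."""
    return [warp(h, p) for p in PANEL]


def top_bottom_parallel(corners: List[Point], tol: float = 1e-9) -> bool:
    """True if the bottom and top edges are parallel (zero 2D cross product)."""
    bl, br, tr, tl = corners
    cross = ((br[0] - bl[0]) * (tr[1] - tl[1])
             - (br[1] - bl[1]) * (tr[0] - tl[0]))
    return abs(cross) < tol


def edge_heights(corners: List[Point]) -> Tuple[float, float]:
    """Lengths of the left and right edges of the warped panel."""
    bl, br, tr, tl = corners
    left = math.hypot(tl[0] - bl[0], tl[1] - bl[1])
    right = math.hypot(tr[0] - br[0], tr[1] - br[1])
    return left, right


if __name__ == "__main__":
    for name, h in [("affine", AFFINE), ("left panel", LEFT_PANEL),
                    ("front panel", FRONT_PANEL)]:
        corners = warp_panel(h)
        print(name, top_bottom_parallel(corners), edge_heights(corners))
```

In this sketch the left panel's left edge comes out shorter than its right edge, and the front panel's right edge comes out shorter than its left. A single 2D affine correction cannot undo both distortions at once, which is the difficulty noted above.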
Still more difficult are objects that cannot readily be fingerprinted in a “flat” state (e.g., before a box is glued). Consider an egg carton. Or a tea pot. Or a parking meter. What does a fingerprint mean in these contexts?
Much has been written about the Internet of Things. Early realizations have relied on RFID chips to identify objects. But if a broader version of this vision is to be achieved, it will rest on a broader ID of Things foundation. The present technology provides such a foundation.
In accordance with one embodiment of the present technology, object recognition systems are advanced to accommodate viewpoint variability of 3D objects.
More particularly, the disclosed technology (including the materials incorporated by reference) delves into illustrative implementations employing some of the following themes:
1. If probabilistic object recognition using mobile personal devices is to make the next significant leap, toward fast detection rates approaching 100% and false-positive rates approaching 0%, object signatures need to incorporate three-dimensional information about the object, and matching algorithms may make decisions (e.g., an ending operation in a multi-stage method) based on projective transformations (i.e., certain geometric transformations preserving collinearity and cross-ratio, but not parallelism) rather than 2D affine/warping (i.e., certain geometric transformations preserving parallelism). Mass implementations of this capability will often require candidate-filtering approaches with three or more stages, which include more sophisticated device, local-server and global-server divisions of labor and their associated packet exchanges.
2. Object Signature Collection, Registration, Fast-Search Processing and Matching-Database Proliferation: There are many diverse approaches to gathering three-dimensional information on objects, from simple stereo pair extraction and Wave-at-it, to Gladson/Itemaster and the over-the-top Optical Lab (all detailed below). Other depth-sensing camera technology can also be employed. The result is usually three-dimensional “draped meshes,” with cost and quality of information being a function of the empirical approach used. Supplying fast early stage filtering algorithms with sampled-steradian 2D views of an object also occurs during object/thing registration. These sampled views have explicit Profile Masks associated with them.
3. Personal Devices Recognizing Things: First stage filtering will often involve current art 2D fast searching, trying to get reasonable matches to one of the steradian views/masks. Known-profile masking will be employed (not using image data behind masks), and pass-thresholds will be significantly lowered. Second stage filtering will be done primarily using Profile and Morphological features, with some Image features (P, M and I features, respectively), honing the projective viewpoint angle and distance parameters from coarse and canned steradian view defaults. Thresholds remain modestly low. Third stage processing may bring back all classic 2D features (I features) as well as the P and M features, performing projections which enable false positive rejection to reach application-defined degrees (e.g., 99.999 . . . %), through empirically calibrated thresholding.
4. Device, Local-Server, Global-Server Dynamics: Many retail, in-store applications will push key reference features directly onto the user's device, allowing fast device-side execution of object recognition, constrained by power consumption, memory and channel usage. Fluidity of “where” various recognition stages are actually executed provides a welcome design flexibility in the device-local-global continuum.
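The staged filtering outlined in theme 3 can be sketched, purely for illustration, as a cascade in which each stage discards candidates that fall below its pass-threshold. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Candidate` fields, the scores in the range 0 to 1, and the threshold values (low for the first stage, modestly low for the second, strict for the third) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate reference object with hypothetical per-stage scores in [0, 1]."""
    name: str
    view_score: float  # stage 1: 2D match against a sampled steradian view/mask
    pm_score: float    # stage 2: agreement of Profile and Morphological features
    full_score: float  # stage 3: P, M and I features after projection


# (stage name, scoring function, pass-threshold); thresholds are assumed values.
Stage = Tuple[str, Callable[[Candidate], float], float]

STAGES: List[Stage] = [
    ("stage1_2d_view", lambda c: c.view_score, 0.30),          # significantly lowered
    ("stage2_profile_morph", lambda c: c.pm_score, 0.50),      # modestly low
    ("stage3_full_projection", lambda c: c.full_score, 0.95),  # strict
]


def run_cascade(candidates: Sequence[Candidate],
                stages: Sequence[Stage] = STAGES) -> List[Candidate]:
    """Pass candidates through each stage in order, keeping only those
    whose stage score meets that stage's threshold."""
    survivors = list(candidates)
    for _stage_name, score, threshold in stages:
        survivors = [c for c in survivors if score(c) >= threshold]
        if not survivors:
            break  # nothing left to refine; stop early
    return survivors


# Hypothetical candidates and scores, for illustration only.
CANDIDATES: List[Candidate] = [
    Candidate("cereal_box", 0.45, 0.70, 0.97),
    Candidate("egg_carton", 0.35, 0.40, 0.10),
    Candidate("tea_pot", 0.20, 0.90, 0.99),
    Candidate("parking_meter", 0.50, 0.60, 0.80),
]

if __name__ == "__main__":
    print([c.name for c in run_cascade(CANDIDATES)])
```

Because each stage is just an entry in a list, the same structure leaves open where a given stage runs (on the device, a local server, or a global server), consistent with the flexibility noted in theme 4.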
The foregoing and other aspects of the present technology will be more readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings.