Since sophisticated cameras were first combined with smart phones, mobile computing applications (“apps”) that utilize images captured by the cameras have proliferated. In some of the more popular apps, users capture images of consumer products in an attempt to identify the products. The apps then link to descriptions of the products, ratings, reviews, pricing, options for purchasing, shipping, etc. Stores and businesses also have apps for imaged products that provide tips to users searching for products while on store premises, that improve inventory control, that facilitate compliance with shelf planograms, and the like. Apps also often distinguish their services based on categories of consumer products, such as books, cars, clothing, electronics, groceries, etc. There are even apps for identifying consumer drugs and plants and for confirming authenticity of items.
During use, users capture images of products and/or their labels, nameplates, etc., for comparison to databases. The better the image, the faster the match to the database and the faster the results are displayed to users. Poor quality images, however, beget slow matching and perhaps false matches. Non-recognition of consumer products may also result if multiple products are captured in a single image, such as an image spanning multiple shelves of a store. Underlying the apps, object recognition technology is used to identify objects in an image or video.
When objects are of a known size and orientation, image correlation or edge matching techniques are used for identification. However, such algorithms, known as global feature identification, can be computationally expensive and often involve stepping through the image and performing pixel-by-pixel comparisons to objects in databases, which slows results. In addition, the techniques do not guard well against image distortion, partial occlusion, scale variations, rotation, and changes in image perspective.
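By way of illustration only, the pixel-by-pixel stepping described above can be sketched as a brute-force template search. The sketch assumes grayscale images held as lists of rows and scores each position by the sum of squared differences; the function name `match_template_ssd` and the sample data are hypothetical and are not drawn from any particular app or library.

```python
def match_template_ssd(image, template):
    """Slide `template` over `image`; return (row, col, ssd) of the best fit."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    # Step through every placement of the template inside the image.
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            ssd = 0
            # Compare pixel by pixel at this placement.
            for i in range(tpl_h):
                for j in range(tpl_w):
                    diff = image[r + i][c + j] - template[i][j]
                    ssd += diff * diff
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Hypothetical 4x5 image containing a 2x2 object at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]

print(match_template_ssd(image, template))  # (1, 1, 0)
```

The four nested loops make the cost grow with the product of image area and template area, and that cost is repeated for each database object. A template that is rotated or rescaled relative to the imaged object no longer yields a zero difference, which reflects the sensitivity noted above.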
To overcome these limitations, several algorithms use local invariant features that are encoded to remain stable over a range of rotations, distortions, and lighting conditions. Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Maximally Stable Extremal Regions (MSER) are three popular algorithms. SIFT identifies key-points in an image by finding the maxima and minima of Difference-of-Gaussian filter responses over a range of scales. SIFT uses the gradient magnitude and direction of the pixels neighboring each key-point to uniquely identify strong key-points. Because it generates large numbers of key-points, SIFT is robust against partial occlusion and some level of noise, but deteriorates with lighting changes, blurring, and large scale variations. The large number of key-points also means computational expense, both in generating that volume and, in turn, in finding matching key-points. In practice, key-point matching is also known to generate many false positive matches.
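For illustration, the Difference-of-Gaussian extrema search can be sketched as follows. This is a minimal sketch and not a complete SIFT implementation: it omits sub-pixel refinement, edge rejection, orientation assignment, and descriptor generation. The function names, the list of sigma values, the contrast threshold, and the synthetic test image are all assumptions chosen for the example.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights out to about three sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur; border pixels are replicated."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    rows = [[sum(kernel[k + radius] * image[r][clamp(c + k, w)]
                 for k in range(-radius, radius + 1))
             for c in range(w)] for r in range(h)]
    return [[sum(kernel[k + radius] * rows[clamp(r + k, h)][c]
                 for k in range(-radius, radius + 1))
             for c in range(w)] for r in range(h)]


def dog_extrema(image, sigmas, threshold=0.01):
    """Return (row, col, sigma) where a DoG response is a local max or min."""
    h, w = len(image), len(image[0])
    blurred = [blur(image, s) for s in sigmas]
    # Each DoG layer is the difference of two adjacent blur levels.
    dogs = [[[b[r][c] - a[r][c] for c in range(w)] for r in range(h)]
            for a, b in zip(blurred, blurred[1:])]
    keypoints = []
    for s in range(1, len(dogs) - 1):
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                value = dogs[s][r][c]
                if abs(value) < threshold:
                    continue  # skip low-contrast responses
                # Compare against the 26 neighbors in space and scale.
                neighbors = [dogs[s + ds][r + dr][c + dc]
                             for ds in (-1, 0, 1)
                             for dr in (-1, 0, 1)
                             for dc in (-1, 0, 1)
                             if (ds, dr, dc) != (0, 0, 0)]
                if value > max(neighbors) or value < min(neighbors):
                    keypoints.append((r, c, sigmas[s]))
    return keypoints


# Hypothetical 21x21 image holding one bright blob centered at (10, 10).
blob = [[math.exp(-((r - 10) ** 2 + (c - 10) ** 2) / (2 * 2.0 ** 2))
         for c in range(21)] for r in range(21)]
keypoints = dog_extrema(blob, [1.0, 1.6, 2.56, 4.096])
print(keypoints)
```

With these assumed settings, the blob center should appear among the reported key-points. Every blur level must be computed and every pixel compared against its neighbors across scales, which suggests why key-point generation becomes costly on larger images.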
The SURF algorithm improves upon SIFT by using the sums of 2D Haar wavelet responses to more quickly identify key-points and to do so over a range of scales. While better, SURF still suffers the disadvantage of generating too many false positives. MSER, on the other hand, identifies connected pixels whose shape does not change over a large range of thresholds, which generates fewer key-points. However, MSER is known to limit the types of images that can be identified using the technique and is sensitive to blur and discretization effects.
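As an illustrative sketch, Haar wavelet responses of the kind SURF sums can be computed from a summed-area (integral) table, so that each box sum takes four lookups regardless of box size. The integral table is introduced here as an assumption of the example, as are the function names and the sample image; the sketch shows only the response computation and not the remainder of SURF.

```python
def integral_image(image):
    """Build a table whose entry [r][c] is the sum of image[:r][:c]."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def haar_responses(table, top, left, size):
    """Horizontal and vertical Haar responses over a size x size window.

    `size` is assumed to be even so the window splits into equal halves.
    """
    half = size // 2
    # Right half minus left half responds to vertical edges.
    dx = (box_sum(table, top, left + half, size, half)
          - box_sum(table, top, left, size, half))
    # Bottom half minus top half responds to horizontal edges.
    dy = (box_sum(table, top + half, left, half, size)
          - box_sum(table, top, left, half, size))
    return dx, dy


# Hypothetical 4x4 image with a vertical edge down the middle.
edge = [[0, 0, 10, 10] for _ in range(4)]
print(haar_responses(integral_image(edge), 0, 0, 4))  # (80, 0)
```

Because the table is built once per image, responses at larger window sizes cost no more than at smaller ones, which is consistent with the speed advantage over a range of scales described above.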
Accordingly, a need exists in the art to better identify consumer products and to do so with algorithmic techniques applied to image pixels. Further needs also contemplate instructions or software executable on controller(s) in hardware, such as imaging devices, or computing apps for smart phones or other devices. Additional benefits and alternatives are also sought when devising solutions.