In many retail and industrial scenarios, users need to detect objects in photos captured with mobile devices. For example, retail outlets offer a wide variety of products for sale, and recognizing these products could enable future applications such as automated checkout kiosks. Currently, products must be scanned at checkout, which may require moving each product to align its barcode with the scanner, and items must be processed one at a time. During periods of peak customer demand, checkout queues may become bottlenecks, possibly causing customers to leave the retail outlet entirely. Automating checkout using image data would improve both the efficiency of the process and the customer experience.
In other cases, a customer may want to inquire about an object without knowing detailed information about it, such as its product number or name. Detecting objects based on image data would allow the customer to inquire about such an item without that information.
Automatically recognizing objects in images, such as people, animals, automobiles, consumer products, and buildings, is a difficult problem. The list of candidate hypotheses is often very long, and the candidates may differ only subtly. Conventional approaches often rely on supervised learning, which can require labeled training images. Such systems therefore depend on direct human input to provide exemplars explicitly labeled as representing an object, for example, a set of images that a person has previously examined and identified as containing dogs. Such human input, however, is expensive and time-consuming, and it cannot scale to very large data sets comprising hundreds of thousands of objects and millions of images.
What is needed is a method for classifying planar objects with substantially higher accuracy than previously used methods.