The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Object recognition systems require large databases of known objects where the database stores attributes or parameters, typically image attributes, by which devices can recognize corresponding objects. Unfortunately, populating such databases is a very expensive, time-consuming process. For example, to build sufficient information to recognize a moderately complex object, say a toy, the object has to be imaged and analyzed from many different views in a controlled setting. Image characteristics from the various views can then be stored in the object recognition database for future consumption. This approach creates a bottleneck for object ingestion because only one object can be ingested at a time.
Numerous examples of image-based object recognition techniques exist that leverage image characteristics. For example, U.S. Pat. No. 5,581,634 to Heide titled “Recognition System with an Automated Development Tool”, filed Apr. 6, 1994, describes using a tree structure to recognize objects and providing developers a tool to generate new recognizers. Similarly, co-owned U.S. Pat. Nos. 7,016,532; 7,477,780; 7,680,324; and 7,565,008 also describe techniques for recognizing objects. These and other references are useful with respect to recognizing objects based on image characteristics, but fail to provide for easy, automated object ingestion into a recognition infrastructure, especially for commodity objects in an uncontrolled setting. For example, ingesting objects in a public setting would be very difficult due to the varied shapes of objects in such settings.
Some progress has been made toward identifying objects in a search engine by searching based on shape. For example, U.S. Pat. No. 6,173,066 to Peurach et al. titled “Pose Determination and Tracking by Matching 3D Objects to a 2D Sensor”, filed May 21, 1997, discusses constructing queries based on geometric descriptions. Another example, related to traffic signs, is U.S. Pat. No. 8,170,340 to Klefenz titled “Device, Method, and Computer Program for Identifying a Traffic Sign in an Image”, filed Dec. 18, 2007. Klefenz relies on edge detection to identify a sign. Still another example is U.S. Pat. No. 8,429,174 to Ramani et al. titled “Methods, Systems, and Data Structures for Performing Searches on Three Dimensional Objects”, filed Jan. 23, 2004. Ramani describes using 3D shapes, possibly based on user defined similarity criteria, to find known objects.
Other examples of shape-based object analysis include U.S. patent application publication 2006/0122999 to Sosnov et al. titled “Apparatus for and Method of Producing Graphics Contents and Computer-Readable Recording Medium Storing Computer Program for Executing the Method”, filed Sep. 20, 2005; U.S. patent application publication 2008/0103734 to Kobayashi titled “Supporting Apparatus, Design Supporting Method, and CAD System”, filed Aug. 27, 2007; U.S. patent application publication 2010/0092093 to Akatsuka et al. titled “Feature Matching Method”, filed Aug. 12, 2009; and U.S. patent application publication 2013/0336554 to Lewis et al. titled “Methods and Systems for Identifying, Marking, and Inventorying Large Quantities of Unique Surgical Instruments”, filed Mar. 14, 2013.
Although the above shape-based searching techniques are useful with respect to searching for objects in a database, they still do not address construction of a database through commodity object ingestion. To some degree, U.S. Pat. No. 7,643,683 to Miller titled “Generation of Image Database for Multifeatured Objects”, filed Mar. 5, 2004, makes some further progress in database construction by using objects of the same generic type to generate as many images as possible, which are used to populate an image database for identification purposes. Miller seeks to generate 3D representations by using 2D projections from a range of viewpoints. Miller also uses small or large deformations of the 3D representations corresponding to anticipated internal movements in order to generate projections of the representations. Miller's database comprises images, which are useful for generating avatars as discussed. However, such a database is less useful with respect to “in-the-field” object recognition via devices having limited memory capacity, a smart phone, for example. A compact database of object recognition information is still required.
In an approach somewhat similar to Miller's, shape information can be used to aid in ingesting object information by building object models from imaged objects. For example, U.S. patent application publication 2013/0293539 to Hunt et al. titled “Volume Dimensioning Systems and Methods”, filed May 4, 2012, describes building a wireframe package around a three dimensional object. In some cases, insufficient information is available from a single point of view, so additional data is obtained from other points of view for selecting geometric primitives to fit the wireframe model to the object. An additional example of using shapes to generate object databases is U.S. Pat. No. 7,929,775 to Hager et al. titled “System and Method for Recognition in 2D Images Using 3D Class Models”, filed Jun. 13, 2006. Hager discusses acquiring 3D images of objects, then placing corresponding object models into a canonical geometric form. Although Hager seeks to create an object database, Hager still requires controlled conditions, which places the technology outside the scope of unskilled technicians in an uncontrolled ingestion setting. Yet another example of building object models is U.S. Pat. No. 8,532,368 to Se et al. titled “Method and Apparatus for Producing 3D model of an Environment”, filed Aug. 15, 2011. Se discusses generating photorealistic 3D models of objects from stereo images.
Even if shapes and object models are used to build object databases, some objects fail to fit a priori canonical geometric forms. In such cases, the forms must be altered to fit the object. Along these lines, further progress toward extracting objects from a photo is described in the paper “3-Sweep: Extracting Editable Objects from a Single Photo”, to Chen et al., SIGGRAPH Asia 2013, Nov. 19-22, 2013. Chen describes allowing a human to snap components to an image of an object, which then provides for extracting 3D objects. Unfortunately, such techniques still rely very heavily on human interaction and are not easily automatable. In a somewhat similar vein, U.S. patent application publication to Vaddadi et al. titled “Methods and Systems for Capturing and Moving 3D Models and True-Scale Metadata of Real World Objects”, filed Jul. 27, 2012, also discusses generating a model based on user input and captured image data. Deformation of shapes is also discussed by U.K. patent application publication GB 2488237 to Adeyoola et al. titled “Computer Implemented Methods and Systems for Generating Virtual Body Models for Garment Fit Visualization”, published Aug. 22, 2012. Adeyoola describes generating a virtual body model where images of garments can be combined with the virtual body model. The techniques disclosed offer insight into how to construct object models based on image data and geometric forms. However, such models are too resource intensive to manage and use in the field. Devices in the field still require compact recognition data to determine if an imaged object in the field matches a known object in an object recognition data set.
U.S. patent application publication 2005/0286767 to Hager et al. titled “System and Method for 3D Object Recognition Using Range and Intensity”, filed Jun. 22, 2005, progresses further by using descriptors to identify objects. Hager describes acquiring images of a scene and comparing descriptors from the scene to descriptors of known models to identify objects within the scene. Descriptors provide for fast object recognition. Still, one must compile a database of descriptors during ingestion of object information. Along these lines, effort has been directed to building descriptor information based on object models. One example of capturing 3D object information is International patent application publication WO 2009/069071 to Kleinhorst et al. titled “Method and System for Three-Dimensional Object Recognition”, filed Nov. 25, 2008, which discusses building object models from multiple views of an object where the 3D feature descriptors are computed using 2D feature descriptors and the camera's known view.
Kleinhorst provides for generating descriptors from a camera's known viewpoint. However, for ingestion of commodity objects, a camera's viewpoint might not be known or calculable even if shape information is available. Consequently, there still remains a need to determine from which perspectives object recognition information should be derived. This is especially true when the object model can deviate from known shapes. Further, there is still a need for systems and methods through which one can ingest large numbers of ordinary or commodity objects quickly into an object recognition database of known objects.
All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.