The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Consumers continue to experience an increasingly blurred distinction between real-world and on-line interactions. With the advent of object recognition technologies available today, consumers can now virtually interact with real-world objects through their smart phones and other mobile electronic devices. For example, consumers can capture an image of a movie poster via their cell phones. In response, the cell phone can construct an augmented reality interaction or game overlaid on the display of the cell phone. In fact, the Applicants have pioneered such technologies through their iD® technologies as implemented by DreamPlay™ (see URL www.polygon.com/2013/1/9/3851974/disney-dreamplay-ar-app-disney-infinity). Other technologies that attempt to offer similar experiences include Layar® (see URL www.layar.com), BlippAR.com™ (see URL www.blippar.com), and 13th Lab (see URL www.13thlab.com).
Unfortunately, such technologies are limited in scope and typically are only capable of recognizing a single object at a time (e.g., a single toy, a single person, a single graphic image, etc.). In addition, a consumer must position their cell phone into a correct position or orientation with respect to the object of interest, then wait for the cell phone to analyze the image information before engaging content is retrieved. Ideally, a consumer should be able to engage content associated with an object of interest very quickly and should be able to engage many objects at the same time. The above referenced companies fail to provide such features.
Objects represented in image data can be recognized through descriptors derived from the image data. Example descriptors include those generated from algorithms such as SIFT, FAST, DAISY, or other pattern identification algorithms. Some descriptors can be considered to represent a multi-dimensional data object, a vector or a histogram for example. However, the dimensions of the descriptor do not necessarily have equivalent object discriminating capabilities. Principal Component Analysis (PCA) can provide for statistical identification of which descriptor dimensions are most important for representing a training set of data. Unfortunately, PCA fails to provide insight into the discriminative power of each dimension, or to identify which dimension of the descriptor would have greater discriminating power with respect to an environmental parameter (e.g., lighting, focal length, depth of field, etc.). As such, each dimension has to be processed in every instance to determine discriminating features.
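The limitation of PCA noted above can be illustrated with a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not part of any referenced system: the three-dimensional toy "descriptors", the helper names `pca_components` and `make_toy_descriptors`, and the simple per-dimension separation score are all hypothetical and chosen only for clarity. The toy data gives one dimension a large variance that carries no class information and another a small variance that separates the two classes.

```python
import numpy as np


def pca_components(data):
    """Return (explained_variance, components) from an SVD of the centered data."""
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(data) - 1)
    return explained_variance, vt


def make_toy_descriptors(n_per_class=500, seed=0):
    """Build hypothetical 3-dimensional descriptors for two object classes."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_class)
    n = labels.size
    descriptors = np.empty((n, 3))
    # Dimension 0: large variance, identical for both classes (a nuisance).
    descriptors[:, 0] = rng.normal(0.0, 10.0, n)
    # Dimension 1: small variance, but its mean differs between the classes.
    descriptors[:, 1] = labels + rng.normal(0.0, 0.1, n)
    # Dimension 2: small variance, no class information.
    descriptors[:, 2] = rng.normal(0.0, 0.5, n)
    return descriptors, labels


descriptors, labels = make_toy_descriptors()

# PCA ranks directions by variance alone; class labels are never consulted.
variance, components = pca_components(descriptors)
top_pca_dimension = int(np.argmax(np.abs(components[0])))

# Illustrative separation score (an assumption of this sketch):
# gap between class means divided by the pooled within-class spread.
class0 = descriptors[labels == 0]
class1 = descriptors[labels == 1]
mean_gap = np.abs(class0.mean(axis=0) - class1.mean(axis=0))
pooled_std = np.sqrt((class0.var(axis=0) + class1.var(axis=0)) / 2.0)
separation = mean_gap / pooled_std
most_discriminative_dimension = int(np.argmax(separation))

print("dimension dominating the first principal component:", top_pca_dimension)
print("dimension with the largest class separation:", most_discriminative_dimension)
```

In this toy setting the first principal component follows the high-variance nuisance dimension, while the labeled separation score points at a different, low-variance dimension. This is the gap described above: ranking dimensions by how well they represent a data set is not the same as ranking them by discriminating power.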
U.S. Pat. No. 5,734,796 “Self-Organization of Pattern Data With Dimensional Reduction Through Learning of Non-Linear Variance-Constrained Mapping” issued to Pao, filed Sep. 29, 1995, provides systems and methods for visualizing a large body of multi-featured pattern data (e.g., chemical characteristic information) in a computationally efficient manner. The process involves subjecting the multi-featured pattern data to a nonlinear mapping from the original representation to one of reduced dimensions using a multilayer feed-forward neural net. While advantageous in some regards, Pao fails to appreciate that data can be acquired in a controlled environment under different conditions to empirically identify dimensions that can be reduced or ignored.
U.S. Pat. No. 6,343,267 “Dimensionality Reduction For Speaker Normalization and Speaker and Environment Adaptation Using Eigenvoice Techniques” issued to Kuhn et al., filed Sep. 4, 1998, describes techniques for speaker normalization in the context of speech recognition by an initially speaker-independent recognition system. The technique enables the speaker-independent recognition system to quickly reach a performance level of a speaker-dependent system without requiring large amounts of training data. The technique includes a one-time computationally intensive step to analyze a large collection of speaker model data using dimensionality reduction. Thereafter, a computationally inexpensive operation can be used for a new speaker to produce an adaptation model for the new speaker. Like Pao, Kuhn fails to appreciate that data can be acquired in a controlled environment under different conditions to empirically identify dimensions that can be ignored.
Some references contemplate controlling a data acquisition environment within the context of imaging and image analysis. For example, U.S. Pat. No. 7,418,121 “Medical Image Processing Apparatus and Medical Image Processing System” issued to Kasai, filed Dec. 10, 2004, describes a medical diagnostic imaging processing system that updates its training data by customizing a detection condition. The purpose of updating the training data is to enhance the system's diagnostic capabilities within a specialized medical field. Kasai fails to describe modifying a detection condition to empirically identify dimensions within a data set that can be ignored to improve computational efficiency for image processing.
U.S. Pat. No. 8,565,513 “Image Processing Method For Providing Depth Information and Image Processing System Using the Same” issued to Shao et al., filed Dec. 8, 2009, describes a method of estimating a depth of a scene or object in a 2D image by capturing different view angles of the scene or object. Shao fails to appreciate that different views of the object can be used to empirically identify image descriptors that are less relevant for image recognition processing.
In the publication “Actionable Information in Vision” by Soatto, published in Proceedings of the International Conference on Computer Vision, October 2009, (see URL vision.ucla.edu/publications.html), Soatto states that the data acquisition process can be controlled (which he refers to as “Controlled Sensing”) to counteract the effect of nuisances. Soatto fails to discuss controlling the parameters and/or attributes of a data acquisition environment for the purposes of empirically identifying dimensions that can be reduced (e.g., ignored).
Object recognition techniques can be computationally expensive. The environments in which object recognition can be of most use to a user are often ones in which the devices available for object capture and recognition have limited resources. Mobile devices, for example, often lack the computational capabilities of larger computers or servers, and network connections are often not fast enough to provide a suitable substitute. Thus, processing every dimension for discrimination with each execution of an object recognition technique can cause latency in execution, especially with multiple objects and/or in computationally weak computing devices. For certain applications, such as augmented reality gaming applications, this latency can render the application unusable. None of the references mentioned above provides an accurate and computationally inexpensive object recognition technique that involves empirically identifying dimensions that can be ignored. Thus, there is still a need to improve upon conventional object recognition techniques.
All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.