Object instance (or known object) recognition is the task of recognizing a specific object. Rather than recognizing categories of objects, object instance recognition identifies a particular object within a category. By way of example, these specific objects may include a specific artwork (such as the Mona Lisa), a specific photograph, the front of a restaurant, or an object on a supermarket shelf.
Object instance recognition remains a challenging problem in computer vision. Millions of objects exist, and finding a computationally feasible method for recognizing a particular object can be difficult. Some of the most promising approaches to object instance recognition are feature-based techniques. Feature-based techniques extract local feature descriptors from salient points in an image. Recognition is achieved by matching feature descriptors from a query image with those extracted from a set of training images. Ambiguous matches are eliminated in a verification stage, which checks whether the matched features of an object are consistent with a single global affine transformation.
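The verification stage described above can be illustrated with a minimal sketch. It assumes that candidate matches are already available as pairs of (query point, training point) coordinates, fits an affine transformation exactly from three of them, and keeps only the matches that agree with it. The function names, the pixel tolerance `tol`, and the choice of which three matches to fit from are all hypothetical; a practical system would typically use a robust estimator rather than a single hypothesis.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        # Replace one column of m with the right-hand side.
        mc = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(mc) / d)
    return solution

def fit_affine(pairs):
    """Exact affine transform from three non-collinear (source, target) pairs."""
    m = [[x, y, 1.0] for (x, y), _ in pairs]
    row_x = solve3(m, [target[0] for _, target in pairs])
    row_y = solve3(m, [target[1] for _, target in pairs])
    return row_x, row_y

def apply_affine(transform, point):
    (a, b, c), (d, e, f) = transform
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

def consistent_matches(transform, matches, tol=1.0):
    """Keep the matches whose target lies within tol of the predicted position."""
    return [m for m in matches
            if math.dist(apply_affine(transform, m[0]), m[1]) <= tol]

# Hypothetical matches: the first four follow x' = 2x + 5, y' = 2y - 3,
# and the last one is a false match.
matches = [((0.0, 0.0), (5.0, -3.0)), ((1.0, 0.0), (7.0, -3.0)),
           ((0.0, 1.0), (5.0, -1.0)), ((2.0, 2.0), (9.0, 1.0)),
           ((3.0, 1.0), (40.0, 40.0))]
transform = fit_affine(matches[:3])
verified = consistent_matches(transform, matches)
```

In this sketch the false match is rejected because its training-image position is far from where the fitted transformation predicts it should be.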
One problem with feature-based techniques, however, is the difficulty of matching the features found in a query image with those in the database. The feature database can be quite large, and its size scales linearly with the number of known objects. One common way to reduce the computational complexity of this search is to use an approximate nearest neighbor (ANN) technique or a hashing technique. However, the limitations of these two techniques become apparent as the number of objects in the database increases. Another problem is that as the feature space becomes more crowded, it becomes increasingly difficult to find correct matches, because several good matches might exist for any feature within a query image.
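Both problems can be seen in a minimal sketch of exhaustive nearest-neighbor matching, which is the baseline that ANN and hashing techniques approximate. The scan visits every database descriptor, so its cost grows with the database, and a match is flagged as ambiguous when the runner-up is nearly as close as the winner. The function names and the `ratio` threshold of 0.8 are assumptions made for illustration, and the two-dimensional descriptors stand in for real, much higher-dimensional ones.

```python
import math

def nearest_two(query, database):
    """Return (best_index, best_dist, second_dist) by exhaustive linear scan."""
    best_i, best_d, second_d = -1, math.inf, math.inf
    for i, descriptor in enumerate(database):
        d = math.dist(query, descriptor)
        if d < best_d:
            best_i, second_d, best_d = i, best_d, d
        elif d < second_d:
            second_d = d
    return best_i, best_d, second_d

def is_ambiguous(best_d, second_d, ratio=0.8):
    """Flag a match whose runner-up is nearly as close as the winner."""
    return second_d == 0.0 or best_d / second_d > ratio

# Hypothetical database: one isolated descriptor and two crowded ones.
database = [(0.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
crowded = nearest_two((10.25, 0.0), database)  # falls between two neighbors
sparse = nearest_two((0.1, 0.0), database)     # has one clear neighbor
```

The query in the crowded region has two equally good candidates, so no single closest match can be trusted, whereas the query in the sparse region matches unambiguously.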
In large feature databases, ambiguity about which feature is the correct match is most likely unavoidable. If it is assumed that the feature space will be densely populated, then each feature can be assigned to a cluster instead of being matched to its single closest feature within the database. The set of clusters can be created using a modified K-means clustering algorithm during training. The number of possible clusters can range from 1,000 to over 10,000.
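The cluster assignment can be sketched as follows. Note that this uses plain Lloyd's K-means with a deterministic initialization, not the modified K-means algorithm referred to above, whose details are not given here. The function names, the iteration count, and the tiny two-dimensional data set are assumptions chosen to keep the example short; a real vocabulary would have thousands of clusters over high-dimensional descriptors.

```python
import math

def assign(point, centers):
    """Index of the nearest cluster center for a feature descriptor."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's K-means, using the first k points as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(group) for c in zip(*group))
    return centers

# Hypothetical training descriptors forming two well-separated groups.
training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
            (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centers = kmeans(training, k=2)

# At query time each feature is mapped to a cluster index, not to a
# single database feature.
words = [assign(q, centers) for q in [(0.5, 0.5), (9.0, 9.0)]]
```

Assigning a query feature costs one comparison per cluster center rather than one per database feature, which is the practical appeal of this approach.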
This set of cluster means creates a vocabulary of features. However, one problem is that the resulting symbols can be quite generic and are rarely object dependent. Another problem with the vocabulary-of-features approach is ensuring that corresponding features across images are assigned to the same symbol. If the feature appearance varies due to image noise or misestimation of position, scale, or rotation, differing symbols may be assigned.
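The symbol-assignment problem can be made concrete with a small sketch. It assumes a hypothetical two-symbol vocabulary and a descriptor that lies close to the boundary between the two cluster means; the names and values are illustrative only.

```python
import math

def symbol(descriptor, vocabulary):
    """Index of the nearest cluster mean, i.e. the assigned symbol."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))

# Hypothetical vocabulary of two cluster means.
vocabulary = [(0.0, 0.0), (1.0, 0.0)]

# The same physical feature observed in two images, differing only by a
# small perturbation such as image noise.
near_boundary_a = (0.48, 0.0)
near_boundary_b = (0.52, 0.0)

# A feature far from the boundary, with a perturbation of the same size.
far_a = (0.10, 0.0)
far_b = (0.14, 0.0)

flipped = symbol(near_boundary_a, vocabulary) != symbol(near_boundary_b, vocabulary)
stable = symbol(far_a, vocabulary) == symbol(far_b, vocabulary)
```

A perturbation of the same magnitude leaves the symbol unchanged for a descriptor deep inside a cluster but changes it for a descriptor near a cluster boundary, so corresponding features in two images can end up with different symbols.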