Object recognition is an important area of computer vision. It includes recognizing classes of objects (object category recognition) and recognizing individual objects (object instance recognition). The goal of object category recognition is to automatically recognize unknown object instances of known categories (such as cars or faces) and assign each instance to the correct category. The goal of object instance recognition is to recognize a specific instance of an object (such as a specific car or person).
Both object category recognition and object instance recognition remain challenging problems in computer vision. Some of the most promising approaches to these problems are feature-based techniques. In general, feature-based techniques extract local features or matching primitives, and their descriptors, from salient points in an image. This is achieved by locating “interest points”, which are points in the image having a location, a scale, and sometimes a rotation. The interest points typically have a high variance in every surrounding direction. A feature is an interest point plus an interest patch of pixels around and centered on the interest point. A matching primitive can be a single feature (a singlet) or a set of features grouped in twos (a doublet) or in threes (a triplet). Features are therefore a subset of the set of matching primitives. To be highly discriminative, a feature or matching primitive should be repeatably recognizable in an image. For example, a feature could be a dark spot on a white background, or a corner of an object. Recognition is achieved by matching features from a query image with those found in a set of training images. The effectiveness of these methods relies mainly on the discriminative power of the features.
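The relationship between features, matching primitives, and query-to-training matching can be illustrated with a minimal sketch. It is not a particular published method: the `Feature` class, the two-element descriptors, the exhaustive grouping, and the summed Euclidean descriptor distance are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Feature:
    """An interest point (location, scale) plus a descriptor of its patch."""
    x: float
    y: float
    scale: float
    descriptor: tuple  # hypothetical fixed-length appearance vector


def matching_primitives(features, size):
    """Group features into primitives of `size` features each.

    size=1 yields singlets, size=2 doublets, size=3 triplets. Exhaustive
    grouping is used only for illustration; it grows quickly with the
    number of features.
    """
    return list(combinations(features, size))


def primitive_distance(a, b):
    """Sum of descriptor distances between features in the same position.

    Assumes both primitives list their features in a consistent order.
    """
    return sum(dist(fa.descriptor, fb.descriptor) for fa, fb in zip(a, b))


def best_match(query_primitive, training_primitives):
    """Return the training primitive whose descriptors are closest."""
    return min(training_primitives,
               key=lambda t: primitive_distance(query_primitive, t))


# Illustrative data only: three training features and two query features.
train = [
    Feature(0, 0, 1.0, (0.0, 0.0)),
    Feature(5, 0, 1.0, (1.0, 0.0)),
    Feature(0, 5, 1.0, (0.0, 1.0)),
]
query = [
    Feature(10, 10, 1.0, (0.1, 0.0)),
    Feature(15, 10, 1.0, (0.9, 0.1)),
]

doublet = matching_primitives(query, 2)[0]
match = best_match(doublet, matching_primitives(train, 2))
```

Here the query doublet is matched to the training doublet formed by the first two training features, because their descriptors are closest. The match uses appearance alone, which is why its quality depends on how discriminative the descriptors are.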
Recently, techniques that additionally model the spatial relationship of features have shown improved results. This is especially true for object category recognition, where the appearance of features across intra-class objects can vary more dramatically than in object instance recognition. Several types of spatial models have been developed, including the constellation model, star models, rigid 3D models, and image-centric or warping techniques. Each of these methods creates a global spatial model for an object, whether it is a parts-based, an image-based, or a full three-dimensional (3-D) model. Another feature-based technique uses a triplet approach that groups features into sets of three and then looks for matches. The general idea is to verify the geometric location and the spatial relationship of triplets relative to each other using a global model.
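As a rough illustration of what a geometric check on triplets involves, the sketch below compares the triangle formed by a query triplet with the triangle formed by a model triplet. This is a deliberate simplification and an assumption, not the triplet technique referred to above: it checks one triplet at a time, the function names and `tolerance` parameter are hypothetical, and comparing sorted side-length ratios ignores point correspondence and mirror reflection.

```python
from math import dist


def triangle_shape(p1, p2, p3):
    """Side lengths of the triangle, sorted and divided by the longest side.

    The result does not change under translation, rotation, or uniform
    scaling of the three points.
    """
    sides = sorted([dist(p1, p2), dist(p2, p3), dist(p3, p1)])
    longest = sides[-1]
    if longest == 0:
        raise ValueError("triplet points coincide")
    return [s / longest for s in sides]


def triplets_consistent(query_pts, model_pts, tolerance=0.1):
    """True if the two triplets form similar triangles within `tolerance`."""
    q = triangle_shape(*query_pts)
    m = triangle_shape(*model_pts)
    return all(abs(a - b) <= tolerance for a, b in zip(q, m))


# Illustrative points only.
model = [(0, 0), (4, 0), (0, 3)]
scaled_and_shifted = [(10, 10), (18, 10), (10, 16)]  # same shape, twice the size
deformed = [(10, 10), (18, 10), (10, 11)]            # one point moved

same_shape = triplets_consistent(scaled_and_shifted, model)
after_deformation = triplets_consistent(deformed, model)
```

With these points, `same_shape` is true and `after_deformation` is false: a rigid check of this kind accepts a translated and scaled copy of the model triplet but rejects one whose shape has changed.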
One drawback of the global spatial model, however, is the need to assume that the object in the image is semi-rigid in space. Any deformation relative to the global spatial model reduces object recognition reliability. Another drawback is that the use of a global model typically requires a model of an object for each permutation or variation of the object. For example, if the object is a human face, the global model framework requires a model of a face for each type of face, such as with a mustache, without a mustache, with glasses, without glasses, and so forth. This requires a large number of models and a great deal of training data.