There are two primary methods for image search and recognition, both of which can broadly be characterized as matching images according to one or more measures of similarity. One method is based on converting images into vector representations, in which multi-dimensional coordinates summarize the features of an image. These vectors are then organized into one of several known data structures that allow fast search for the nearest samples in the vector space. When a new source image is received, it is converted into a vector representation, and the samples in the target database nearest to that vector are retrieved. The second method is based on assigning text tags (also known as metatags) to images. Similar images in such a database are found with text search algorithms that look for tags shared between images.
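The two methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular system: the function names are hypothetical, similarity is assumed to be Euclidean distance for vectors and the count of shared tags for metatags, and a brute-force linear scan stands in for the indexed data structures a real system would use for fast nearest-neighbor search.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def nearest_by_vector(query: Sequence[float],
                      database: Dict[str, Sequence[float]],
                      k: int = 1) -> List[Tuple[str, float]]:
    """Return the k images whose feature vectors are closest (Euclidean) to the query.

    Brute-force scan: a stand-in for a spatial index over the vector space.
    """
    scored = [(name, math.dist(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:k]


def nearest_by_tags(query_tags: Set[str],
                    database: Dict[str, Set[str]],
                    k: int = 1) -> List[Tuple[str, int]]:
    """Rank images by the number of text tags they share with the query."""
    scored = [(name, len(query_tags & tags)) for name, tags in database.items()]
    scored.sort(key=lambda item: -item[1])  # most shared tags first (stable on ties)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional feature vectors for a tiny target database.
    vectors = {"cat.jpg": [0.9, 0.1, 0.0], "car.jpg": [0.0, 0.2, 0.9], "dog.jpg": [0.8, 0.3, 0.1]}
    print(nearest_by_vector([0.85, 0.15, 0.05], vectors, k=2))

    # Hypothetical metatags for the same images.
    tags = {"cat.jpg": {"animal", "pet", "indoor"},
            "car.jpg": {"vehicle", "outdoor"},
            "dog.jpg": {"animal", "pet", "outdoor"}}
    print(nearest_by_tags({"animal", "outdoor"}, tags, k=2))  # [('dog.jpg', 2), ('cat.jpg', 1)]
```

Note that neither function knows which part of the source image the user cares about: the whole image is reduced to one vector or one tag set.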
Both methods have two serious interface-related problems that limit their effectiveness for image search and recognition. The first problem is that their user interfaces offer no flexible way to specify which parts of the source image are important for search or recognition. Existing approaches tend to delegate this decision to the computer application, which is not a fruitful solution because the computer cannot read the user's mind. For instance, when a user points at a person in a photograph, no one except the user knows what is most relevant to the search or recognition task: the person's identity, the person's suit, or perhaps the person's hair color.
The second problem with existing search and recognition methods is that there is no natural way to specify spatial relationships between relevant image parts. Because this spatial information goes unused, matching is less targeted and consequently less accurate. For instance, a person with a green shirt and white pants can easily be confused with a person with a white shirt and green pants.
Existing techniques for image recognition commonly rely upon template matching. In the most straightforward implementation of template matching, an object is described by a set of templates that depict it at different sizes and orientations. To find the object in an image, every template is matched against every location in the image. Locations where the match quality satisfies a given criterion are considered “hits”.
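A minimal sketch of this straightforward (exhaustive) template matching in plain Python. The sum-of-squared-differences score, the `max_ssd` threshold used as the "hit" criterion, and all function names are illustrative assumptions; the two templates stand in for views of one object at different orientations.

```python
from typing import List, Tuple

Image = List[List[float]]


def ssd(image: Image, template: Image, top: int, left: int) -> float:
    """Sum of squared differences between the template and the image window at (top, left)."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_templates(image: Image, templates: List[Image], max_ssd: float) -> List[Tuple[int, int, int]]:
    """Test every template at every location; return (template index, row, col) for each hit."""
    hits = []
    rows, cols = len(image), len(image[0])
    for t_idx, template in enumerate(templates):
        th, tw = len(template), len(template[0])
        # Slide the template over every position where it fits entirely inside the image.
        for top in range(rows - th + 1):
            for left in range(cols - tw + 1):
                if ssd(image, template, top, left) <= max_ssd:
                    hits.append((t_idx, top, left))
    return hits


if __name__ == "__main__":
    # 5x5 image with a horizontal bar in row 1 and a vertical bar in column 4.
    image = [[0.0] * 5 for _ in range(5)]
    image[1][1] = image[1][2] = 1.0
    image[3][4] = image[4][4] = 1.0
    # Two orientations of the same bar-shaped object.
    templates = [[[1.0, 1.0]], [[1.0], [1.0]]]
    print(match_templates(image, templates, max_ssd=0.0))  # [(0, 1, 1), (1, 3, 4)]
```

The four nested loops (templates, rows, columns, template pixels) make the cost of this approach visible directly in the code.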
Template matching is prohibitively slow because it requires matching every template at every location. Hierarchical matching and contour matching are often cited as techniques to expedite template matching. In hierarchical template matching, the template set at a coarse scale consists of a few views, which in turn represent clusters of views at a finer scale. Running the relatively fast coarse match first, deriving constraints for the fine scale from its results, and only then running the more accurate fine match substantially accelerates the matching process. In the contour matching approach, all templates and the analyzed image are first converted to edge maps, and all subsequent matching is done on these maps only. The edge map of the image is first blurred according to the so-called chamfer metric, so that each pixel holds an approximate distance to the nearest edge, and then all templates are matched against every location in the analyzed image. The advantage of this technique is that matching does not need to be done over the whole area of a template; only the template's edge points need to be considered. Reducing the points considered from the full area of a template to its contour yields substantial performance gains.
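The contour (chamfer) variant can be sketched as follows; the hierarchical coarse-to-fine variant is not shown. This is a minimal illustration assuming the common two-pass 3-4 chamfer distance transform (straight steps cost 3, diagonal steps cost 4) and a mean-distance score over the template's edge points; function names and the scoring rule are assumptions, not a specific published implementation.

```python
from typing import List, Tuple

INF = float("inf")


def chamfer_distance_transform(edges: List[List[int]]) -> List[List[float]]:
    """Two-pass 3-4 chamfer distance transform of a binary edge map.

    Each output pixel holds the approximate distance to the nearest edge pixel,
    in units where a horizontal or vertical step costs 3 and a diagonal step costs 4.
    """
    rows, cols = len(edges), len(edges[0])
    dt = [[0.0 if edges[r][c] else INF for c in range(cols)] for r in range(rows)]
    forward = ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3))
    backward = ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3))
    # Forward raster pass (top-left to bottom-right), then backward pass (bottom-right to top-left).
    for mask, row_order, col_order in (
        (forward, range(rows), range(cols)),
        (backward, range(rows - 1, -1, -1), range(cols - 1, -1, -1)),
    ):
        for r in row_order:
            for c in col_order:
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dt[r][c] = min(dt[r][c], dt[rr][cc] + w)
    return dt


def chamfer_match(dt: List[List[float]], template_edges: List[Tuple[int, int]]) -> Tuple[float, int, int]:
    """Slide the template's edge points over the distance map; return (best mean distance, row, col)."""
    t_height = max(r for r, _ in template_edges) + 1
    t_width = max(c for _, c in template_edges) + 1
    rows, cols = len(dt), len(dt[0])
    best = (INF, -1, -1)
    for top in range(rows - t_height + 1):
        for left in range(cols - t_width + 1):
            # Only the template's edge points are scored, not its whole area.
            score = sum(dt[top + r][left + c] for r, c in template_edges) / len(template_edges)
            if score < best[0]:
                best = (score, top, left)
    return best


if __name__ == "__main__":
    # 6x6 edge map containing the outline of a 3x3 square with its top-left corner at (1, 2).
    square = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    edges = [[0] * 6 for _ in range(6)]
    for r, c in square:
        edges[1 + r][2 + c] = 1
    dt = chamfer_distance_transform(edges)
    print(chamfer_match(dt, square))  # (0.0, 1, 2)
```

The distance map is computed once per image, so each candidate location costs only one lookup per template edge point; a low but nonzero score still rewards near misses, which is the point of blurring the edge map.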
Despite the benefits of these techniques, the template matching approach remains prohibitively slow because of the sheer number of combinations of locations, scales, and orientations that must be tested. Moreover, in practice a template set of a few dozen items is often insufficient to describe an object, because the set must also accommodate small-to-moderate deformations of those templates, which substantially increases the computational load.
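A rough count of pixel comparisons shows why the combinations add up. The sketch below multiplies the factors named above; the parameter values in the example (image size, number of scales, orientations, template size, deformed variants) are hypothetical and chosen only for illustration, and border effects where a template does not fit are ignored.

```python
def exhaustive_match_cost(width: int, height: int, scales: int, orientations: int,
                          template_pixels: int, deformations: int = 1) -> int:
    """Approximate pixel comparisons for exhaustive template matching.

    Every image location is tested for every scale, orientation and deformed
    variant, and each test compares template_pixels pixels. Border effects
    (locations where the template does not fit) are ignored.
    """
    return width * height * scales * orientations * deformations * template_pixels


if __name__ == "__main__":
    # Hypothetical setting: 640x480 image, 10 scales, 36 orientations, 32x32 template.
    print(f"{exhaustive_match_cost(640, 480, 10, 36, 32 * 32):,}")  # 113,246,208,000
    # Allowing 5 deformed variants per template multiplies the load accordingly.
    print(f"{exhaustive_match_cost(640, 480, 10, 36, 32 * 32, deformations=5):,}")  # 566,231,040,000
```

Because the factors multiply, each additional degree of freedom (such as deformations) scales the total cost rather than adding to it.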
In view of the foregoing, it would be desirable to provide improved techniques for image recognition.