Interest points are markers anchored to a specific position in a digital image of an object. They are mathematically extracted in such a way that, in another image of the object, they will appear in the same position on the object, even though the object may be presented at a different position in the image, in a different orientation, at a different distance or under different lighting conditions. Good interest point detectors produce a number of stable interest points.
The goal is to match interest points in one image with corresponding interest points in another image. This is the key process behind a wide range of detection, recognition, segmentation and tracking problems. Conventionally, to match interest points, descriptors are constructed. Interest points and descriptors are used to identify and correlate related regions in two or more images, such as frames in a video stream. Descriptors are local statistics of a patch of the image around each interest point, typically a local histogram of gradients. Rotation and scale invariance may be obtained by transforming the patch according to the scale and principal direction of the interest point prior to computation. Popular types of interest point descriptors are the SIFT descriptor, discussed in Lowe, D. G.: “Distinctive Image Features”, International Journal of Computer Vision, 2004, and Microsoft's Daisy, for which see Winder, S., Hua, G., & Brown, M.: “Picking the Best Daisy”, CVPR (Computer Vision and Pattern Recognition), 2009.
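By way of illustration only, a descriptor of the kind described above can be sketched as a magnitude-weighted histogram of gradient orientations over a patch. The function name `gradient_histogram`, the choice of 8 bins and the use of a circular shift to the dominant bin as a stand-in for transforming the patch to its principal direction are assumptions made for this sketch. It is not the SIFT or Daisy descriptor, and it omits scale normalisation and the spatial grid of histograms that those descriptors use.

```python
import math

def gradient_histogram(patch, bins=8):
    """Orientation histogram for a patch given as a list of rows of numbers.

    Illustrative sketch: rotation is compensated by shifting the histogram so
    that its dominant orientation bin comes first (an assumed simplification).
    """
    height, width = len(patch), len(patch[0])
    bin_width = 2 * math.pi / bins
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradient at an interior pixel.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Bins are centred on multiples of bin_width.
            index = int(round(math.atan2(gy, gx) / bin_width)) % bins
            hist[index] += magnitude
    # Align the histogram to the principal (dominant) direction.
    peak = hist.index(max(hist))
    aligned = hist[peak:] + hist[:peak]
    # L2-normalise to reduce sensitivity to contrast changes.
    norm = math.sqrt(sum(v * v for v in aligned))
    return [v / norm for v in aligned] if norm > 0 else aligned
```

With 8 bins, rotating a square patch by 90 degrees shifts every gradient orientation by exactly two bins, so the aligned histogram of the rotated patch should agree with that of the original patch up to rounding error.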
With local descriptors, objects are identified by placing the descriptors for a reference image (desired object) into an unstructured list. To identify the same object in a test image, interest point descriptors are computed for the interest points in the test image and compared against the descriptors in the list. A sufficient number of sufficiently close descriptors indicates that the desired object is present in the test image.
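The conventional matching step just described can be sketched as a brute-force search over the unstructured list. The function names `count_matches` and `object_present`, the use of Euclidean distance, and the thresholds `max_distance` and `min_matches` are hypothetical choices for this illustration; the text above does not specify how "sufficiently close" or "a sufficient number" are defined.

```python
import math

def count_matches(reference, test, max_distance=0.3):
    """Count test descriptors lying within max_distance of any reference descriptor."""
    if not reference:
        return 0
    matches = 0
    for descriptor in test:
        # Brute-force nearest neighbour over the unstructured reference list.
        nearest = min(math.dist(descriptor, ref) for ref in reference)
        if nearest <= max_distance:
            matches += 1
    return matches

def object_present(reference, test, max_distance=0.3, min_matches=3):
    """Declare the object present when enough test descriptors have a close match."""
    return count_matches(reference, test, max_distance) >= min_matches
```

Every test descriptor is compared with every reference descriptor, so the cost grows with the product of the two list lengths. Only descriptor distances are used; the positions of the interest points play no part in the decision.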
There are a number of drawbacks with this technique. Image descriptors require a great deal of processing to generate, and they are not particularly compact. Indeed, the descriptor data for a typical image can exceed the size of the image data, which creates a bandwidth problem in real-time processing. Moreover, the conventional approach takes no account of the spatial positioning or orientation of one interest point relative to another.
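The observation about descriptor size can be illustrated with simple arithmetic. A SIFT descriptor has 128 elements; every other figure used here (4-byte values, 1000 interest points, a 640 x 480 image at one byte per pixel) is an assumption chosen for the example rather than a figure taken from the text, and the helper names are hypothetical.

```python
def descriptor_bytes(num_points, dims=128, bytes_per_value=4):
    """Storage needed for num_points descriptors of dims values each."""
    return num_points * dims * bytes_per_value

def image_bytes(width, height, bytes_per_pixel=1):
    """Storage needed for an uncompressed image."""
    return width * height * bytes_per_pixel

# Assumed example: 1000 interest points in a 640 x 480 greyscale image.
descriptors = descriptor_bytes(1000)   # 1000 * 128 * 4 = 512000 bytes
image = image_bytes(640, 480)          # 640 * 480 * 1 = 307200 bytes
```

Under these assumptions the descriptor data is roughly 1.7 times the size of the image data, which is the kind of imbalance referred to above.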