Interest point matching is the process of finding correspondences between two sets of features extracted from a pair of images. Matching interest points (sometimes called feature points or key points) is a key requirement for image registration. Image registration is widely used in 3D shape reconstruction, change detection, photogrammetry, remote sensing, computer vision, pattern recognition and medical image processing [Brown, 1992; Zitova and Flusser, 2003]. Such applications have a number of common characteristics: a) the images they deal with have no baseline or a short baseline; b) the images are normally processed in a short time; and c) feature-based algorithms are widely used.
Unfortunately, there are still many challenges with interest point matching. Although numerous algorithms have been developed for different applications, processing local distortion inherent in images that are captured from different viewpoints remains problematic. High resolution satellite images are normally acquired at widely spaced intervals and typically contain local distortion due to ground relief variation. Interest point matching algorithms can be grouped into two broad categories: area-based and feature-based. In remote sensing, area-based algorithms are normally suitable for open terrain areas but the feature-based approaches can provide more accurate results in urban areas. Although each type has its own particular advantages in specific applications, they both face the common problem of dealing with ambiguity in smooth (low texture) areas, such as grass, water, highway surfaces, building roofs, etc. Feature-based algorithms face the additional problem of the effect of outliers (points with no correspondences) on the results [Zitova and Flusser, 2003].
Because of the large number of feature-based algorithms used in interest point matching, there are many ways of classifying them. Feature-based algorithms are normally categorized as rigid or non-rigid (according to the transformation between images), global or local (according to the image distortions), or corrected or uncorrected (according to the image variations). In addition, most feature-based algorithms search for correspondences and also address the refinement of a transformation function. Feature-based algorithms can therefore also be grouped into three additional categories [Chui and Rangarajan, 2003]: those that solve the correspondence only, those that solve the transformation only, and those that solve both the correspondence and the transformation.
Although numerous feature-based algorithms have been developed, there is no general algorithm which is suitable for a variety of different applications. Every method must take into account the specific geometric image deformation [Zitova and Flusser, 2003]. The first category of algorithms processes the global distortions. The ICP (Iterative Closest Point) algorithm is a classical global algorithm [Besl and McKay, 1992; Yang et al., 2007]. Because this algorithm requires the assumption that one surface is a subset of the other, it is only suitable for global distortion image registration [Williams and Bennamoun, 2001]. For medical image registration and pattern recognition, many rigid global transformations are used [Besl and McKay, 1992; Mount et al., 1997; Tu et al., 2008]. The B-Spline and TPS (Thin Plate Spline) deformation models are common models for global distortion in medical image registration [Bookstein, 1989; Kybic and Unser, 2003].
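To make the ICP idea concrete, the following is a minimal 2D sketch and not the implementation of any of the cited works. It alternates between assigning each source point to its nearest target point and refitting a rigid transformation in closed form. It assumes small point sets, brute-force nearest-neighbour search and no outlier handling; all function names are hypothetical.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centred source point
        bx, by = qx - dx, qy - dy          # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)         # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)            # translation aligns the centroids
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty

def apply_rigid(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def nearest(p, candidates):
    """Brute-force closest point to p among candidates."""
    return min(candidates, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def icp_2d(source, target, max_iter=20, tol=1e-9):
    """Align source to target; returns the moved points and the mean residual."""
    current = list(source)
    prev_err = float("inf")
    err = prev_err
    for _ in range(max_iter):
        matches = [nearest(p, target) for p in current]        # correspondence step
        theta, tx, ty = best_rigid_2d(current, matches)         # transformation step
        current = apply_rigid(current, theta, tx, ty)
        err = sum(math.hypot(p[0] - q[0], p[1] - q[1])
                  for p, q in zip(current, matches)) / len(current)
        if abs(prev_err - err) < tol:                            # residual stopped changing
            break
        prev_err = err
    return current, err

target = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5), (1, 1)]
source = apply_rigid(target, 0.05, 0.1, -0.1)   # slightly rotated and shifted copy
aligned, err = icp_2d(source, target)
```

Because the fitted transformation is a single rigid motion applied to every point, a sketch of this kind can only model global distortion, which is consistent with the limitation noted above.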
The second category of algorithms deals with the local distortions. For non-rigid local distortions, more complicated transformations have been developed. TPS was proposed initially for global transformations, but it was later improved to handle smooth local distortions in medical image registration [Gold et al., 1997; Chui and Rangarajan, 2003; Auer et al., 2005]. Another common local distortion model is the elastic deformation model [Auer et al., 2005; Rexilius et al., 2001].
Some algorithms do not need a transformation function. In computer vision systems and pattern recognition, feature descriptors extracted from an image's gray values are usually used [Belongie et al., 2002; Kaplan et al., 2004; Terasawa et al., 2005; Lepetit et al., 2005; Zhao et al., 2006]. SIFT (Scale Invariant Feature Transform) is one of the best descriptors for interest point matching [Lowe, 2004]. In graph matching algorithms, the topological relationship is the key feature and is widely used in pattern recognition [Gold and Rangarajan, 1996; Cross and Hancock, 1998; Demirci et al., 2004; Caetano et al., 2004; Shokoufandeh et al., 2006]. Another idea is to consider interest point matching as a classification problem. The features from the reference image are used to train the classifier [Lepetit et al., 2004; Boffy et al., 2008].
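Descriptor-based matching without a transformation function can be illustrated with a short sketch. It assumes descriptors are plain numeric vectors compared by Euclidean distance, and it uses the nearest-to-second-nearest distance ratio test proposed in [Lowe, 2004] to reject ambiguous matches. The function names and the toy three-element descriptors are hypothetical; real SIFT descriptors have 128 elements.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def match_descriptors(ref_desc, sensed_desc, ratio=0.8):
    """Return (i, j) index pairs whose nearest neighbour passes the ratio test."""
    matches = []
    for i, d in enumerate(ref_desc):
        # Distances from reference descriptor i to every sensed descriptor, sorted.
        dists = sorted((euclidean(d, e), j) for j, e in enumerate(sensed_desc))
        if len(dists) < 2:
            continue                      # the ratio test needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:         # keep only clearly unambiguous matches
            matches.append((i, j))
    return matches

ref = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
sensed = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.0), (0.0, 0.0, 1.0)]
matches = match_descriptors(ref, sensed)
```

In this toy example the third reference descriptor is equally close to two sensed descriptors, so the ratio test discards it rather than guessing. This is the kind of ambiguity that arises in low-texture areas.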
Although many of the feature-based algorithms described above are useful in solving problems for specific applications, they have four common drawbacks: 1) The features cannot be exactly matched, because of the variations of features between different images; 2) Outliers are difficult to reject [Chui and Rangarajan, 2003]; 3) For local image distortion, high dimensional non-rigid transformations are required, and a large number of correspondences are needed for the refinement of mapping functions [Brown, 1992], but too many features will make the feature matching more difficult; and 4) The feature description should fulfill several conditions, the most important ones being invariance (the descriptions of the corresponding features from the reference and sensed image have to be the same), uniqueness (two different features should have different descriptions), stability (the description of a feature which is slightly deformed in an unknown manner should be close to the description of the original feature), and independence (if the feature description is a vector, its elements should be functionally independent). Usually these conditions cannot be satisfied simultaneously and it is necessary to find an appropriate trade-off [Zitova and Flusser, 2003].
Images in photogrammetry and remote sensing contain local distortions caused by ground relief variations and differing imaging viewpoints. Because of their stability and reliability, area-based methods are usually used in remote sensing for interest point matching. Photogrammetric scientists continue to work on improving the stability and reliability of interest point matching techniques. Hierarchical matching and relaxation algorithms are typical examples of such attempts. At the same time, great efforts are also being made to reduce the search area and increase the matching speed. The use of epipolar geometry is one of the most important achievements of such work [Masry, 1972; Helava et al., 1973; Dowman, 1977; Gupta, 1997; Kim, 2000]. Despite the progress that has been made, area-based methods still have many drawbacks. The main limitations can be summarized as follows: 1) in theory, the rectangular image window is suitable only for image distortion caused by translation; 2) these methods cannot process smooth areas (areas without prominent texture); and 3) the methods are sensitive to image intensity changes caused by noise, varying illumination and the use of different sensors [Zitova and Flusser, 2003].
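The rectangular-window search that underlies area-based matching can be sketched as follows. This is an illustrative example only: it assumes grey-value images stored as lists of rows, uses normalized cross-correlation (one common area-based similarity measure) and searches exhaustively over translations. The function names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized flat lists of grey values."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0   # flat (textureless) windows score 0

def window(img, r, c, h, w):
    """Flatten the h-by-w rectangular window whose top-left corner is (r, c)."""
    return [img[r + i][c + j] for i in range(h) for j in range(w)]

def best_match(template, image):
    """Slide the template over the image; return the best (row, col) and its score."""
    th, tw = len(template), len(template[0])
    t = [v for row in template for v in row]
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            score = ncc(t, window(image, r, c, th, tw))
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

image = [
    [1, 2, 3, 4, 5],
    [2, 9, 1, 7, 3],
    [3, 4, 8, 2, 6],
    [4, 6, 5, 9, 1],
    [5, 1, 7, 3, 8],
]
# Template: the 2x2 window at (1, 1) with a linear brightness change (2 * v + 10).
template = [[28, 12], [18, 26]]
pos, score = best_match(template, image)
```

The sketch reflects the first two limitations listed above. The window is compared only under translation, so rotated or scaled content will not correlate well, and a window with constant grey values has zero variance and therefore yields no usable score. Normalized cross-correlation tolerates a linear brightness change, as in this example, but not the non-linear intensity differences that arise between different sensors.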