The invention relates generally to the field of digital image processing. More specifically, the invention relates to a method and system for matching an image with another image.
Image matching is a fundamental technique used in computer vision, object recognition, motion tracking, 3D modeling, and the like. Image matching is performed to check whether two images have the same content. The two images need not be exactly the same. For example, one image may be rotated or taken from a different viewpoint as compared to the other image, or it may be a zoomed version of the other image. Further, the two images may be taken under different lighting conditions. Despite such variations, the two images contain the same content, scene, or object. Image matching techniques are therefore designed to match images effectively in the presence of such variations.
Typical image matching algorithms take advantage of the fact that an image of an object or scene contains a number of feature points. Feature points are specific points in an image that are robust to changes in image rotation, scale, viewpoint, or lighting conditions. This means that these feature points will often be present in both images, even if the two images differ in the manner described earlier. Therefore, the first stage of the image matching algorithm is to find these feature points in the image. Typically, an image pyramid is constructed to determine the feature points of an image. The image pyramid is the scale-space representation of the image, i.e., it contains various pyramid images, each of which is a representation of the image at a particular scale. The scale-space representation enables the image matching algorithm to match images that differ in overall scale. After determining the feature points of the pyramid images in the image pyramid, orientations of the feature points are determined based on the local image gradient at the feature points. These orientations provide invariance of the feature points to rotation. A feature vector is then computed for each feature point. The feature vector representation tolerates significant local distortion and changes in illumination, i.e., the feature vector is invariant to distortion and change in lighting conditions.
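By way of illustration only, the image pyramid described above can be sketched as repeated downsampling of the image. The sketch below is a simplified assumption: it uses plain 2x2 block averaging in place of the smoothing a practical scale-space construction would apply, and the names `downsample_by_two`, `build_pyramid`, and `min_size` are hypothetical.

```python
def downsample_by_two(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows = len(image) // 2
    cols = len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, min_size=2):
    """Return the pyramid images, from full scale down to min_size pixels."""
    levels = [image]
    # Keep halving while the next level would still be at least min_size wide and tall.
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample_by_two(levels[-1]))
    return levels


# Usage: an 8x8 test image yields pyramid images of 8x8, 4x4, and 2x2 pixels.
image = [[float(r + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

Each entry of `pyramid` represents the image at one scale, so feature points can be sought in every level.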
The feature points, their orientations and feature vectors of the pyramid images form a complete representation of the image. These representations can be compared across images to find a matching image. A pair of images is matched by matching the feature points of the two images. The pair of images can be determined to be a match when a sufficient number of feature points of one image match the corresponding feature points of the other image both visually and geometrically. Feature vectors that are close to each other are visually similar, and the corresponding feature points are called ‘putative correspondences’ or ‘correspondences’. The putative correspondences are generally processed by a statistical algorithm to test geometric consistency.
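As a minimal sketch of how putative correspondences may be formed, each feature vector of one image can be paired with its nearest feature vector in the other image, provided the two are close enough. The Euclidean distance, the `max_distance` threshold, and the function names are assumptions made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def putative_correspondences(query_vectors, database_vectors, max_distance):
    """Pair each query feature with its nearest database feature if it is close enough."""
    pairs = []
    for qi, q in enumerate(query_vectors):
        best_index, best_dist = None, float("inf")
        for di, d in enumerate(database_vectors):
            dist = euclidean_distance(q, d)
            if dist < best_dist:
                best_index, best_dist = di, dist
        # Only visually similar features (small distance) become correspondences.
        if best_index is not None and best_dist <= max_distance:
            pairs.append((qi, best_index))
    return pairs


# Usage: the second query feature has no close counterpart and is left unpaired.
query = [[0.0, 1.0], [5.0, 5.0], [9.0, 0.0]]
database = [[0.1, 0.9], [9.2, 0.1], [20.0, 20.0]]
pairs = putative_correspondences(query, database, max_distance=0.5)
```

The resulting index pairs express visual similarity only; geometric consistency still has to be tested separately.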
Generally, for geometric matching of images, the statistical algorithm used is the Random Sample Consensus (RANSAC) algorithm, although variants of RANSAC or other statistical algorithms can be used. In RANSAC, a small set of putative correspondences is randomly sampled. Thereafter, a geometric transformation, also referred to as the model, is generated using these sampled feature points. After generating the transformation, the putative correspondences that fit the model are determined. The putative correspondences that fit the model are geometrically consistent and are called ‘inliers.’ Thereafter, the total number of inliers is determined. The above-mentioned steps are repeated until the number of repetitions/trials is greater than a predefined threshold or the number of inliers for the image is sufficiently high to determine an image as a match. The RANSAC algorithm returns the model that has the highest number of corresponding inliers.
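The RANSAC steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the model is assumed to be a similarity transformation (rotation, uniform scale, translation) written with complex numbers as q = a * p + b, two correspondences are sampled per trial, and the names `fit_similarity`, `find_inliers`, `ransac`, `tolerance`, and `enough_inliers` are hypothetical.

```python
import random


def fit_similarity(p1, q1, p2, q2):
    """Solve q = a * p + b for complex a and b from two correspondences."""
    if p1 == p2:
        return None  # degenerate sample: both points coincide
    a = (q1 - q2) / (p1 - p2)
    b = q1 - a * p1
    return a, b


def find_inliers(model, correspondences, tolerance):
    """Return the correspondences that the model maps to within the tolerance."""
    a, b = model
    return [(p, q) for p, q in correspondences if abs(a * p + b - q) <= tolerance]


def ransac(correspondences, max_trials=100, tolerance=0.5, enough_inliers=None, seed=0):
    """Return the model with the most inliers, together with those inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(max_trials):
        (p1, q1), (p2, q2) = rng.sample(correspondences, 2)
        model = fit_similarity(p1, q1, p2, q2)
        if model is None:
            continue
        inliers = find_inliers(model, correspondences, tolerance)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
        # Stop early once the inlier count is high enough to declare a match.
        if enough_inliers is not None and len(best_inliers) >= enough_inliers:
            break
    return best_model, best_inliers


# Usage: six correspondences follow q = 2j * p + (1 + 1j); two are outliers.
true_a, true_b = 2j, 1 + 1j
points = [0 + 0j, 1 + 0j, 0 + 1j, 2 + 3j, 4 + 1j, 3 + 3j]
correspondences = [(p, true_a * p + true_b) for p in points]
correspondences += [(5 + 5j, -7 + 2j), (1 + 4j, 10 + 10j)]
model, inliers = ransac(correspondences, enough_inliers=6)
```

In this usage the recovered `model` corresponds to a 90-degree rotation with a scale factor of two, and the two outliers are excluded from `inliers`.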
One problem associated with using this type of method is that the set of possible transformations generated by the statistical algorithm may be larger than the set of physically valid transformations. For example, the transformation may flip one side of a rectangle, causing a twist that is impossible to achieve with a rigid object. In another example, it may flip the entire rectangle, a transformation that is achievable only by taking a picture of a reflection of the object. This can lead to incorrect matching of images. Further, this can cause useless computation, since analysis of parameters/points generated by the transformation is done even though the transformation itself may be physically invalid or infeasible.
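The two examples above, a twisted rectangle and a fully flipped rectangle, can be illustrated by examining the turning direction at each corner of the transformed rectangle. The following sketch is an assumption made for illustration; the function names and the classification labels are hypothetical.

```python
def turn_signs(quad):
    """Sign (+1, -1, or 0) of the turn made at each corner when walking the outline."""
    n = len(quad)
    signs = []
    for i in range(n):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % n]
        cx, cy = quad[(i + 2) % n]
        # z-component of the cross product of edge a->b with edge b->c.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        signs.append((cross > 0) - (cross < 0))
    return signs


def classify_mapped_rectangle(original, mapped):
    """Compare corner turning directions before and after the transformation."""
    before, after = turn_signs(original), turn_signs(mapped)
    if 0 in after:
        return "degenerate"
    if after == before:
        return "valid"
    if all(a == -b for a, b in zip(after, before)):
        return "reflected"  # the whole rectangle is flipped
    return "twisted"  # only part of the outline is flipped


# Usage with a 2x1 rectangle listed counter-clockwise.
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
rotated = [(0, 0), (0, 2), (-1, 2), (-1, 0)]   # 90-degree rotation
mirrored = [(0, 0), (-2, 0), (-2, 1), (0, 1)]  # reflection about the y-axis
twisted = [(0, 0), (2, 1), (2, 0), (0, 1)]     # one side flipped
results = [classify_mapped_rectangle(rectangle, m) for m in (rotated, mirrored, twisted)]
```

A transformation classified as reflected or twisted could be discarded before its inliers are counted, which avoids the wasted analysis noted above.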
Each feature point in the putative correspondence has an orientation associated with it. In applications where rotational invariance is required, for a transformation to be valid, it should preferably preserve the orientations of the two feature points in a putative correspondence. Many applications that use RANSAC do not take this constraint into account.
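A minimal sketch of such an orientation check is given below, assuming that the transformation includes a known rotation angle and that orientations are expressed in radians. The function names and the 15-degree tolerance are hypothetical choices made for illustration.

```python
import math


def angle_difference(a, b):
    """Signed difference a - b, wrapped into the range [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def preserves_orientation(rotation, orientation_query, orientation_db,
                          tolerance=math.radians(15)):
    """Check that rotating the query orientation lands near the database orientation."""
    predicted = orientation_query + rotation
    return abs(angle_difference(predicted, orientation_db)) <= tolerance


# Usage: a transformation that rotates the image by 90 degrees.
rotation = math.pi / 2
consistent = preserves_orientation(rotation, 0.1, 0.1 + math.pi / 2)
inconsistent = preserves_orientation(rotation, 0.1, 0.1 - math.pi / 2)
# Wrap-around case: 350 degrees rotated by 20 degrees matches 10 degrees.
wrapped = preserves_orientation(math.radians(20), math.radians(350), math.radians(10))
```

A transformation built from a putative correspondence that fails this check could be rejected without further analysis.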
Even if the putative correspondences are determined to be closely matching, the putative correspondences alone generally do not ensure that the pair of images forms a final match. Putative correspondences only give information about visual similarities between feature points of the pair of images. This is generally not sufficient to determine the final match between the pair of images. There is a possibility that corresponding areas of both images may generate multiple putative correspondences. For instance, if an image feature is salient at more than one scale, multiple feature points may be generated, possibly resulting in multiple putative correspondences. Choosing one of these putative correspondences to generate the transformation means that the other putative correspondences will also become inliers, thereby creating a false impression that the two images matched using this transformation are truly matching. Removing all but one of a set of corresponding feature points a priori is not a sound approach: different query images may not produce multiple putative correspondences, and there is no way to say which of the feature points is best, because several of the putative correspondences may themselves be falsely interpreted. In other words, putative correspondences alone generally cannot provide enough information to establish a clear match between the query image and the database images.
Moreover, the two images may share an element or a small part of the image, such as a logo for a corporation or other entity. The images may also share a piece of text in the same font. These shared elements may create enough inliers to declare an image match, while in reality the two images are not similar. Further, a query image may contain multiple image objects, each of which is represented in a separate database image. The database images are a plurality of images with which the query image needs to be matched.
There exists a need for an improved image-matching method for overcoming the limitations mentioned above.