It is often desirable to perform database searches to identify stored images which are the same as, or partial duplicates of, a query image. Applications for such search engines include locating copyright violations, finding better and/or higher resolution duplicates of a query image, and finding more information on a query image. While there are many image searching methodologies, one type of image search relates to two-dimensional image searches. Image searching over the World Wide Web is a common example of two-dimensional image searching. A search engine should be able to identify two-dimensional candidate images from a query image, even where the candidates have changes in scale, are cropped differently, or where the query/candidate image is partially blocked (by another image) or only partially duplicated. Prior art FIG. 1 presents examples of two-dimensional searches including query images 20 (on the left), and candidate images 22 which are identified for the respective query images.
Instead of comparing entire query images against entire stored images, current frameworks for two-dimensional image searches process the query and stored images using any of various feature detection schemes. In general, feature detection schemes identify local areas of interest within images, such as, for example, edges where there is a boundary between two image regions, and corners where, for example, two edges come together. One popular feature detection scheme is the Scale-Invariant Feature Transform (SIFT) algorithm. The SIFT algorithm is described for example in U.S. Pat. No. 6,711,293, entitled, “Method and Apparatus for Identifying Scale Invariant Features in an Image and Use of Same for Locating an Object in an Image,” to David G. Lowe, which patent is incorporated by reference herein in its entirety. In general, SIFT feature detection finds distinctive keypoints that are invariant to location, scale and rotation. The SIFT keypoint gains invariance to scale and rotation by exploiting scale-space extrema and the local dominant orientation. In order to detect keypoints, the image is convolved with Gaussian filters at different scales, and then the differences of successive Gaussian-blurred images are taken. Keypoints are then taken as maxima/minima of the Difference of Gaussians (DoG) that occur at multiple scales. This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and to the nine corresponding neighboring pixels in each of the two neighboring scales, for 26 comparisons in total. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.
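The candidate keypoint selection described above can be illustrated with a minimal sketch. It is not the implementation of the referenced patent. It assumes a grayscale image stored as a list of rows, a Gaussian kernel truncated at about three standard deviations, clamped image borders, and an illustrative set of blur scales. The function names (gaussian_kernel, blur, dog_stack, candidate_keypoints) are hypothetical and chosen only for this example.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; pixels outside the image are clamped to the border."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, limit):
        return min(max(v, 0), limit - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(k * row[clamp(x + i - radius, width)] for i, k in enumerate(kernel))
             for x in range(width)] for row in image]
    return [[sum(k * rows[clamp(y + i - radius, height)][x] for i, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]

def dog_stack(image, sigmas):
    """Differences of successive Gaussian-blurred images (more blurred minus less blurred)."""
    blurred = [blur(image, s) for s in sigmas]
    return [[[b[y][x] - a[y][x] for x in range(len(image[0]))]
             for y in range(len(image))]
            for a, b in zip(blurred, blurred[1:])]

def candidate_keypoints(dog):
    """Return (scale, y, x) positions that are a strict maximum or minimum among
    their 26 neighbors: 8 at the same scale and 9 in each adjacent scale."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[0]) - 1):
            for x in range(1, len(dog[0][0]) - 1):
                value = dog[s][y][x]
                neighbors = [dog[s + ds][y + dy][x + dx]
                             for ds in (-1, 0, 1)
                             for dy in (-1, 0, 1)
                             for dx in (-1, 0, 1)
                             if (ds, dy, dx) != (0, 0, 0)]
                if value > max(neighbors) or value < min(neighbors):
                    found.append((s, y, x))
    return found

# Synthetic 41x41 test image: one Gaussian blob (standard deviation 2 pixels)
# centered at row 20, column 20. The scales below are illustrative only.
size, center = 41, 20
image = [[math.exp(-((x - center) ** 2 + (y - center) ** 2) / 8.0)
          for x in range(size)] for y in range(size)]
keypoints = candidate_keypoints(dog_stack(image, [1.0, 1.6, 2.56, 4.096]))
```

In this sketch the blob center appears among the candidates at the middle DoG level. A practical detector would typically filter candidates further, for example by discarding low-contrast responses, which this example does not attempt.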
In large scale image searches, for example those performed via the World Wide Web, it is necessary to match a single SIFT feature to millions or even billions of SIFT features computed from a large corpus of web images. In this scenario, the discriminative power of the quantized SIFT feature decreases rapidly, resulting in many false positive matches between individual features.
Another popular feature detector is the Maximally Stable Extremal Regions (MSER) algorithm. The MSER algorithm is described for example in the paper by J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust Wide Baseline Stereo From Maximally Stable Extremal Regions,” Proc. of British Machine Vision Conference, pages 384-396 (2002), which paper is incorporated by reference herein in its entirety. Unlike the keypoints identified using a SIFT feature detector, MSER detects affine-covariant stable elliptical regions. Usually the MSER detector outputs a relatively small number of regions per image, and their repeatability and distinctness are relatively high; that is, if an MSER feature shows up in a query image, it is also likely to be found in the same or a similar stored image. However, false positive matches remain an issue for large image databases. The sources of false positives are twofold: 1) each MSER feature is still represented by a single SIFT descriptor no matter how large the region is; and 2) quantization further decreases the discriminative power of the feature.
In order to work with the information provided by feature detectors such as SIFT, existing large scale image retrieval systems typically rely on first quantizing local SIFT descriptors into visual words (see for example, D. Lowe, “Distinctive Image Features From Scale-Invariant Keypoints,” International Journal of Computer Vision, 20:91-110 (2003), incorporated by reference herein in its entirety). Once the visual words are determined, matches are found by applying scalable textual indexing and retrieval schemes (see for example, J. Sivic and A. Zisserman, “Video Google: A Text Retrieval Approach to Object Matching in Videos,” In Proc. ICCV (2003), incorporated by reference herein in its entirety). While critical for scalability, quantization has at least two drawbacks. First, modifications to an image patch can lead to its corresponding descriptor being quantized into a different visual word. Second, quantization reduces the discriminative power of local descriptors, since different descriptors quantized to the same visual word are considered to match each other even though there may be differences between them. These two issues reduce the precision and recall in image retrieval, especially for low resolution images.
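The quantization and indexing steps described above, and the two drawbacks of quantization, can be illustrated with a minimal sketch. It is not the method of the cited references. It assumes toy two-dimensional descriptors (SIFT descriptors are 128-dimensional), a tiny hand-picked vocabulary, nearest-centroid assignment, and a simple vote count in place of a full text-retrieval scoring scheme. The function names (quantize, build_index, search) and all data values are hypothetical.

```python
from collections import Counter, defaultdict

def quantize(descriptor, vocabulary):
    """Return the index of the nearest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda w: sum((d - c) ** 2
                                 for d, c in zip(descriptor, vocabulary[w])))

def build_index(images, vocabulary):
    """Inverted index: visual word -> set of image ids containing that word."""
    index = defaultdict(set)
    for image_id, descriptors in images.items():
        for descriptor in descriptors:
            index[quantize(descriptor, vocabulary)].add(image_id)
    return index

def search(query_descriptors, index, vocabulary):
    """Rank stored images by how many distinct query visual words they share."""
    votes = Counter()
    for word in {quantize(d, vocabulary) for d in query_descriptors}:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()

# Hypothetical toy data: three visual words and two stored images.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
images = {"a": [(0.5, 0.2), (9.5, 0.3)], "b": [(0.1, 9.7)]}
index = build_index(images, vocabulary)
results = search([(0.3, -0.1), (10.2, 0.4)], index, vocabulary)  # image "a" shares both words

# First drawback: a small change to a descriptor near a cell boundary
# moves it to a different visual word.
word_before = quantize((4.9, 0.0), vocabulary)
word_after = quantize((5.1, 0.0), vocabulary)

# Second drawback: clearly different descriptors fall into the same visual
# word and are therefore treated as a match.
same_word = quantize((0.0, 0.0), vocabulary) == quantize((4.0, 4.0), vocabulary)
```

The inverted index is what makes the lookup scalable: a query only touches the stored images that share at least one visual word with it, rather than every stored descriptor.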
It is therefore known to employ various geometric verification processes as post-processing steps for getting reasonable retrieval precision, especially for low-resolution images. Such known post-processing geometric verifications are disclosed for example in the papers: H. Jegou, M. Douze, and C. Schmid, “Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search,” In Proc. ECCV (2008), and J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object Retrieval with Large Vocabularies and Fast Spatial Matching,” In Proc. CVPR (2007).
However, full geometric verification is computationally expensive. In practice, therefore, it is only applied to a subset of the top-ranked candidate images. For large scale image retrievals, such as web image searches, the number of near or partial duplicates could be large, and applying full geometric verification to only these top-ranked images may not be sufficient for good recall.