1. Field of the Invention
The invention generally relates to systems and methods for analyzing digital images.
2. Description of the Relevant Art
Because assisted image editing tools are so convenient, partially duplicated images are prevalent on the web. Partial-duplicate web images are usually obtained by editing an original 2D image with changes in color, scale, rotation, partial occlusion, etc. Partial duplicates differ in appearance but still share some duplicated patches. A system that detects such duplicates has many applications, for instance, finding out where an image is derived from and getting more information about it, tracking the appearance of an image online, detecting image copyright violations, discovering modified or edited versions of an image, and so on.
In image-based object retrieval, the main challenge is image variation due to 3D view-point change, illumination change, or object-class variability. Partial-duplicate web image retrieval differs in that the target images are usually obtained by editing the original image with changes in color, scale, partial occlusion, etc. In partial-duplicate web images, different parts are often cropped from the original image and pasted into the target image with modifications. The result is a partial-duplicate version of the original image that has a different appearance but still shares some duplicated patches.
In large scale image retrieval systems, the state-of-the-art approaches leverage scalable textual retrieval techniques for image search. Similar to text words in information retrieval, local SIFT descriptors are quantized to visual words. Inverted file indexing is then applied to index images via the visual words they contain. However, the discriminative power of visual words is far less than that of text words due to quantization. Moreover, as the size of the image database to be indexed increases (e.g. beyond one million images), the discriminative power of visual words decreases sharply. Visual words usually suffer from the dilemma of discrimination and ambiguity. On one hand, if the visual word codebook is large enough, the ambiguity of features is mitigated and different features can be easily distinguished from each other. However, similar descriptors polluted by noise may then be quantized to different visual words. On the other hand, when a small visual codebook is used, the variation among similar descriptors is diluted, but different descriptors may be quantized to the same visual word and cannot be discriminated from each other.
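The quantize-then-index pipeline described above can be illustrated with a minimal sketch. It is not taken from any particular system: the function names (nearest_word, build_inverted_index, query), the toy 2-D descriptors, and the three-entry codebook are all assumptions chosen for illustration, whereas real SIFT descriptors are 128-dimensional and codebooks are typically learned by clustering.

```python
from collections import defaultdict
import math


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    best_idx, best_dist = -1, math.inf
    for idx, centroid in enumerate(codebook):
        # Squared Euclidean distance is enough for choosing the nearest centroid.
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def build_inverted_index(images, codebook):
    """Map each visual word to {image_id: occurrence count}."""
    index = defaultdict(dict)
    for image_id, descriptors in images.items():
        for d in descriptors:
            w = nearest_word(d, codebook)
            index[w][image_id] = index[w].get(image_id, 0) + 1
    return index


def query(descriptors, codebook, index):
    """Rank indexed images by the number of visual-word votes they receive."""
    votes = defaultdict(int)
    for d in descriptors:
        w = nearest_word(d, codebook)
        for image_id, count in index.get(w, {}).items():
            votes[image_id] += count
    # Highest vote count first; ties broken by image id for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: 2-D stand-ins for local descriptors.
codebook = [(0, 0), (10, 10), (0, 10)]
images = {"a": [(1, 1), (9, 9)], "b": [(0, 9), (1, 11)]}
index = build_inverted_index(images, codebook)
ranking = query([(0.5, 0.5), (10, 9)], codebook, index)
```

In this sketch every descriptor that falls in the same cell of the codebook is treated as identical, which is where the quantization loss discussed above comes from: a coarser codebook merges distinct descriptors, while a finer one can separate noisy copies of the same descriptor.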
Unlike text words in information retrieval, the geometric relationship among visual words plays a very important role in identifying images. Geometric verification has become very popular recently as an important post-processing step to improve the retrieval precision. However, due to the expensive computational cost of full geometric verification, it is usually only applied to some top-ranked candidate images. In web image retrieval, however, the number of potential candidates may be very large. Therefore, applying full geometric verification only to the top-ranked images may be insufficient to achieve sound recall.
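The recall limitation of verifying only a shortlist can be seen in a small sketch. The function rerank_top_k, the candidate names, and the inlier counts below are hypothetical; verify stands in for any expensive geometric check that returns a consistency score for a candidate image.

```python
def rerank_top_k(ranked_ids, verify, k):
    """Re-order only the first k candidates by an expensive verification score.

    ranked_ids: candidate image ids, best initial score first.
    verify: callable mapping an image id to a geometric-consistency score.
    Candidates beyond position k are never verified and keep their order.
    """
    head = sorted(ranked_ids[:k], key=verify, reverse=True)
    return head + ranked_ids[k:]


# Hypothetical example: the true duplicate sits just outside the shortlist.
initial = ["x", "y", "z", "true_dup"]
inliers = {"x": 3, "y": 12, "z": 1, "true_dup": 40}

shortlist_3 = rerank_top_k(initial, inliers.get, 3)
shortlist_4 = rerank_top_k(initial, inliers.get, 4)
```

With a shortlist of three, the candidate with the strongest geometric support is never examined and stays in last place; it rises to the top only once the shortlist is widened to include it, at the cost of one more verification.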
In summary, image retrieval based on the Bag-of-Visual-Words model mainly relies on improving the discrimination of visual words by reducing feature quantization loss and embedding geometric consistency. The expectation of real-time performance on large scale image databases forces researchers to trade off feature quantization against geometric constraints. Quantization of local features in previous work mainly relies on the SIFT descriptor, resulting in limited efficiency, while geometric verification is too complex to ensure real-time response.