Advances in information technology are driving higher demand for image search. In some cases, it is necessary to search for near-duplicate images. Near-duplicate image detection requires matching slightly altered images to an original. For example, a manually introduced sign or watermark covering a small area of an image, or a zoomed picture, can make one image a near-duplicate of an original image.
Much effort has been devoted to visual applications that require effective image signatures and similarity metrics. Conventionally, an image feature extraction method for duplicate image detection includes two steps: first, certain features of two images are extracted, and these features are called the “signatures” of the images; then, the signatures of the two images are compared. If the signatures exactly match each other, the two images are determined to be the same. Color histogram vectors are often used to represent the features of the images. The vectors can be extracted by first selecting and quantizing a color space, such as a Red-Green-Blue (RGB) space, then counting the number of pixels corresponding to each color within the whole or a partial area of the image to form a color histogram, and finally constructing vectors from all of the formed color histograms as the signatures of the images. One can therefore use the above image feature extraction techniques to search for one particular image among a plurality of images.
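The conventional two-step approach described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as any particular prior-art implementation: the function names `color_histogram_signature` and `signatures_match`, the choice of four bins per RGB channel, and the representation of an image as a flat sequence of (R, G, B) tuples are all hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram_signature(
    pixels: Sequence[Pixel], bins_per_channel: int = 4
) -> Tuple[int, ...]:
    """Quantize the RGB space and count the pixels falling in each color bin.

    Assumption: the image (or partial area) is given as a flat sequence of
    RGB tuples; bins_per_channel is an illustrative quantization level.
    """
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        hist[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1
    return tuple(hist)


def signatures_match(sig_a: Tuple[int, ...], sig_b: Tuple[int, ...]) -> bool:
    """Second step: two images are declared the same only on an exact match."""
    return sig_a == sig_b
```

Because the signature only counts pixels per color bin, two images with the same pixels in a different arrangement produce the same signature in this sketch, while any pixel that moves to a different bin breaks the exact match.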
To find an image identical to a given image among a plurality of images, the given image often must be compared with each of the plurality of images. Because the image feature extraction processes involved in the comparison are complicated, search efficiency is relatively low. Moreover, the conventional image feature extraction method typically cannot be used to determine whether two images are near-duplicate images, because the features of an image may change with slight changes in the color of a partial area of the image. For example, a watermark embedded in an image may cause a slight color change in only a small portion of the image, yet it may cause a large change in the color histogram of the image.
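Both limitations can be seen in a small Python sketch, given here under illustrative assumptions only: `histogram_signature`, `find_exact_matches`, and `add_watermark` are hypothetical names, the "watermark" is simulated by brightening a few pixels, and four bins per channel is an arbitrary quantization choice. The search compares the query against every image in the collection, and a slight color change in a small area is enough to defeat the exact match.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def histogram_signature(pixels: Sequence[Pixel], bins: int = 4) -> Tuple[int, ...]:
    """Quantized RGB color histogram used as the image signature (assumed form)."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
    return tuple(hist)


def find_exact_matches(
    query: Sequence[Pixel], collection: Sequence[Sequence[Pixel]]
) -> List[int]:
    """Linear scan: extract and compare a signature for every stored image."""
    query_sig = histogram_signature(query)
    return [
        i for i, image in enumerate(collection)
        if histogram_signature(image) == query_sig
    ]


def add_watermark(pixels: Sequence[Pixel], count: int, delta: int) -> List[Pixel]:
    """Hypothetical stand-in for a watermark: brighten the first `count` pixels."""
    marked = [
        (min(255, r + delta), min(255, g + delta), min(255, b + delta))
        for r, g, b in pixels[:count]
    ]
    return marked + list(pixels[count:])


# A uniform 8x8 gray image, and a copy with 4 pixels brightened by only 2 levels.
original = [(127, 127, 127)] * 64
watermarked = add_watermark(original, count=4, delta=2)

# The altered pixels cross a bin boundary, so the signatures no longer match
# exactly and the watermarked copy is not found.
matches = find_exact_matches(original, [original, watermarked])
```

In this sketch `matches` contains only the index of the unmodified image, although the watermarked copy differs from it by two intensity levels in four of sixty-four pixels.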
Therefore, an efficient near-duplicate image search technique is needed.