As digital imaging has grown in popularity, so has the demand for software that can recognize similar or duplicate digital images based on the contents of the images. One method for identifying duplicate images is known as the min-hash algorithm.
In the min-hash algorithm, one or more hashing functions are applied to each visual word associated with an image, and for each hashing function the visual word with the minimum hash value is selected as a global descriptor, i.e., the min-hash, of the image. Multiple hashing functions are usually applied to compute a sketch, which is a set of min-hashes that are used jointly to represent the image. Two images that have matching sketches are identified as matching images. The degree of sameness required between matching images may be adjusted by changing the number of hashing functions used to compute each sketch, as well as the number of sketches generated for each image.
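By way of illustration only, the sketch computation described above may be outlined in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: visual words are assumed to be integer identifiers, the names `hash_word`, `min_hash`, `sketches`, and `images_match` are hypothetical, a seeded BLAKE2b digest stands in for each hashing function, and the rule that two images match when at least one pair of corresponding sketches is identical is an assumption made for the example.

```python
import hashlib


def hash_word(word_id: int, seed: int) -> int:
    """Hypothetical seeded hash: maps a visual word ID to a 64-bit integer.

    Each distinct seed acts as a separate hashing function (an assumption).
    """
    data = f"{seed}:{word_id}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def min_hash(words, seed):
    """Return the visual word with the smallest hash value under one hash function."""
    return min(words, key=lambda w: hash_word(w, seed))


def sketches(words, num_sketches, sketch_size):
    """Compute num_sketches sketches, each a tuple of sketch_size min-hashes.

    Raises ValueError if words is empty, because no minimum exists.
    """
    result = []
    for s in range(num_sketches):
        # Each sketch uses its own disjoint group of seeds (hash functions).
        seeds = range(s * sketch_size, (s + 1) * sketch_size)
        result.append(tuple(min_hash(words, seed) for seed in seeds))
    return result


def images_match(words_a, words_b, num_sketches=4, sketch_size=2):
    """Assumed matching rule: at least one pair of corresponding sketches is identical."""
    sa = sketches(words_a, num_sketches, sketch_size)
    sb = sketches(words_b, num_sketches, sketch_size)
    return any(a == b for a, b in zip(sa, sb))
```

In this outline, increasing `sketch_size` makes a sketch collision less likely for dissimilar images and therefore makes matching stricter, while increasing `num_sketches` gives similar images more opportunities to collide and therefore makes matching more permissive.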
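While the min-hash algorithm is useful for identifying matching images, it may have difficulty identifying partial matches between images. For example, a user may be interested in all images in a collection that include a landmark such as the Eiffel Tower. Because each min-hash is selected from the visual words of the entire image, images that share only a region, such as the landmark, are unlikely to produce matching sketches. The min-hash algorithm described above would therefore not be effective for identifying such partially matching images.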