Discovering duplicate images within an image space can be beneficial. For example, identifying duplicate images can improve object recognition results. It can also prevent duplicate images from being presented in an image search results page. Duplicate image discovery techniques fall into two categories: full duplicate discovery and partial duplicate discovery. Conventional partial duplicate discovery, that is, the discovery of images that may not be full duplicates but that contain the same objects, uses local descriptors of the images and adopts various hashing techniques. Such techniques have been scaled to discover duplicates within an image space containing millions of images.
The problem of full duplicate discovery, that is, the discovery of images that are full duplicates (albeit with slight variations in scale and/or content), can be tackled with global feature-based methods. Duplicate image discovery is different from, and more challenging than, duplicate image retrieval. Conventional duplicate image retrieval methods are not scalable for this purpose, since their computational cost is quadratic in the number of images in the image space.
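The scalability concern can be illustrated with a minimal sketch. It assumes each image has already been reduced to a short global feature vector; the vectors, the tolerance `tol`, the quantization `step`, and both function names are hypothetical and chosen only for illustration. The first function compares every pair of images, which is the quadratic cost described above. The second groups images by a quantized feature key in a single pass, as one simple example of a hashing-style alternative; it is not presented as the method of any particular system.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_duplicates(features, tol=0.05):
    """Compare every pair of images: n*(n-1)/2 comparisons (quadratic)."""
    pairs = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(features), 2):
        comparisons += 1
        # Treat two images as duplicates if every feature differs by at most tol.
        if max(abs(x - y) for x, y in zip(a, b)) <= tol:
            pairs.append((i, j))
    return pairs, comparisons


def bucket_duplicates(features, step=0.1):
    """Hash each image once by a quantized feature key: one pass (linear)."""
    buckets = defaultdict(list)
    for i, f in enumerate(features):
        # Hypothetical quantization: nearby feature values share a key.
        key = tuple(int(x // step) for x in f)
        buckets[key].append(i)
    # Only buckets holding more than one image are duplicate candidates.
    return [ids for ids in buckets.values() if len(ids) > 1]


# Made-up global features for four images; image 1 is a slight variation of image 0.
features = [
    [0.12, 0.83, 0.47],
    [0.13, 0.84, 0.46],
    [0.91, 0.22, 0.65],
    [0.55, 0.35, 0.05],
]

pairs, comparisons = pairwise_duplicates(features)
groups = bucket_duplicates(features)
```

For four images the pairwise approach already performs six comparisons, and the count grows with the square of the number of images, while the bucketing pass touches each image once. The trade-off in this simplified sketch is that two near-duplicate images whose features straddle a quantization boundary land in different buckets and are missed.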