Near-duplicate image detection and retrieval is useful for filtering, retrieving, and managing multimedia content. An image near-duplicate (IND) is one of multiple images that are close to an exact duplicate of one image but differ in scene, camera setting, or photometric and digitization changes. Specifically, the scale, viewpoint, and illumination of the same scene and object(s) captured in the IND(s) can be changed by different camera settings and rendering conditions. The composition of multiple objects can also differ among the IND(s) because of editing operations.
INDs can correlate videos that depict the same news event from different broadcast sources, and they provide similarity clues for recognizing visual events and searching news video clips. Detecting INDs over the Internet assists in discovering the unauthorized use of private images, for example in copyright infringement detection. Personal photo albums can be automatically organized by grouping or removing INDs, which might have different names. Detection and retrieval of INDs can also facilitate traditional text-based web searches: if two web pages contain INDs, the relevance between the two pages can be increased.
Retrieval and detection are two different but related tasks for INDs. IND retrieval attempts to find all images that are duplicates or near duplicates of a query image. The objective of IND detection is to find all duplicate pairs in an image collection. IND detection can be formulated as a retrieval problem by taking every image in the collection, in turn, as the query image.
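The relationship between the two tasks can be illustrated with a minimal sketch. It assumes, purely for illustration, that each image is already reduced to a numeric feature vector and that two images count as INDs when an L1 distance between their vectors falls below a threshold. The function names, the distance measure, and the threshold are hypothetical and are not taken from any particular IND method.

```python
def l1_distance(a, b):
    """Hypothetical similarity measure: L1 distance between feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve_inds(query_idx, features, threshold):
    """IND retrieval: indices of all images near-duplicate to one query."""
    query = features[query_idx]
    return [
        i
        for i, candidate in enumerate(features)
        if i != query_idx and l1_distance(query, candidate) <= threshold
    ]


def detect_ind_pairs(features, threshold):
    """IND detection, formulated as retrieval with every image as the query."""
    pairs = set()
    for query_idx in range(len(features)):
        for match_idx in retrieve_inds(query_idx, features, threshold):
            # Store each unordered pair once, regardless of which image was the query.
            pairs.add((min(query_idx, match_idx), max(query_idx, match_idx)))
    return sorted(pairs)


# Four toy "images": the first two and the last two are near-duplicates.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
print(detect_ind_pairs(features, threshold=0.5))  # [(0, 1), (2, 3)]
```

Note that the detection loop compares every image against every other image, which is the source of the scaling issue discussed next.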
IND detection and retrieval raise at least two issues. First, the large variances within INDs make the problem challenging. Second, when detection is formulated as a retrieval problem, the number of possible IND pairs increases quadratically with the size of the database. Both issues affect the overall performance of IND processing.
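The quadratic growth can be made concrete with a short calculation. A collection of n images contains n(n-1)/2 unordered candidate pairs, so doubling the collection roughly quadruples the number of pairs to examine. The helper name below is hypothetical and serves only to illustrate the arithmetic.

```python
def candidate_pair_count(n):
    """Number of unordered image pairs in a collection of n images."""
    return n * (n - 1) // 2


for n in (1_000, 2_000, 4_000):
    print(n, candidate_pair_count(n))
# 1000 499500
# 2000 1999000
# 4000 7998000
```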