In general, Image Near-Duplicate (IND) refers to a pair of images that are close to being exact duplicates of one another, such as two images of the same object or location that differ only slightly because of changes in capturing conditions, capture times, rendering conditions, editing operations, and so forth. Detection or retrieval of IND is very useful in a variety of real-world applications. For example, IND can improve traditional tag-based image searches by filtering out duplicate images returned by a search engine; IND can be used as a way of building bridges between two web pages in different languages; and IND can provide similarity clues for recognizing visual events and searching news video clips.
IND retrieval thus aims at finding images that are duplicates or near-duplicates of another (e.g., query) image. One of the most popular and practical approaches to IND retrieval is based on the Bag-of-Words (BoW) model, which assumes that image retrieval is analogous to document retrieval. The general idea behind BoW methods is that local regions of images are characterized using high-dimensional descriptors, which are then mapped to “visual words” selected from a visual vocabulary.
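By way of illustration only, the mapping from descriptors to visual words may be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation of any particular BoW system: the function names (nearest_word, bow_histogram, cosine_similarity), the two-dimensional descriptors, and the three-word vocabulary are hypothetical toy values chosen for readability, whereas practical descriptors are high-dimensional and practical vocabularies are far larger. Each descriptor is assigned to its nearest vocabulary entry, each image is summarized as a histogram of visual-word counts, and two images are compared through their histograms.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(vocabulary):
        # Squared Euclidean distance is enough for picking the nearest entry.
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bow_histogram(descriptors, vocabulary):
    """Count how many local descriptors fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram

def cosine_similarity(a, b):
    """Cosine similarity of two histograms; 0.0 if either one is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: a 3-word vocabulary and 2-D descriptors (assumption).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
query_descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)]
near_duplicate_descriptors = [(0.0, 0.2), (1.1, 1.0), (0.8, 1.0)]
unrelated_descriptors = [(5.1, 4.9), (4.8, 5.2), (5.0, 5.0)]

query_hist = bow_histogram(query_descriptors, vocabulary)
near_hist = bow_histogram(near_duplicate_descriptors, vocabulary)
unrelated_hist = bow_histogram(unrelated_descriptors, vocabulary)

print(query_hist, near_hist, unrelated_hist)
print(cosine_similarity(query_hist, near_hist))
print(cosine_similarity(query_hist, unrelated_hist))
```

In this toy example the query image and the slightly perturbed image map to the same visual words and therefore produce matching histograms, while the unrelated image shares no visual words with the query. Note that the histogram discards where each region lies in the image and how far each descriptor is from its assigned word.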
However, the BoW approach is problematic in that visual words are much less able to express image regions than text words are to express document content. As image retrieval is growing in importance, including with respect to image-based querying, any improvement in image retrieval is beneficial.