It is often necessary to search for images, whether in a database or on a network. Sometimes, a given user may be looking for a specific image. However, in many circumstances, it may be desirable to locate near-identical images, that is, images that are near-duplicates of each other.
For example, it may be desirable to locate images of the same landscape item taken from different vantage points. Several scenarios are possible. In one scenario, generally known as a “text-to-image” search, a user may be specifically looking for pictures of Niagara Falls. It is known for the user to access a search engine, such as, for example, the Yandex™ search engine located at www.yandex.ru (or any other commercially available or proprietary search engine), and to type in her search query: “Niagara Falls”. Responsive to the search query (and depending on the particular implementation of the search engine), the search engine may return a set of images matching the search query (typically, if the search engine is implemented as a vertical search engine or if the user executes a vertical search within a general search engine) or a mix of image results and other web resources, both responsive to the user search query.
Alternatively, the user may have an image and may be desirous of either finding similar images or determining what the image in her possession actually depicts. This is generally referred to as “image-to-image” or “search by image” searching. Those of skill in the art also refer to this process as “content-based image retrieval”. For example, the user may have an image depicting a waterfall and may not be aware which waterfall the image actually depicts. The user may be desirous of executing a search, whereby the image in her possession is, in effect, used as a search string.
A typical image-based search is further challenged by the need to deal with a repository that contains a very large number of diverse images. In today's implementations, a given image repository may contain hundreds of thousands of images or even more. For example, it is estimated that there are well over a billion images available within various web resources on the Internet. A known approach to large-scale image retrieval has been based on simple text-retrieval systems using the analogy of “visual words”. In other words, the known approaches to searching images are based on a so-called bag-of-visual-words (BoW) representation of images.
Pursuant to the typical BoW approach, images are scanned for “salient” regions and a high-dimensional descriptor is computed for each region. These descriptors are then quantized. A visual vocabulary is used to transform the continuous feature space into a discrete word space. This step typically consists of learning a vector quantizer, usually by k-means clustering, whose centroids form the visual vocabulary, and using it to map the descriptors into visual words. Descriptors are typically quantized by finding their nearest centroid. An image is then represented as a bag of visual words, and these are entered into an index for later querying and retrieval. The spatial information is usually reintroduced as a post-processing step to re-rank the retrieved images, through spatial verification, for example using RANSAC.
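As an illustration only, the quantization step described above can be sketched as follows. This is a minimal sketch, not the implementation of any particular system: the function names (`learn_vocabulary`, `nearest_centroid`, `bag_of_words`), the toy two-dimensional descriptors, and the deterministic seeding of the centroids are all assumptions made for the example. Real descriptors are high-dimensional, and practical k-means implementations usually seed the centroids randomly.

```python
import math
from collections import Counter


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(descriptor, centroids[i]))


def learn_vocabulary(descriptors, k, iterations=10):
    """Plain k-means (Lloyd's algorithm); the k centroids act as visual words.

    Assumption: centroids are seeded from evenly spaced training descriptors
    so that the example is reproducible.
    """
    centroids = [list(descriptors[i * len(descriptors) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_centroid(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bag_of_words(image_descriptors, centroids):
    """Quantize each descriptor to its nearest centroid and count the words."""
    return Counter(nearest_centroid(d, centroids) for d in image_descriptors)


# Toy training descriptors forming two well-separated groups.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = learn_vocabulary(training, k=2)

# Descriptors of one hypothetical image: two near the first group, one near the second.
bag = bag_of_words([(0.2, 0.1), (0.5, 0.5), (10.5, 10.2)], vocabulary)
print(bag)
```

The resulting counter is the bag-of-visual-words representation of the image: it records how many descriptors fell on each visual word and discards where in the image they occurred, which is why spatial information has to be reintroduced later.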
Image querying is typically accomplished in two steps: searching and post-processing. During the searching step, similar images are retrieved from the large database and an initial ranking is generated. The most popular approach is to index images with inverted files, which facilitate fast access to the images that share visual words with the query. The post-processing step provides a more precise ranking of the retrieved images, usually through spatial verification.
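The searching step with an inverted file can be sketched, under stated assumptions, as shown below. The function names (`build_inverted_index`, `search`), the image identifiers, and the scoring by histogram intersection are hypothetical choices made for this illustration; the text above does not prescribe a scoring function, and text-retrieval style weightings are also commonly used. The spatial-verification post-processing step is not shown.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_bags):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, bag in image_bags.items():
        for word in bag:
            index[word].add(image_id)
    return index


def search(query_bag, index, image_bags):
    """Score only the images that share at least one visual word with the query."""
    scores = Counter()
    for word, query_count in query_bag.items():
        for image_id in index.get(word, ()):
            # Assumed scoring: histogram intersection of the word counts.
            scores[image_id] += min(query_count, image_bags[image_id][word])
    # Initial ranking: highest score first, ties broken by image id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical repository of three images, each already reduced to a bag of visual words.
image_bags = {
    "img_a": Counter({1: 2, 4: 1}),
    "img_b": Counter({1: 1, 7: 3}),
    "img_c": Counter({9: 5}),
}
index = build_inverted_index(image_bags)

ranking = search(Counter({1: 2, 4: 1}), index, image_bags)
print(ranking)  # [('img_a', 3), ('img_b', 1)]
```

Because the lookup goes through the inverted file, an image such as `img_c`, which shares no visual word with the query, is never visited. This is what keeps the searching step fast on a large repository, and the short list it returns is what the post-processing step would then re-rank.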