Conventional web-based search engines make it possible to quickly search enormous quantities of online documents for the most relevant ones simply by performing a keyword search. Keyword searching is effective for large-scale systems due in large part to the document indexing (sometimes referred to as web indexing) that is performed before any actual search is processed. In general, document indexing, or web indexing, involves collecting, parsing, and storing data in a particular format to facilitate fast and accurate information retrieval. When a user performs a keyword search to find relevant documents, the search engine uses the index to efficiently identify the set of documents that contain all or some of the user-provided keywords. These documents can then be ranked using various algorithms and presented to the user as search results.
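The two phases described above (indexing ahead of time, then looking up keywords at query time) can be illustrated with a minimal in-memory inverted index. This is a sketch under stated assumptions, not any particular search engine's implementation: the function names are hypothetical, the tokenizer simply lowercases and splits on non-alphanumeric characters, and ranking by the number of distinct matched keywords stands in for the "various algorithms" a real engine would use.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical parsing step: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Indexing phase, performed before any search: map each term to the set of
    # document IDs that contain it (an inverted index).
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    # Query phase: look up each keyword in the index instead of scanning every
    # document. Any document containing at least one keyword is a candidate.
    hits: dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Assumed stand-in ranking: more distinct matched keywords first, ties by ID.
    return sorted(hits, key=lambda d: (-hits[d], d))


if __name__ == "__main__":
    docs = {"d1": "red sports car", "d2": "red apple", "d3": "blue car"}
    idx = build_index(docs)
    print(search(idx, "red car"))  # ['d1', 'd2', 'd3']
```

Because each keyword lookup touches only the documents listed under that term, query cost depends on the number of matching documents rather than on the size of the whole corpus, which is why this approach scales for exact keyword matches.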
While document indexing is effective for document searching, it is less well suited to image searching, and in particular to searching a large corpus of images for those that are similar to a reference or target image (e.g., an image similarity search). There, indexing alone may not provide a satisfactory user experience or yield acceptable results. This is due at least in part to the difference between finding exact matches, as keyword searching does when it identifies that a particular word is included in a document, and finding attributes that are near matches (i.e., similar) rather than exact matches. If too many image attributes are used to determine similarity, the search engine becomes less efficient as the number of images grows, and therefore does not scale well. In addition, a particular search may return too few results to be useful to the user, because too few of the specified image attributes or factors will match exactly. Conversely, if too few image attributes are used to determine similarity, the relevance of the search results suffers, and again the search results will not be useful to the user.