Many search engine services, such as Google and Yahoo, let users search for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request or query that includes search terms, the search engine service identifies web pages that may be related to those search terms. To identify related web pages quickly, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by crawling and indexing the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may start from a list of root web pages and identify all web pages that are reachable through links from those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service then ranks the web pages of the search result based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on. The search engine service may also generate a relevance score to indicate how relevant the information of the web page may be to the search request. The search engine service then displays to the user links to those web pages in an order that is based on their rankings.
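The crawl-index-rank pipeline described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical in-memory link graph (a dict from URL to outgoing URLs) in place of real HTTP fetching, and assuming each page's keywords have already been extracted. Ranking here uses only closeness of match (number of query terms matched); popularity signals such as PageRank are deliberately omitted. The function names `crawl`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict, deque


def crawl(roots: list[str], links: dict[str, list[str]]) -> set[str]:
    """Breadth-first traversal from the root pages over a hypothetical
    in-memory link graph; returns every page reachable from the roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build the keyword-to-pages mapping (an inverted index) from
    pre-extracted keywords for each page."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, keywords in pages.items():
        for kw in keywords:
            index[kw.lower()].add(url)
    return dict(index)


def search(index: dict[str, set[str]], terms: list[str]) -> list[str]:
    """Return pages matching any term, ranked by how many terms they match
    (ties broken by URL so the order is deterministic)."""
    scores: dict[str, int] = defaultdict(int)
    for term in terms:
        for url in index.get(term.lower(), ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    pages = {
        "a.example/tulips": ["yellow", "tulip", "flower"],
        "b.example/roses": ["red", "rose", "flower"],
        "c.example/europe": ["europe", "travel"],
    }
    index = build_index(pages)
    print(search(index, ["yellow", "flower"]))
```

For the sample pages, the tulip page matches both terms and is ranked ahead of the rose page, which matches only "flower". A production engine would combine such a match score with other signals before ordering the results.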
These search engine services may, however, not be particularly useful in certain situations. In particular, it can be difficult to formulate a suitable search request that effectively describes the needed information. For example, if a person sees a flower on the side of a road and wants to learn the identity of the flower, upon returning home the person may formulate the search request "picture of yellow tulip-like flower in Europe" (e.g., yellow tulip) in hopes of seeing a picture of the flower. Unfortunately, the search result may identify so many web pages that it may be virtually impossible for the person to locate the correct picture, even assuming that the person can accurately remember the details of the flower. If the person has a mobile device, such as a personal digital assistant ("PDA") or cell phone, the person may be able to submit the search request while at the side of the road. Such mobile devices, however, have limited input and output capabilities, which make it difficult both to enter the search request and to view the search result.
If the person, however, is able to take a picture of the flower, the person may then be able to use a Content Based Image Retrieval ("CBIR") system to find a similar-looking picture. Although duplicate images can be detected when the image database of the CBIR system happens to contain a duplicate image, the image database will not contain a duplicate of the picture of the flower taken at the side of the road. If a duplicate image is not in the database, it can be computationally prohibitive, if it is possible at all, to find a "matching" image. For example, if the image database contains an image of a field of yellow tulips and the picture contains only a single tulip, then the CBIR system may not recognize the images as matching.
Searching for similar images, or more generally similar objects (e.g., still images, video images, and audio), has many useful applications. One application, as described above, is to find web pages that may relate to the content of an image. A search engine may receive a search request that includes text and an image. The search engine may then locate web pages that contain textual content similar to the text of the search request and an image similar to the image of the search request. Another application of finding similar objects is to help enforce intellectual property rights (e.g., copyrights). Such an application can help find pirated versions of pictures, movies, music, and so on. A copyright owner may build a database of copyrighted objects. When a suspect object is found (e.g., on a web page), it can be compared to the objects in the database to determine whether it is similar to a copyrighted object. If so, then a copyright violation may have occurred. If a copyright owner (or an enforcer of copyrights on behalf of owners) has millions of copyrighted objects (e.g., a collection of still images or frames of videos), searching the database can be computationally very expensive.
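To see why the cost grows with the size of the database, consider the most naive way of comparing a suspect object against every stored object. The sketch below is illustrative only and assumes that each object has already been reduced to a fixed-length numeric feature vector (a hypothetical representation; feature extraction is out of scope here). Cosine similarity and the `threshold` value are illustrative choices, and `find_similar` is a hypothetical name, not a method taken from this document.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def find_similar(
    suspect: list[float],
    database: dict[str, list[float]],
    threshold: float = 0.9,
) -> list[tuple[str, float]]:
    """Brute-force linear scan: compare the suspect against every stored object.

    Work is proportional to (number of objects) x (vector length), which is what
    makes this approach expensive for databases of millions of objects.
    Returns (object id, similarity) pairs at or above the threshold, most similar first.
    """
    hits = []
    for oid, vec in database.items():
        score = cosine(suspect, vec)
        if score >= threshold:
            hits.append((oid, score))
    return sorted(hits, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    db = {"img1": [1.0, 0.0], "img2": [0.0, 1.0], "img3": [1.0, 1.0]}
    print(find_similar([1.0, 0.1], db))
```

Every query touches every stored vector, so doubling the collection doubles the work per suspect object. Practical systems typically avoid a full scan with indexing or approximate nearest-neighbor techniques.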