Many search engine services, such as Google and Overture, allow users to search for information that is accessible via the Internet, and in particular for display pages, such as web pages, that may be of interest to them. After a user submits a search request that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling and indexing” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. When a search request is received, the search engine service ranks the web pages of the search result based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on. The search engine service may also generate a relevance score to indicate how relevant the information of each web page may be to the search request. The search engine service then displays to the user links to those web pages in an order that is based on their rankings.
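By way of illustration only, the crawling, keyword mapping, and ranking described above might be sketched as follows. The in-memory `WEB` dictionary, the URLs, and the function names are hypothetical, every word of a page is treated as a keyword, and the ranking simply counts matching search terms. It does not represent how any particular search engine service is implemented.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for the web: URL -> (page text, outgoing links).
WEB = {
    "root.example/": ("news headlines today",
                      ["root.example/sports", "root.example/politics"]),
    "root.example/sports": ("national sports team wins", ["root.example/"]),
    "root.example/politics": ("international political event news", []),
}

def crawl(roots, web):
    """Breadth-first traversal from the root pages; returns reachable URLs in visit order."""
    seen, queue, order = set(roots), deque(roots), []
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # link to a page we cannot fetch
        order.append(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls, web):
    """Map each keyword to the set of pages containing it (here, every word is a keyword)."""
    index = defaultdict(set)
    for url in urls:
        for word in web[url][0].lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Rank pages by the number of search terms they contain; ties are broken by URL."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(crawl(["root.example/"], WEB), WEB)
results = search(index, "international news")
```

In this sketch the term count plays the role of the closeness of each match; a popularity measure could be combined with it to produce the final ordering.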
Although many web pages are graphically oriented in that they may contain many images, conventional search engine services typically search based on only the textual content of a web page. Some attempts have been made, however, to support image-based searching of web pages. For example, a user viewing a web page may want to identify other web pages that contain images related to an image on that web page. The image-based search techniques are typically either content-based or link-based and additionally use surrounding text to aid in analyzing images. The content-based techniques use low-level visual information for image indexing. Because the content-based search techniques are very computationally expensive, they are not practical for image searching on the web.
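As one hedged example of the low-level visual information a content-based technique might index, the following sketch computes a quantized color histogram for an image and compares two images by histogram intersection. The choice of a color histogram, the bin count, and the function names are assumptions made for illustration; the techniques referred to above may use other features.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of quantized colors; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel into bins_per_channel levels (0 .. bins_per_channel - 1).
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        hist[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Tiny hypothetical "images" given as flat pixel lists.
red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
mixed = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
similarity = histogram_intersection(red, mixed)
```

Even this simple feature requires a pass over every pixel of every image, and a query must be compared against the features of each indexed image, which suggests why such techniques become costly at the scale of the web.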
The link-based search techniques typically assume that images on the same web page are likely to be related and that images on web pages that are each linked to by the same web page are also related. Unfortunately, these assumptions are incorrect in many situations, primarily because a single web page may have content relating to many different topics. For example, a web page for a news web site may contain content relating to an international political event and content relating to a national sporting event. In such a case, it is unlikely that a picture of a sports team relating to the national sporting event is related to a web page linked to by the content relating to the international political event.
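The two link-based assumptions, and the way they can fail on a multi-topic page, can be illustrated with a small sketch. The `NEWS_SITE` data, the image file names, and the function name are hypothetical and chosen to mirror the news web site example above.

```python
from itertools import combinations

# Hypothetical news site: URL -> (images on the page, links to other pages).
NEWS_SITE = {
    "news.example/": (["summit.jpg", "team.jpg"],
                      ["news.example/politics", "news.example/sports"]),
    "news.example/politics": (["leaders.jpg"], []),
    "news.example/sports": (["stadium.jpg"], []),
}

def link_based_related_pairs(pages):
    """Return image pairs that the two link-based assumptions would treat as related."""
    related = set()
    for images, links in pages.values():
        # Assumption 1: images on the same web page are related.
        for a, b in combinations(sorted(images), 2):
            related.add((a, b))
        # Assumption 2: images on pages linked to by the same web page are related.
        for p, q in combinations(links, 2):
            for a in pages.get(p, ([], []))[0]:
                for b in pages.get(q, ([], []))[0]:
                    if a != b:
                        related.add(tuple(sorted((a, b))))
    return related

pairs = link_based_related_pairs(NEWS_SITE)
```

Here both resulting pairs join a political image with a sports image, because the front page covers both topics and links to both topic pages.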
It would be desirable to have an image-based search technique that would not be as computationally expensive as conventional content-based search techniques and that, unlike conventional link-based search techniques, would account for the diverse topics that can occur on a single web page.