Many search engine services, such as Google and Overture, allow users to search for information that is accessible via the Internet. In particular, these services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, a search engine service may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to extract the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be extracted using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may calculate a relevance score that indicates how relevant each web page is to the search request based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on. The search engine service then displays to the user the links to those web pages in an order that is based on their relevance. More generally, search engines may provide searching for information in any collection of documents. For example, a collection of documents could include all U.S. patents, all federal court opinions, all archived documents of a company, and so on.
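By way of illustration only, the crawl from root web pages, the mapping of keywords to web pages, and the relevance-based ordering described above might be sketched as follows. The in-memory `TOY_WEB` data, the function names `crawl`, `build_index`, and `search`, and the scoring rule (number of matching search terms plus a popularity value) are all assumptions made for this sketch and do not represent the method of any particular search engine service.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (extracted keywords, outgoing links).
TOY_WEB = {
    "root": (["movies", "news"], ["a", "b"]),
    "a": (["spielberg", "movies", "director"], ["b"]),
    "b": (["spielberg", "mathematics", "professor"], []),
    "orphan": (["spielberg"], []),  # not linked from any root page
}

def crawl(roots, web):
    """Breadth-first walk from the root pages; returns reachable URLs in visit order."""
    seen, order, queue = set(roots), [], deque(roots)
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls, web):
    """Build the keyword -> set-of-pages mapping for the crawled pages."""
    index = defaultdict(set)
    for url in urls:
        for keyword in web[url][0]:
            index[keyword].add(url)
    return index

def search(terms, index, popularity):
    """Rank matching pages by (number of matching terms + assumed popularity value)."""
    scores = defaultdict(float)
    for term in terms:
        for url in index.get(term.lower(), ()):
            scores[url] += 1.0
    for url in scores:
        scores[url] += popularity.get(url, 0.0)
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))

pages = crawl(["root"], TOY_WEB)
index = build_index(pages, TOY_WEB)
popularity = {"a": 0.9, "b": 0.1}  # assumed values standing in for a popularity measure
print(search(["Spielberg"], index, popularity))  # ['a', 'b']
```

In this sketch the page named `orphan` never appears in a result because it is not reachable from the root page, and the more popular page `a` is ordered ahead of page `b` for the single search term “Spielberg.”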
The highest ranking web pages of a search result provided by a web-based search engine service may all be directed to the same popular topic. For example, if a user submits a search request with the search term “Spielberg,” then the highest ranking web pages of the search result would likely be related to Steven Spielberg. If the user, however, was not interested in Steven Spielberg, but was instead interested in locating the home page of a mathematics professor with the same last name, then the ranking of the web pages would not be helpful to the user. Although the professor's home page may be included in the search result, the user may need to review several pages of links in the search result to locate the link to the professor's home page. In general, it may be difficult for users to locate a desired document when it is not identified on the first page of a search result. Moreover, users can become frustrated when they have to page through multiple pages of a search result to find a document of interest.
It would be desirable to have a technique for ranking documents that would provide a greater diversity of topics within the highest ranking documents, and it would be further desirable to have each of such highest ranking documents be very rich in information content relating to its topic.