Many search engine services, such as Google and Overture, allow users to search for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request (also referred to as a “query”) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, a search engine service may maintain a mapping of keywords to web pages. The search engine service may generate this mapping by “crawling” the web (i.e., the World Wide Web) to extract the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages and identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be extracted using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may calculate a relevance score that indicates how relevant each web page is to the search request based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on. The search engine service then displays to the user the links to those web pages in an order that is based on their relevance. More generally, search engines may provide searching for information in any collection of documents. For example, a collection of documents could include all U.S. patents, all federal court opinions, all archived documents of a company, and so on.
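The crawl, keyword mapping, and relevance ordering described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `SAMPLE_PAGES` dictionary stands in for the web, the function names `crawl`, `build_index`, and `search` are hypothetical, keywords are taken only from a headline and metadata, and the score is simply the fraction of query terms matched plus a supplied popularity value. It is not the method of any particular search engine service.

```python
from collections import defaultdict, deque

# Hypothetical stand-in for the web: URL -> page with headline, metadata, links.
SAMPLE_PAGES = {
    "a.example": {"headline": "Gene editing advances",
                  "metadata": ["biotechnology", "crispr"],
                  "links": ["b.example"]},
    "b.example": {"headline": "Biotechnology funding news",
                  "metadata": ["finance"],
                  "links": ["c.example"]},
    "c.example": {"headline": "Compiler design notes",
                  "metadata": ["computer", "science"],
                  "links": []},
    "d.example": {"headline": "Unlinked biotechnology page",
                  "metadata": [],
                  "links": []},
}

def crawl(roots, web):
    """Breadth-first walk from the root pages; returns the reachable pages."""
    seen = {}
    queue = deque(roots)
    while queue:
        url = queue.popleft()
        if url in seen or url not in web:
            continue
        seen[url] = web[url]
        queue.extend(web[url]["links"])
    return seen

def extract_keywords(page):
    """Simplified extraction: lowercase words of the headline and metadata."""
    text = page["headline"] + " " + " ".join(page["metadata"])
    return set(text.lower().split())

def build_index(pages):
    """Build the keyword -> set-of-URLs mapping."""
    index = defaultdict(set)
    for url, page in pages.items():
        for keyword in extract_keywords(page):
            index[keyword].add(url)
    return index

def search(index, popularity, query):
    """Return matching URLs ordered by (fraction of terms matched + popularity)."""
    terms = query.lower().split()
    matched = defaultdict(int)
    for term in terms:
        for url in index.get(term, ()):
            matched[url] += 1
    return sorted(
        matched,
        key=lambda url: matched[url] / len(terms) + popularity.get(url, 0.0),
        reverse=True,
    )
```

In this sketch, crawling from the single root `a.example` never reaches `d.example`, which shows why a page that is not linked from any root page would be absent from the mapping.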
In many instances, when a user searches using a search engine, the user is not interested in the fact that certain documents happen to match the query; rather, the user is interested in information about the people who are related to the retrieved documents. For example, the user may be interested in determining the most prominent or important authors in a certain field, such as biotechnology or computer science. To make that determination, the user may submit a query using a description of that field. When the results are provided to the user, the user can then peruse the documents and try to assess which authors seem to be prominent in that field. As another example, a user may be interested in determining which authors tend to collaborate with a given author. To determine those collaborators, the user may submit a query using the given author's name. When the results are provided to the user, the user can then peruse the documents and try to assess which authors co-authored articles with the given author. A difficulty with the use of a general search engine for identifying information about people is that it can be very time-consuming and difficult to manually identify, from the documents reported as the search results, the needed information and how those documents relate to each other in meaningful ways. Textual results are useful, but numerical summaries of the strength of the connections between results are equally important.
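The manual co-author tally described above amounts to counting how often other names appear alongside the given author. The sketch below is illustrative only: the function name `coauthor_counts` and the `"authors"` field are hypothetical, and it assumes each search result already carries a clean list of author names, which is exactly the information a general search engine does not supply.

```python
from collections import Counter

def coauthor_counts(documents, author):
    """Count how many documents each other author shares with `author`.

    `documents` is assumed to be a list of dicts with an "authors" list;
    this structure is hypothetical and not provided by a general search engine.
    """
    counts = Counter()
    for doc in documents:
        authors = doc["authors"]
        if author in authors:
            # set() guards against a name listed twice on one document.
            counts.update(set(authors) - {author})
    return counts
```

The resulting counts are one simple example of a numerical summary of connection strength: a higher count suggests more frequent collaboration.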
Specialized search engines have been developed to search for information about people. These specialized search engines, however, are based on central databases that are maintained manually. A difficulty with specialized search engines is that they have low coverage, a low update rate, and limited information, and their contents are rarely peer reviewed, although peer review would increase the value of the information. The coverage is low in the sense that only a very small portion of people are represented in the databases. The update rate is low because the databases are maintained manually and it would be too costly to update them frequently. The information is limited because the databases contain only basic information, such as phone numbers and home addresses; interpersonal relationships, such as those based on co-authorship, are not represented in the databases.
It would be desirable to have a technique that would automatically derive information about people and the relationships between them.