This specification relates to information retrieval and, in particular, to name disambiguation.
The Internet provides access to a wide variety of resources, for example, video files, image files, audio files, or Web pages including content for particular subjects, book articles, or news articles. A search system can select one or more resources in response to receiving a search query. A search query is data that a user submits to a search engine to satisfy the user's informational needs. The search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources to provide search results that link to the selected resources. The search results are typically ordered according to the scores.
A very popular search scenario is searching for person names. Because most person names are not unique, an initial search on a person name can yield multiple search results that each reference resources describing different persons. For example, a search on the name “John Smith” may yield search results that reference resources with information about an explorer, resources about a botanist and curator of Kew Gardens, resources about a professional wrestler, and still other resources about other people named “John Smith.” Because search queries are often an incomplete expression of the information needed, the user will often revise the search query to narrow the search results. Such revisions include adding additional search terms to the name. For example, suppose the user is searching for information relating to the explorer John Smith's interactions with Chief Powhatan. The user may revise the query to read “John Smith Chief Powhatan.” The revised search query will cause a search engine to provide search results that reference documents that are more likely to satisfy the user's informational needs.
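The background describes selecting, scoring, and ordering resources only in general terms. The following is a minimal sketch, not the system's actual scoring method. It assumes a toy term-overlap relevance measure, a hypothetical per-resource `importance` value, and a hypothetical linear `weight` that blends the two signals. It shows how results for “John Smith” are ordered, and how adding “Chief Powhatan” can move the explorer to the top. The resource texts and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    rid: str
    text: str
    importance: float  # hypothetical query-independent score in [0, 1]


def relevance(query: str, text: str) -> float:
    """Toy relevance: fraction of query terms that appear in the resource text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(t in words for t in terms) / len(terms)


def search(query: str, resources: list[Resource], weight: float = 0.7) -> list[str]:
    """Select resources with nonzero relevance, score them, and return ids best-first.

    The score is a hypothetical linear blend of relevance and importance.
    """
    scored = []
    for r in resources:
        rel = relevance(query, r.text)
        if rel > 0:
            score = weight * rel + (1 - weight) * r.importance
            scored.append((score, r.rid))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for _, rid in scored]


# Illustrative resources about different people named "John Smith".
RESOURCES = [
    Resource("explorer", "John Smith explorer Jamestown Chief Powhatan", 0.4),
    Resource("botanist", "John Smith botanist Kew Gardens", 0.6),
    Resource("wrestler", "John Smith professional wrestler", 0.9),
]

if __name__ == "__main__":
    print(search("John Smith", RESOURCES))
    # ['wrestler', 'botanist', 'explorer']
    print(search("John Smith Chief Powhatan", RESOURCES))
    # ['explorer', 'wrestler', 'botanist']
```

For the bare name, every resource matches all query terms, so the hypothetical importance value alone decides the order. The added terms raise the explorer's relevance relative to the others. Even so, the other namesakes still appear in the results, which is the kind of repeated revision the next paragraph describes.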
Often, however, users may not have enough information or background knowledge about a person to effectively revise queries. Thus, the user may have to revise a query multiple times before he or she finds information that satisfies his or her informational needs.