The Internet provides access to a vast amount of information. Given that quantity, a major challenge is finding the information most relevant to a user in a particular circumstance. The most common tool for doing this today is a keyword-based search query submitted to a search engine. The search engine matches the received keywords to one or more words or phrases in a search index to identify documents, web pages, or other content that is potentially relevant to the user's query. For example, if a user searches for “dinosaurs”, the search engine provides the user with a list of search results that link to web pages containing that term.
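The keyword matching described above can be sketched as a small inverted index. This is a minimal illustration under stated assumptions, not how any particular search engine is implemented: the function names (tokenize, build_index, search), the sample pages, and the rule that a page must contain every keyword are all hypothetical choices made for the example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages containing every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the pages matching the first keyword, then narrow the set.
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)


# Hypothetical pages, invented for illustration only.
PAGES = {
    "example.com/fossils": "Dinosaurs ruled the Earth for millions of years",
    "example.com/birds": "Modern birds descend from feathered dinosaurs",
    "example.com/cooking": "A simple recipe for weeknight pasta",
}

INDEX = build_index(PAGES)
print(search(INDEX, "dinosaurs"))
```

The lookup matches on words alone: a page is returned because it contains the term, with no notion of which real-world thing the term refers to.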
User queries often contain one or more entities (e.g., a person, location, or organization name) identified by name or by properties associated with the entity. For example, one query might search for “Barack Obama”, while another might search for “President of the United States”. Both of these queries seek information about a specific entity. Users may also search for locations, such as restaurants, banks, shopping centers, and so forth. Entities may include any type of nameable thing, whether a business, person, consumer good or service, and so forth.
Today, when users search for a named entity using a search engine, the search engine presents assorted results that may be about a mixture of different entities with the same or similar names. For example, for the query “harry shum”, one recent search engine returns pages about three different people in mixed order: positions 1, 3, 5, and 8 are about the Corporate Vice President at Microsoft's Online Services Division; positions 2, 4, 6, and 9 are about Harry Shum Jr., the American actor and dancer who plays Mike Chang in Glee; and position 7 is about yet another Harry, who is a network support engineer at IP Systems. It is not clear from the query which of these people the user was trying to find, but the user is likely interested in only one of them, so a large subset of the results is not relevant. The inability of search engines to resolve the underlying identities of entity instances in web pages hinders their ability to organize search results effectively.