People often use the Internet to find information about an entity, such as a person, a place, a company, or an event. Such a search usually begins with a query to a search engine, which returns a plurality of web documents. The search widens as those web documents link to other web documents, and eventually a complex web of inter-related documents is discovered.
Thus, a search for an entity of interest “A” first leads to a plurality of web documents, which relate A to other entities B, C, D, etc. Each of these other entities in turn leads to a further plurality of web documents. Eventually a network of entities, and of associations between the entities, emerges. Such a network is referred to generically as a “social network”.
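By way of illustration only, the outward expansion described above can be sketched as a breadth-first traversal. The sketch below is a minimal example under stated assumptions: the `DOCUMENTS` corpus, the `search` function standing in for a search engine, and the `max_hops` limit are all hypothetical, and two entities are treated as associated simply because they appear in the same document.

```python
from collections import deque

# Hypothetical toy corpus: each document lists the entities it mentions.
DOCUMENTS = {
    "doc1": ["A", "B", "C"],
    "doc2": ["B", "D"],
    "doc3": ["C", "D", "E"],
    "doc4": ["F", "G"],
}


def search(entity):
    """Stand-in for a search engine query: ids of documents mentioning entity."""
    return [doc_id for doc_id, names in DOCUMENTS.items() if entity in names]


def build_network(start, max_hops=2):
    """Expand outward from start, linking entities that share a document.

    Returns (entities, associations), where each association is an
    unordered pair of entity names stored as a frozenset.
    """
    associations = set()
    entities = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, hops = queue.popleft()
        if hops == max_hops:
            continue  # stop widening the search beyond the hop limit
        for doc_id in search(entity):
            for other in DOCUMENTS[doc_id]:
                if other == entity:
                    continue
                associations.add(frozenset((entity, other)))
                if other not in entities:
                    entities.add(other)
                    queue.append((other, hops + 1))
    return entities, associations


entities, associations = build_network("A")
```

Starting from A, the sketch reaches B and C through the first document and then D and E through the documents found for B and C, while F and G, which share no document with any reached entity, stay outside the network.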
Generating a social network often requires much manual work to piece together an accurate and complete network, so it would be of great advantage to automate the derivation of social networks. However, the success of manual derivation rests upon human inference and intuition, and many challenges arise when trying to automate these human processes.
One such challenge is discrimination between entities in different documents that have the same name. For example, entities named “John Doe” may appear in two documents yet correspond to different people. Conversely, entities with different, but similar, names in two different documents may correspond to the same entity. For example, entities named “John Q. Adams” and “John Quincy Adams” may correspond to the same person. Using inference and intuition, humans are able to perform the necessary discrimination. Automated discrimination, however, is a difficult task.
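The difficulty can be made concrete with a naive name comparison. The sketch below is illustrative only and is not a proposed solution: the functions `tokens`, `token_compatible`, and `names_compatible` are hypothetical, and the rule that an initial matches any name beginning with that letter is an assumption chosen for simplicity.

```python
def tokens(name):
    """Split a name into lowercase parts, dropping trailing periods."""
    return [t.strip(".").lower() for t in name.split() if t.strip(".")]


def token_compatible(a, b):
    """Two name parts are compatible if equal, or if one is the other's initial."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_compatible(name1, name2):
    """True if the names have the same number of parts and each pair is compatible."""
    t1, t2 = tokens(name1), tokens(name2)
    return len(t1) == len(t2) and all(
        token_compatible(a, b) for a, b in zip(t1, t2)
    )


# Exact string equality misses the similar-name case ...
exact = "John Q. Adams" == "John Quincy Adams"
# ... while the looser rule accepts it.
loose = names_compatible("John Q. Adams", "John Quincy Adams")
# Neither rule can tell two different people named "John Doe" apart.
same_name = names_compatible("John Doe", "John Doe")
```

Here `exact` is false and `loose` is true, so a looser rule recovers the “John Q. Adams” case. However, `same_name` is also true whether or not the two documents refer to the same person, which shows that comparing names alone cannot perform the discrimination that humans achieve through inference and intuition.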