The advent of the semantic web promises a world where documents are interchangeable, such that regardless of where the documents are written they can be correctly interpreted by an automated process. This requires unique, permanent and shared entity identifiers, or uniform resource identifiers (URIs), and high-quality disambiguation. A URI is generally a string of characters used to identify or name a resource on the Internet. A resource can be a document, a file, or virtually anything (e.g., a person, place, or thing) that can be identified, named, addressed or handled on the World Wide Web (WWW) or in a networked information system. Such identification enables interaction with representations of the resource over a network, typically the WWW, using specific protocols. URIs are generally defined in schemes, each of which specifies a particular syntax and associated protocols.
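By way of illustration only, the scheme-based structure of a URI can be sketched with Python's standard `urllib.parse` module. The URIs shown and the helper name `describe_uri` are hypothetical and are not taken from any particular system; the sketch merely shows how the scheme determines the way the remainder of the identifier is read.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into its scheme, authority, and path components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,     # names the syntax and the associated protocol
        "authority": parts.netloc,  # host that serves the resource, if any
        "path": parts.path,         # identifies the resource within the scheme
    }


# Hypothetical identifier for a person entity, using the "http" scheme.
person = describe_uri("http://example.org/entity/mike-smith-42")

# Hypothetical identifier using the "mailto" scheme, which has no authority part.
mailbox = describe_uri("mailto:mike.smith@example.org")

print(person)
print(mailbox)
```

Both strings are URIs, but the two schemes prescribe different syntax: the first carries a host and a hierarchical path, whereas the second consists only of a scheme and an address.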
Current text analytic tools, such as Open Calais by Thomson Reuters (www.opencalais.com), can indicate with some degree of confidence that, for example, “Mike Smith” is a name. Such tools can also, to some extent, make assumptions from the context of the text and propose other properties of “Mike Smith” (such as address or occupation). They can likewise make similar assumptions and propose properties of other entities, such as companies, and of location entities, such as cities, states and countries.
However, one shortcoming of known text analytic tools is that they cannot by themselves confirm the true identity of “Mike Smith.” Moreover, current text analytic tools do not allow the user to participate in selecting the “Mike Smith” entity that the user intends. That is, if a plurality of “Mike Smith” entities are stored in a database utilized by the text analytic tool, the tool cannot be entirely sure that it is selecting the correct “Mike Smith” (i.e., the “Mike Smith” intended by the author of the document). Embodiments of the present invention are directed to overcoming these and other known limitations associated with text analytic tools.
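The ambiguity described above can be illustrated with a minimal sketch. The entity store, the URIs, the occupations, and the function names `find_candidates` and `resolve` are all hypothetical and serve only to show why a name match alone may not identify a single entity; the sketch does not represent how any existing tool is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    uri: str         # hypothetical unique identifier for the entity
    name: str
    occupation: str


# Hypothetical database in which two distinct entities share one name.
ENTITY_STORE = [
    Entity("http://example.org/entity/1", "Mike Smith", "attorney"),
    Entity("http://example.org/entity/2", "Mike Smith", "engineer"),
    Entity("http://example.org/entity/3", "Jane Doe", "analyst"),
]


def find_candidates(name: str, store: List[Entity] = ENTITY_STORE) -> List[Entity]:
    """Return every stored entity whose name matches exactly."""
    return [entity for entity in store if entity.name == name]


def resolve(name: str, store: List[Entity] = ENTITY_STORE) -> Optional[Entity]:
    """Return the matching entity only when the name is unambiguous."""
    candidates = find_candidates(name, store)
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several matches: the name alone cannot confirm an identity.
    return None


print(find_candidates("Mike Smith"))
print(resolve("Mike Smith"))
print(resolve("Jane Doe"))
```

In this sketch, a lookup for “Mike Smith” yields two candidates with different URIs, so the automated step returns no single entity, and further input (for example, from the author of the document) would be needed to choose between them.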