Computational techniques such as information extraction are used to automatically identify and extract information in the form of facts. Information extraction can be performed on a variety of sources such as web pages to extract fact data. A set of facts collected from a source describing the same entity may be organized and can be stored as an object in a repository of facts.
Each fact contains an attribute that describes an entity. The entity type of the entity is the type of real-world thing the object represents (e.g., Person, Dog, Book, Movie). Entity type can be defined in a fact associated with the object. This entity type information is necessary to understand relationships between sets of facts associated with different objects. For example, suppose an object with entity name “Hillary Clinton” is associated with the fact “spouse Bill Clinton”, and it is known that the attribute “spouse” always has a value with entity type “Person”. Knowing that an object with entity name “Bill Clinton” has an entity type of “Person” enables the identification of a relationship between the two objects.
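The relationship check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fact` and `FactObject` classes, the `ATTRIBUTE_VALUE_TYPE` table, and the `find_relationships` function are hypothetical names introduced here, not part of any described system, and the sketch assumes that entity type is stored as an ordinary fact whose attribute is "entity type".

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    attribute: str
    value: str


@dataclass
class FactObject:
    name: str
    facts: list = field(default_factory=list)

    def entity_type(self):
        # Assumption: entity type is stored as a fact with attribute "entity type".
        for fact in self.facts:
            if fact.attribute == "entity type":
                return fact.value
        return None  # entity type unknown


# Hypothetical table: attribute -> entity type that its value must have.
ATTRIBUTE_VALUE_TYPE = {"spouse": "Person"}


def find_relationships(objects):
    """Link an object to every same-named object whose entity type fits the attribute."""
    by_name = {}
    for obj in objects:
        by_name.setdefault(obj.name, []).append(obj)

    links = []
    for obj in objects:
        for fact in obj.facts:
            required = ATTRIBUTE_VALUE_TYPE.get(fact.attribute)
            if required is None:
                continue  # no type constraint known for this attribute
            for candidate in by_name.get(fact.value, []):
                if candidate.entity_type() == required:
                    links.append((obj.name, fact.attribute, candidate.name))
    return links


hillary = FactObject("Hillary Clinton", [Fact("entity type", "Person"),
                                         Fact("spouse", "Bill Clinton")])
bill_person = FactObject("Bill Clinton", [Fact("entity type", "Person")])
bill_book = FactObject("Bill Clinton", [Fact("entity type", "Book")])

links = find_relationships([hillary, bill_person, bill_book])
```

In this sketch only the “Bill Clinton” object of entity type “Person” is linked to “Hillary Clinton”; the same-named object of entity type “Book” is skipped because it does not satisfy the type required by “spouse”.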
These relationships can be used to organize the sets of facts. Similarly, in cases where it is unclear whether or not a fact is valid, entity type can be used in association with the attribute defined by the fact to assign a confidence to the fact. For example, an object with entity name “Bill Clinton” has an attribute “genre” with an associated value “political”. If “genre” is known to be used only to describe a closed set of entity types such as “Book” and “Movie”, knowing that “Bill Clinton” is of entity type “Book” provides greater confidence in that fact.
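One way to picture this confidence assignment is the following minimal Python sketch. The table `ATTRIBUTE_SUBJECT_TYPES`, the function `fact_confidence`, and the three numeric confidence levels are all assumptions made for illustration; the text above does not specify how confidence is computed or scaled.

```python
# Hypothetical table: attribute -> closed set of entity types it may describe.
ATTRIBUTE_SUBJECT_TYPES = {"genre": {"Book", "Movie"}}

# Arbitrary illustrative confidence levels (assumed, not from the source).
HIGH, NEUTRAL, LOW = 0.9, 0.5, 0.1


def fact_confidence(entity_type, attribute):
    """Return a confidence for a fact given the entity type of its object."""
    allowed = ATTRIBUTE_SUBJECT_TYPES.get(attribute)
    if allowed is None or entity_type is None:
        # No constraint known, or entity type unknown: nothing to adjust.
        return NEUTRAL
    # The attribute is plausible only for entity types in its closed set.
    return HIGH if entity_type in allowed else LOW


conf_book = fact_confidence("Book", "genre")      # "Bill Clinton" the book
conf_person = fact_confidence("Person", "genre")  # "Bill Clinton" the person
conf_unknown = fact_confidence(None, "genre")     # entity type not known
```

Here the fact “genre political” receives the higher confidence when the object is of entity type “Book” and the lower one when it is of entity type “Person”; when the entity type is unknown, the sketch has no basis for adjusting the confidence in either direction.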
Often the entity type of the entity represented by the object is unknown or confounded for a number of reasons. For instance, entity type information may not be available in the source data. Due to inherent error in information extraction, entity type information may not be extracted. The entity type of an object may also be confounded by several objects having the same name. For example, an object with entity name “Bill Clinton” could be either an object with entity type “Person” or an object with entity type “Book”.
What is needed, then, is a computational method of assigning entity types to objects.