Large data graphs store data and rules that describe knowledge about the data in a form that provides for deductive reasoning. Data graphs often store entities, such as people, places, things, and concepts, as nodes. Relationships between entities may be represented as edges between the nodes. Together, the relationships and entities in the data graph can represent facts. For example, the entities “Maryland” and “United States” may be linked by the edges “in country” and/or “has state.” Identifying entities mentioned in text is a key step in many language-processing tasks, such as text classification, information extraction, and grounded semantic extraction. It may also assist tasks such as part-of-speech tagging, parsing, and coreference resolution. But entity resolution can be challenging because the same text may refer to any of several entities. For example, “Newcastle” may refer to Newcastle upon Tyne, UK, to the football (soccer) club Newcastle United, or to the beverage Newcastle Brown Ale. Context may assist in disambiguating the referring text. For example, if the referring text appears in the context “John plays for Newcastle,” the mention most likely refers to the football club, whereas in “John was born in Newcastle” it most likely refers to the location.
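As an illustration only, the node-and-edge structure described above may be sketched as a small collection of subject, relationship, object facts. The class name `DataGraph` and its methods are hypothetical and chosen for this sketch; an actual data graph may use a very different representation.

```python
from collections import defaultdict


class DataGraph:
    """Hypothetical minimal data graph: entities are nodes, relationships are labeled edges."""

    def __init__(self):
        # Maps each subject entity to the set of (relationship, object) edges leaving it.
        self._edges = defaultdict(set)

    def add_fact(self, subject, relationship, obj):
        """Record one fact as a directed, labeled edge from subject to obj."""
        self._edges[subject].add((relationship, obj))

    def facts_about(self, entity):
        """Return the outgoing edges of an entity in sorted order (empty if unknown)."""
        return sorted(self._edges.get(entity, set()))


graph = DataGraph()
graph.add_fact("Maryland", "in country", "United States")
graph.add_fact("United States", "has state", "Maryland")

print(graph.facts_about("Maryland"))
```

In this sketch the same fact is stored once in each direction, mirroring the “in country” and “has state” edges in the example above.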
Models are often used in entity resolution. Models predict the probability of some event given observations. Machine learning algorithms can be used to train the parameters of the model. For example, the model may store a set of features and a support score for each of a plurality of different entities. The support score represents a probability the model has learned, namely the probability that the feature occurs given the entity. Models used in entity resolution have relied on three components: a mention model, a context model, and a coherency model. The mention model represents the prior belief that a particular phrase refers to a particular entity in the data graph. The context model infers the most likely entity for a mention given the textual context of the mention. In a context model, each feature can represent a phrase that is part of the context for the entity mention. For example, the phrase “president” may have a support score (or a probability score) for the entities “Barack Obama,” “Bill Clinton,” “Nicolas Sarkozy,” and many others. Similarly, the phrase “plays for” may have a support score for various bands, teams, etc. The context discussed above may be represented by a set of features, or phrases, co-occurring with (e.g., occurring around) the referring text, or entity mention. The coherency model attempts to force all the referring expressions in a document to resolve to entities that are related to each other in the data graph. But a coherency model introduces dependencies between the resolutions of all the mentions in a document and requires that the relevant entity relationships in the data graph be available at inference time, increasing inference and model access costs.
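The interplay of a mention model and a context model may be illustrated with a minimal sketch. It assumes, purely for illustration, that the two are combined in a naive-Bayes style by adding the log of the mention prior to the logs of the support scores of the observed context features; the text above does not specify how the components are combined. All probabilities, the `SMOOTHING` constant, and the function name `resolve` are hypothetical.

```python
import math

# Hypothetical mention model: prior belief that a phrase refers to each entity.
MENTION_PRIOR = {
    "Newcastle": {
        "Newcastle upon Tyne": 0.6,
        "Newcastle United": 0.3,
        "Newcastle Brown Ale": 0.1,
    }
}

# Hypothetical context model: support score, i.e. the probability that a
# context feature (phrase) occurs given the entity. Values are made up.
SUPPORT = {
    "Newcastle upon Tyne": {"born in": 0.05, "plays for": 0.001},
    "Newcastle United": {"born in": 0.001, "plays for": 0.08},
    "Newcastle Brown Ale": {"born in": 0.0005, "plays for": 0.0005},
}

# Assumed fallback probability for features with no stored support score.
SMOOTHING = 1e-6


def resolve(mention, context_features):
    """Return (best_entity, log_scores) for a mention and its context features."""
    scores = {}
    for entity, prior in MENTION_PRIOR.get(mention, {}).items():
        # Start from the mention prior, then add evidence from each feature.
        log_score = math.log(prior)
        for feature in context_features:
            log_score += math.log(SUPPORT[entity].get(feature, SMOOTHING))
        scores[entity] = log_score
    if not scores:
        return None, {}
    return max(scores, key=scores.get), scores


best_team, _ = resolve("Newcastle", ["plays for"])
best_place, _ = resolve("Newcastle", ["born in"])
print(best_team)   # Newcastle United
print(best_place)  # Newcastle upon Tyne
```

Each mention in this sketch is scored independently of the others, so no entity relationships from the data graph are needed at inference time. A coherency model, by contrast, would couple the resolutions of all mentions in a document.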