The present invention relates to the field of information processing and, more specifically, to a method and system for disambiguation in mention detection.
Mention detection is a method for processing text information. It automatically detects mentions of various entities, such as names of persons, names of places, and organizations, and maps these mentions to resources associated with the entities. For example, if a mention detection tool detects a mention of the person name “Michael Jordan” in a text, the mention can be linked to a web page about “Michael Jordan” in a web dictionary. In use, when a user places a cursor on or near such a mention, the uniform resource identifier (URI) of the corresponding resource can be displayed to the user.
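As a rough illustration of the detection step, the following Python sketch finds known surface forms in a text by greedy longest-match lookup and returns the URI each one maps to. The dictionary, its example.org URIs, and the function name `detect_mentions` are hypothetical. Dictionary lookup is only one simple way to detect mentions and is not presented as the method of this invention.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical surface-form dictionary: each known surface form maps to the
# URI of one resource. Real systems use far larger dictionaries and usually
# allow several candidates per form (the ambiguity that disambiguation resolves).
SURFACE_FORMS: Dict[str, str] = {
    "michael jordan": "https://example.org/wiki/Michael_Jordan",
    "chicago bulls": "https://example.org/wiki/Chicago_Bulls",
    "chicago": "https://example.org/wiki/Chicago",
}

# Length, in tokens, of the longest surface form in the dictionary.
MAX_FORM_LEN = max(len(form.split()) for form in SURFACE_FORMS)


def detect_mentions(text: str) -> List[Tuple[str, str]]:
    """Return (mention, URI) pairs found by greedy longest-match lookup."""
    tokens = re.findall(r"\w+", text)
    mentions: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "Chicago Bulls" wins over "Chicago".
        for n in range(min(MAX_FORM_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            uri = SURFACE_FORMS.get(span.lower())
            if uri is not None:
                mentions.append((span, uri))
                i += n
                break
        else:
            i += 1  # no known surface form starts at this token
    return mentions
```

For instance, `detect_mentions("Michael Jordan played for the Chicago Bulls.")` returns the mentions “Michael Jordan” and “Chicago Bulls” with their URIs. Trying the longest span first keeps “Chicago Bulls” from being reduced to “Chicago.”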
An important step in mention detection is disambiguation. Mentions of the same entity can have different surface forms. For example, mentions of the entity “Michael Jordan” can be “Jordan,” “Michael,” “Air Jordan,” “MJ,” and the like. At the same time, mentions of different entities can share the same surface form. For example, “MJ” can be a mention not only of “Michael Jordan” but also of “Michael Jackson.” The objective of the disambiguation operation is to determine to which entity's resource a mention in a given text should be mapped. For example, it must decide whether the surface form “MJ” should be linked to the resource for “Michael Jordan” or to the resource for “Michael Jackson.”
Traditional disambiguation algorithms usually consider only two factors: the a priori probability that a surface form refers to a candidate resource, and a context score. The context score measures the similarity between the words appearing around the surface form and the words appearing around mentions of the candidate resource. Such traditional methods therefore leave much of the useful information in the text unused, and the accuracy and effectiveness of disambiguation need to be improved.
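The traditional scoring described above can be sketched as follows, assuming a bag-of-words cosine similarity for the context score and a linear combination of the two signals. The priors, the context word lists, the weight `alpha`, and the names `PRIORS`, `CANDIDATE_CONTEXTS`, and `disambiguate` are all hypothetical; the text does not specify how the prior and the context score are combined.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical priors P(candidate | surface form), e.g. estimated from how
# often the anchor text "MJ" links to each resource in a web dictionary.
PRIORS: Dict[str, Dict[str, float]] = {
    "MJ": {"Michael_Jordan": 0.6, "Michael_Jackson": 0.4},
}

# Hypothetical words observed around known mentions of each candidate resource.
CANDIDATE_CONTEXTS: Dict[str, List[str]] = {
    "Michael_Jordan": ["basketball", "bulls", "nba", "championship", "dunk"],
    "Michael_Jackson": ["pop", "singer", "album", "thriller", "music"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(surface_form: str, context: List[str], alpha: float = 0.5) -> str:
    """Return the candidate maximizing alpha * prior + (1 - alpha) * context score.

    The linear weighting with `alpha` is an assumption for illustration.
    Returns "" if the surface form has no known candidates.
    """
    ctx = Counter(w.lower() for w in context)
    best, best_score = "", float("-inf")
    for candidate, prior in PRIORS.get(surface_form, {}).items():
        context_score = cosine(ctx, Counter(CANDIDATE_CONTEXTS[candidate]))
        score = alpha * prior + (1 - alpha) * context_score
        if score > best_score:
            best, best_score = candidate, score
    return best
```

With a context such as “the pop singer released a new album,” the context score outweighs the higher prior and “MJ” resolves to `Michael_Jackson`; when no context words overlap, the decision falls back to the prior and “MJ” resolves to `Michael_Jordan`. No information beyond these two signals is used, which is the limitation the paragraph points out.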