Early approaches to the ranking problem in EL resolve the entity mentions in a document independently (the local approach), using various discrete, hand-designed features and heuristics to measure the local mention-to-entity relatedness for ranking. These features are often specific to each entity mention and candidate entity, and cover a wide range of linguistic and/or structured representations, such as lexical and part-of-speech tags of context words, dependency paths, topical features, and KB (Knowledge Base) infoboxes. Although the local approach can exploit a rich set of discrete structures for EL, its limitations are twofold:
(i) The independent ranking mechanism of the local approach overlooks the topical coherence among the target entities referred to by the entity mentions within the same document. This is undesirable, as topical coherence has been shown to be effective for EL.
(ii) Because it relies extensively on coarse, hand-designed features, the local approach might suffer from data sparseness for unseen words/features, from the difficulty of calibration, and from a failure to induce the underlying similarity structures at high levels of abstraction for EL.
The first drawback of the local approach has been overcome by global models, in which all entity mentions (or a group of entity mentions) within a document are disambiguated simultaneously to obtain a coherent set of target entities. The central idea is that the referent entities of some mentions in a document might in turn provide useful information for linking other mentions in that document, due to the semantic relatedness among them. For example, the appearance of “Manchester” and “Chelsea” as football clubs in a document would make it more likely that the entity mention “Liverpool” in the same document also refers to a football club. Unfortunately, the coherence assumption of the global approach does not hold in some situations, necessitating the discrete features of the local approach as a mechanism to reduce the potential noise. Consequently, the global approach is still subject to the second limitation of the local approach, data sparseness, due to its use of discrete features.
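The contrast between local and global ranking can be illustrated with a minimal sketch of the “Liverpool” example. All scores, entity names, and the type-based coherence function below are hypothetical values chosen for illustration, and the exhaustive search over joint assignments is only a stand-in for the inference procedures used by actual global models.

```python
from itertools import product

# Hypothetical local mention-to-entity scores (e.g. from hand-designed features).
LOCAL = {
    "Manchester": {"Manchester_United_FC": 0.60, "Manchester_(city)": 0.40},
    "Chelsea": {"Chelsea_FC": 0.55, "Chelsea_(London)": 0.45},
    "Liverpool": {"Liverpool_FC": 0.45, "Liverpool_(city)": 0.55},
}

# Hypothetical entity types, used as a crude proxy for topical coherence.
TYPE = {
    "Manchester_United_FC": "club", "Manchester_(city)": "place",
    "Chelsea_FC": "club", "Chelsea_(London)": "place",
    "Liverpool_FC": "club", "Liverpool_(city)": "place",
}


def coherence(a, b):
    """Return 1.0 if two entities share a type, else 0.0 (illustrative only)."""
    return 1.0 if TYPE[a] == TYPE[b] else 0.0


def link_local(local):
    """Local approach: rank the candidates of each mention independently."""
    return {m: max(cands, key=cands.get) for m, cands in local.items()}


def link_global(local, weight=0.2):
    """Global approach: score joint assignments by local score plus coherence."""
    mentions = list(local)
    best, best_score = None, float("-inf")
    # Enumerate every combination of one candidate entity per mention.
    for assignment in product(*(local[m] for m in mentions)):
        score = sum(local[m][e] for m, e in zip(mentions, assignment))
        score += weight * sum(
            coherence(a, b)
            for i, a in enumerate(assignment)
            for b in assignment[i + 1:]
        )
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best))


print(link_local(LOCAL)["Liverpool"])   # Liverpool_(city)
print(link_global(LOCAL)["Liverpool"])  # Liverpool_FC
```

With these toy numbers, the local ranker prefers the city for “Liverpool”, whereas the joint assignment favours the football club once coherence with the other two mentions is taken into account.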
Recently, the surge of neural network (NN) models has presented an effective mechanism to mitigate the second limitation of the local approach. In such models, words are represented by continuous representations, and features for the entity mentions and candidate entities are automatically learnt from data. This alleviates the data sparseness problem of unseen words/features and helps to extract more effective features for EL in a given dataset.
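A small sketch may help to show why continuous representations ease the sparseness problem. The three-dimensional word vectors below are hypothetical and hand-set for illustration; actual models learn such vectors from data.

```python
import math

# Hypothetical 3-dimensional word vectors; real models learn these from data.
EMB = {
    "striker": [0.9, 0.1, 0.0],
    "forward": [0.8, 0.2, 0.1],
    "harbour": [0.0, 0.2, 0.9],
}


def discrete_overlap(context, seen):
    """Discrete feature: count context words that exactly match seen words."""
    return len(set(context) & set(seen))


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# An unseen word contributes nothing to an exact-match feature ...
print(discrete_overlap(["striker"], ["forward"]))
# ... but remains close to a related seen word in the continuous space.
print(cosine(EMB["striker"], EMB["forward"]))
print(cosine(EMB["striker"], EMB["harbour"]))
```

Here the exact-match feature is zero for the unseen word “striker”, while its vector stays close to that of “forward” and far from that of “harbour”.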
In practice, the features automatically induced by NNs are combined with the discrete features of the local approach to extend their coverage for EL. However, as the previous NN models for EL are local, they cannot capture the global interdependence among the target entities in the same document.