The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Conventional natural language processing (NLP) systems treat documents as collections of keywords. In one approach, sequences of n (usually contiguous) words (n-grams) are identified, but it may be difficult to determine which n-grams are relevant. An alternative to this n-gram-based approach is an entity-centric approach. Examples of entities include people, places, companies, events, and concepts. One problem with the entity-centric approach, however, is determining the salience (e.g., the prominence) of each entity. The salience of a specific entity can be indicative of its prominence within the document. Salience is not to be confused with entity importance, which is determined outside the scope of the document, or with entity relevance, which is subjective and depends on the reader of the document. The salience of a specific entity, therefore, can be important for accurately parsing the document.