Technical Field
This application generally relates to computer recognition of named entities in the text of a document.
Background
Natural language processing includes information extraction (IE), a computer-based task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In information extraction, named entities are real-world objects, such as organizations, persons, locations, and products, that can be denoted with a proper name. Named entity recognition (NER) is the part of information extraction that uses named entity extractors to locate named entities in text and classify them into pre-defined categories, such as organizations, names of persons, locations, expressions of time, quantities, monetary values, and percentages.
At present, there exist conventional techniques related to named entity extractors which seek to provide a better result set. Some conventional techniques address a set of data annotated by one NER model, and others address combining multiple NER models. Techniques related to combining multiple NER models tend to focus on how to use multiple named entity recognition techniques to achieve better precision and recall when identifying various entities.
For a given set of data annotated by one NER model, conventional techniques seek to resolve which extracted names reference the same entity (e.g., that Bush and George Bush both reference the same person). In other words, given a set of named entities extracted from a document and comments, the task is to find all references to the same entity. In this regard, the English Wikipedia policy on article titles provides standards for naming articles in a recognizable, concise, and natural way that is precise and consistent, and provides redirects to those article titles from name variations that fall short of these standards. For a set of data annotated by one NER model, some conventional techniques depend on a Wikipedia reference to find which variation of the entity name is the most accurate.
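By way of illustration only, the task of finding all references to the same entity can be sketched as follows. The sketch assumes a simple token-subset heuristic in place of the Wikipedia-based lookup described above; the function name `group_mentions` and the sample mentions are hypothetical and are not taken from any particular conventional technique.

```python
def group_mentions(mentions):
    """Group mentions whose tokens are a subset of a longer mention's tokens.

    Assumption: a shorter name (e.g. "Bush") refers to the same entity as a
    longer name that contains all of its tokens (e.g. "George Bush").
    """
    # Visit longer mentions first so that they become the canonical names.
    ordered = sorted(set(mentions), key=lambda m: (-len(m.split()), m))
    groups = {}  # canonical mention -> all mentions grouped under it
    for mention in ordered:
        tokens = set(mention.lower().split())
        for canonical in groups:
            if tokens <= set(canonical.lower().split()):
                groups[canonical].append(mention)
                break
        else:
            # No longer mention contains this one, so it starts a new group.
            groups[mention] = [mention]
    return groups


mentions = ["Bush", "George Bush", "Obama", "Barack Obama", "Bush"]
groups = group_mentions(mentions)
```

Such a heuristic would incorrectly merge distinct persons who share a surname, which is one reason a technique may instead rely on an external reference to select the most accurate variation of a name.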
Conventional techniques also look to combine multiple NER models to obtain more accurately annotated results. For example, given a token in a document (such as a “Party To Contract”), conventional techniques combine multiple NER models to determine whether a given named entity is the desired named entity to extract. However, this requires outputting all references to the “Party To Contract” rather than attempting to output a single answer, such as where a user desires to determine a particular “Party To Contract” in a contract.
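By way of illustration only, one way of combining multiple NER models is a per-token majority vote, sketched below. The voting rule, the label names `PARTY` and `O`, and the three sample model outputs are assumptions made for the example; the conventional techniques discussed above are not limited to this approach.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority of models, else "O"."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else "O"


def combine(predictions):
    """Combine equal-length per-model label sequences token by token."""
    return [majority_label(labels) for labels in zip(*predictions)]


tokens = ["Acme", "Corp", "agrees", "with", "Beta", "LLC"]
model_a = ["PARTY", "PARTY", "O", "O", "PARTY", "PARTY"]
model_b = ["PARTY", "PARTY", "O", "O", "O", "PARTY"]
model_c = ["PARTY", "O", "O", "O", "PARTY", "PARTY"]

combined = combine([model_a, model_b, model_c])
# Every token that the vote marks as a party is kept in the output.
parties = [tok for tok, lab in zip(tokens, combined) if lab == "PARTY"]
```

In this sketch the combined output marks every reference to a party in the text, and no step selects a single answer from among those references.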