The present invention relates to named entity recognition. More specifically, the present invention relates to a system for recognizing named entities from multiple applications.
A named entity (NE) is a specific linguistic item, such as a proper name, the name of a company, an email address, etc., which is treated as one unit by an application. Named entity recognizers are known, and named entity processing is known to be an important stage of linguistic analysis.
NE recognition is currently done in a number of ways. Some approaches use list lookup when the NEs are fixed (or static), such as city names, country names, first names, company names, and fixed terms like product names. Other approaches use regular expressions and grammar rules that can combine syntactic information with a lexicon or list lookup in order to recognize NEs. The most common approaches build finite-state recognizers directly from training data.
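The first two approaches above (list lookup and regular expressions) can be illustrated with a minimal sketch. The city list, the simplified email pattern, the labels, and all function names below are assumptions made for illustration only. They are not taken from any particular recognizer.

```python
import re

# Hypothetical static NE list (illustrative data only).
CITY_NAMES = {"Seattle", "Paris", "New York"}

# Simplified email pattern, assumed for illustration; it is not a full
# address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def recognize_by_list(text, names, label):
    """List lookup: return (start, end, label) for each listed name in text."""
    spans = []
    for name in names:
        # Word boundaries keep "Paris" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans


def recognize_by_regex(text, pattern, label):
    """Regular expression: return (start, end, label) for each pattern match."""
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


def recognize(text):
    """Combine both recognizers and order the spans by position in the text."""
    spans = recognize_by_list(text, CITY_NAMES, "CITY")
    spans += recognize_by_regex(text, EMAIL_PATTERN, "EMAIL")
    return sorted(spans)


if __name__ == "__main__":
    sentence = "Mail bob@example.com from Paris."
    for start, end, label in recognize(sentence):
        print(label, sentence[start:end])
```

List lookup suits static NEs whose members can be enumerated in advance, while a pattern suits NEs such as email addresses, which have a recognizable shape but cannot be enumerated.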
However, a number of problems currently exist with linguistic analysis systems that attempt to recognize named entities for a variety of applications. Each application constitutes a specific, specialized domain, so the named entities to be recognized vary from application to application. If a separate recognizer is to be used with each application, then the various recognizers must all be accommodated within the same linguistic analysis layer, because a single textual input string may include named entities from several different applications.
In addition, some applications require the ability to modify their lists of NEs and update NE terms constantly. Some such applications even require the ability to update NE lists between the linguistic analysis of two sentences. Typical examples include applications that maintain a list of names that the user can edit regularly. For example, for an application that handles file names (i.e., where file names are NEs), NE lists are updated when files are deleted, renamed or created. A given file name can thus be an NE when processing a first sentence but may no longer be an NE (if the file has been deleted or renamed) when processing the next sentence.
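The file name scenario can be sketched as follows. The class `FileNameRecognizer`, its methods, and the whitespace tokenization are hypothetical and chosen only to illustrate how an NE list that changes between two sentences alters the recognition result.

```python
class FileNameRecognizer:
    """Hypothetical recognizer whose NE list mirrors the current file names."""

    def __init__(self, file_names=()):
        self._names = set(file_names)

    def create(self, name):
        # A newly created file becomes an NE.
        self._names.add(name)

    def delete(self, name):
        # A deleted file is no longer an NE.
        self._names.discard(name)

    def rename(self, old_name, new_name):
        # Only the new name remains an NE after a rename.
        self._names.discard(old_name)
        self._names.add(new_name)

    def recognize(self, sentence):
        """Return the tokens of the sentence that are currently NEs."""
        # Assumption: whitespace tokenization, with surrounding punctuation
        # stripped from each token.
        tokens = [token.strip(".,;:!?\"'") for token in sentence.split()]
        return [token for token in tokens if token in self._names]


if __name__ == "__main__":
    recognizer = FileNameRecognizer(["report.doc"])
    print(recognizer.recognize("Open report.doc now."))   # first sentence
    recognizer.rename("report.doc", "final.doc")          # update between sentences
    print(recognizer.recognize("Print report.doc."))      # next sentence
```

In this sketch the same token, `report.doc`, is recognized in the first sentence and not in the next, because the list was updated between the two analyses.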