Named Entity Recognition (NER), also known as “proper name recognition”, refers to recognizing entities with specific meanings in a text. Such entities mainly include person names, place names, organization names and other proper nouns. Named entity recognition is an important basic tool in fields such as information extraction, question answering systems, syntactic analysis, machine translation and Semantic Web-oriented metadata annotation, and plays an important role in making natural language processing technology practical.
At present, named entity recognition is generally implemented by the following method: a named entity set is constructed or an entity extraction rule is specified; word segmentation is performed on the sentences, and a dictionary tree or a rule tree is constructed; the word segmentation result of each sentence is traversed and matched against the dictionary or the rule, wherein if there is content matching the dictionary or the rule, the position of the content is marked, and if there is no matching content, the next sentence of the text is traversed; and a final annotation result is output once all the sentences of the text have been traversed.
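The dictionary-based variant of the method described above can be sketched roughly as follows. This is a minimal illustration only, not the implementation of any particular system: it assumes the sentences have already been segmented into token lists, uses a longest-match strategy (one common choice, not stated in the text), and the names `TrieNode`, `build_trie`, `match_sentence` and `annotate`, as well as the sample entities, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of the dictionary tree; `label` is set when an entity ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None


def build_trie(entities: Dict[Tuple[str, ...], str]) -> TrieNode:
    """Build a dictionary tree from {token sequence: entity label}."""
    root = TrieNode()
    for tokens, label in entities.items():
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.label = label
    return root


def match_sentence(tokens: List[str], trie: TrieNode) -> List[Tuple[int, int, str]]:
    """Traverse one segmented sentence and mark (start, end_exclusive, label)
    for the longest dictionary match found at each position."""
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.label is not None:
                best = (i, j, node.label)  # remember the longest match so far
        if best is not None:
            spans.append(best)
            i = best[1]  # continue after the matched entity
        else:
            i += 1  # no match starting here; move to the next token
    return spans


def annotate(sentences: List[List[str]], trie: TrieNode) -> List[List[Tuple[int, int, str]]]:
    """Annotate every sentence; sentences without a match yield an empty list."""
    return [match_sentence(s, trie) for s in sentences]


# Hypothetical entity set and pre-segmented sentences.
trie = build_trie({("New", "York"): "LOC", ("Acme", "Corp"): "ORG"})
sentences = [["Alice", "visited", "New", "York"], ["No", "entity", "here"]]
print(annotate(sentences, trie))  # [[(2, 4, 'LOC')], []]
```

Note that the output depends entirely on the token boundaries supplied by the segmentation step and on the entries present in the dictionary.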
In implementing the above named entity recognition method, the inventor found that the current technical solution has at least the following problems. First, when recognizing proper named entities in Chinese text, Chinese words, unlike English words, are not separated by spaces, so incorrect word segmentation may lead to inaccurate determination of the boundaries of a named entity, which in turn results in inaccurate recognition of the named entity. Second, the accuracy of current named entity recognition depends entirely on the completeness of the dictionary or the rules, so the entity recognition task cannot be fulfilled well when the set of entities to be covered keeps changing.
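The segmentation problem described above can be illustrated with a small sketch. The example phrase, the two candidate segmentations, the single-entry dictionary and the function name `dictionary_match` are all assumptions chosen for illustration; they are not taken from the method itself.

```python
from typing import Dict, List, Tuple


def dictionary_match(tokens: List[str], dictionary: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (token index, token, label) for every token found in the dictionary."""
    return [(i, tok, dictionary[tok]) for i, tok in enumerate(tokens) if tok in dictionary]


# Hypothetical one-entry dictionary: "长江大桥" (Yangtze River Bridge) as a location.
dictionary = {"长江大桥": "LOC"}

# Two possible segmentations of the same unspaced text "南京市长江大桥".
correct_segmentation = ["南京市", "长江大桥"]
incorrect_segmentation = ["南京", "市长", "江大桥"]

print(dictionary_match(correct_segmentation, dictionary))    # [(1, '长江大桥', 'LOC')]
print(dictionary_match(incorrect_segmentation, dictionary))  # []
```

With the incorrect segmentation, the entity boundary is lost and nothing is matched, even though the entity is present in the dictionary. An entity that is absent from the dictionary would be missed under either segmentation, which corresponds to the completeness problem.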