Named entity classification is a crucial step in many applications. A named entity is essentially a word with a semantic meaning. For example, in an automated question answering system, it is necessary to determine whether the type of a candidate answer conforms to the type specified by the question. In an information extraction system, it is necessary to identify the type of a named entity in preparation for subsequent extraction processing.
A traditional automatic classification system is based on machine learning. Specifically, a series of named entities of known types is entered into the automatic classification system, with each named entity corresponding to a feature vector. Through machine learning, the automatic classification system obtains a correspondence relationship between feature vectors and types. When the automatic classification system receives a to-be-classified named entity and its feature vector, it may classify the to-be-classified named entity based on this correspondence relationship.
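The training and classification flow described above can be sketched as follows. This is only a minimal illustration, not the method of any particular system: it assumes the "correspondence relationship" is learned by simple majority counting, and the names `train`, `classify`, and `training_set`, as well as the feature strings, are hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Learn a feature-vector -> type mapping from labeled examples.

    `examples` is an iterable of (feature_vector, entity_type) pairs
    whose types are already known (the training set).
    """
    counts = defaultdict(Counter)
    for features, entity_type in examples:
        counts[tuple(features)][entity_type] += 1
    # Assumption: keep the most frequent type seen for each feature vector.
    return {fv: c.most_common(1)[0][0] for fv, c in counts.items()}


def classify(model, features, default=None):
    """Classify a to-be-classified entity from its feature vector."""
    return model.get(tuple(features), default)


# Hypothetical training set of named entities with known types.
training_set = [
    (("capitalized", "after:Professor"), "person"),
    (("capitalized", "after:Professor"), "person"),
    (("capitalized", "before:Inc."), "organization"),
]

model = train(training_set)
predicted = classify(model, ("capitalized", "after:Professor"))
unseen = classify(model, ("lowercase", "after:the"))
```

A real system would typically use a statistical learner that generalizes to unseen feature vectors; the lookup table here only makes the train-then-classify sequence concrete.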
For example, word-level information about the named entity itself and its context information may be used as elements of the feature vector. In this case, the feature vector of the named entity is a two-dimensional vector. For a named entity “Smith”, the word-level information is, for example, that the initial letter of the named entity is capitalized, and the context information is, for example, that the word preceding this named entity is “Professor”. The feature vector of this named entity is (initial letter in capital, following “Professor”). If the automatic classification system has mapped this feature vector to the type “person,” then the named entity may be classified into the type “person,” instead of the type “organization.” Those skilled in the art will understand that feature vectors and types do not necessarily have a one-to-one correspondence relationship.
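The two-dimensional feature vector for “Smith” can be illustrated with a short sketch. The function name `extract_features`, the whitespace-tokenized input, and the exact feature strings are assumptions made for illustration only.

```python
def extract_features(tokens, index):
    """Build a two-element feature vector for the token at `index`.

    Element 1 (word-level): whether the initial letter is capitalized.
    Element 2 (context): the word immediately preceding the token.
    """
    word = tokens[index]
    if word[:1].isupper():
        word_level = "initial letter in capital"
    else:
        word_level = "initial letter not in capital"

    if index > 0:
        context = f'following "{tokens[index - 1]}"'
    else:
        context = "no preceding word"

    return (word_level, context)


# Hypothetical sentence containing the named entity "Smith".
tokens = ["Professor", "Smith", "teaches", "physics"]
features = extract_features(tokens, 1)
```

Here `features` holds the word-level element and the context element in that order, mirroring the vector (initial letter in capital, following “Professor”) given in the text.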
Prior art methods such as the one discussed above have many shortcomings. For example, such a method requires manually determining the appropriate types for a large number of named entities in order to generate the training set, which imposes a heavy workload. The development of the Internet has added further challenges to the existing problems in this area. At the same time, more and more information can be obtained on the web through the Internet, and such information may help named entity classification.
Consequently, it is desirable to provide a solution that facilitates automatic classification of named entities and that addresses at least some of the challenges the prior art currently faces.