Text in newspaper articles, on Web pages and/or the like contains a large number of characteristic expressions having a specific meaning (hereinafter also called a “class”), such as people's names, place names, organization names and/or the like. By recognizing these characteristic expressions within the text, it is possible to make effective use of the text data in question answering systems, document classification, machine translation and/or the like.
An example of extracting characteristic expressions from text is disclosed in Non-Patent Literature 1. In the method disclosed in Non-Patent Literature 1, text data in which an annotation (tag) indicating the class has been appended to each characteristic expression to be extracted is created in advance as solution data. The method then performs machine learning using an SVM (Support Vector Machine) on the solution data created in advance and generates rules for extracting the characteristic expressions. By using the generated extraction rules, it is possible to extract characteristic expressions of a given class from arbitrary text.
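As an illustration only, the following sketch shows one way annotated solution data could be turned into labeled tokens suitable for machine learning. The inline tag format `<CLASS>phrase</CLASS>`, the function name `to_labeled_tokens`, the label `O` for untagged tokens, and the example sentence are all assumptions made for this sketch; Non-Patent Literature 1 may use a different annotation scheme, and the SVM training step itself is not shown.

```python
import re

# Hypothetical inline annotation format: <CLASS>phrase</CLASS>
TAG_RE = re.compile(r"<(?P<cls>[A-Z]+)>(?P<phrase>.+?)</(?P=cls)>")


def to_labeled_tokens(annotated):
    """Convert annotated text into (token, label) pairs.

    Tokens inside a tag receive the tag's class name; all other
    tokens receive the label "O" (outside any class).
    """
    pairs = []
    pos = 0
    for match in TAG_RE.finditer(annotated):
        # Untagged text between the previous annotation and this one.
        pairs += [(tok, "O") for tok in annotated[pos:match.start()].split()]
        # Tokens of the annotated characteristic expression.
        pairs += [(tok, match.group("cls")) for tok in match.group("phrase").split()]
        pos = match.end()
    # Untagged text after the last annotation.
    pairs += [(tok, "O") for tok in annotated[pos:].split()]
    return pairs


# Invented example sentence, used only to exercise the function.
example = "<PERSON>Taro Yamada</PERSON> visited <PLACE>Osaka</PLACE> yesterday"
labeled = to_labeled_tokens(example)
```

Each resulting pair could then serve as one training example, with the label as the target class and features derived from the token and its context as input.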
The art disclosed in Non-Patent Literature 1 generates extraction rules for each class by performing machine learning on the premise that phrases belonging to the same class will have similar text information in their surroundings. The surroundings of a phrase include words in the text having a prescribed relationship with that phrase, such as words positioned before or after that phrase, and also include words showing the type of text. The text information is information showing the character string, part of speech, connection and/or the like.
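The notion of text information surrounding a phrase can be sketched as follows. This is a minimal illustration under stated assumptions, not the feature set of Non-Patent Literature 1: the function name `context_features`, the window size of two words, the feature key format, and the hand-assigned part-of-speech tags are hypothetical, and connection information and words showing the type of text are omitted.

```python
def context_features(tokens, pos_tags, start, end, window=2):
    """Collect text information from words around the phrase tokens[start:end].

    For each word up to `window` positions before or after the phrase,
    the character string and the part of speech are recorded.
    """
    feats = {}
    for offset in range(1, window + 1):
        left = start - offset        # index of a word before the phrase
        right = end - 1 + offset     # index of a word after the phrase
        if left >= 0:
            feats[f"word[-{offset}]"] = tokens[left]
            feats[f"pos[-{offset}]"] = pos_tags[left]
        if right < len(tokens):
            feats[f"word[+{offset}]"] = tokens[right]
            feats[f"pos[+{offset}]"] = pos_tags[right]
    return feats


# Invented example; part-of-speech tags are assigned by hand for illustration.
tokens = ["Mr.", "Yamada", "visited", "Osaka", "yesterday"]
pos_tags = ["NOUN", "PROPN", "VERB", "PROPN", "ADV"]

# Features surrounding the one-word phrase "Osaka" (tokens[3:4]).
features = context_features(tokens, pos_tags, 3, 4)
```

Under the premise described above, phrases of the same class would tend to yield similar feature dictionaries, which is what allows a learner such as an SVM to generalize from the solution data.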