One example of a vocabulary classification technique is described in Non-patent document 1. The named entity extraction technique, one of the vocabulary classification techniques, classifies words into categories called named entities, such as organization names, place names, person names, and dates. This technique takes as input learning data in which named entities have been annotated in texts, and learns word classification rules from it. It therefore reduces the rule preparation work that would otherwise be done manually.
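Before any rule can be learned, the annotated learning data has to be turned into per-word training examples. The sketch below is a minimal illustration of that step. It assumes a hypothetical inline annotation format, `<CATEGORY>words</CATEGORY>`, and whitespace-separated words (Japanese text would first need morphological analysis). Neither detail comes from Non-patent document 1, and `to_training_pairs` is a hypothetical name.

```python
import re

# Hypothetical inline annotation format: <CATEGORY>words</CATEGORY>
TAG = re.compile(r"<(?P<cat>[A-Z_]+)>(?P<body>.*?)</(?P=cat)>")


def to_training_pairs(annotated: str) -> list[tuple[str, str]]:
    """Convert annotated text into (word, label) pairs; "O" = not an entity."""
    pairs = []
    pos = 0
    for m in TAG.finditer(annotated):
        # Words between annotations are labeled "O".
        pairs += [(w, "O") for w in annotated[pos:m.start()].split()]
        # Every word inside an annotation gets that annotation's category.
        pairs += [(w, m.group("cat")) for w in m.group("body").split()]
        pos = m.end()
    pairs += [(w, "O") for w in annotated[pos:].split()]
    return pairs


print(to_training_pairs(
    "<ORGANIZATION>NEC</ORGANIZATION> holds meetings in <LOCATION>Tokyo</LOCATION>"))
```

Each resulting pair is one labeled occurrence, which is the unit the context-based rules are learned from.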
In Non-patent document 1, the word classification rules are learned from context information in the neighborhood of each occurrence position of each word. Here, the neighborhood means roughly two words before and after the occurrence position, and the context information consists of the word itself, its part of speech, and its character type. A word classification rule is learned from this information for each named entity category. That is, there is one word classification rule for determining whether or not a word is an organization name, another for determining whether or not it is a place name, and so on. Each word classification rule is stored as binary data of the learning technique called Support Vector Machines and is not in a form that a human can read. Conceptually, however, the rule for organization names can be thought of as having learned patterns such as “<organization name> holds meetings” and “<organization name> develops a system”.
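The per-category learning described above can be sketched as follows. This is a minimal illustration, not the method of Non-patent document 1: it assumes whitespace tokens with given part-of-speech tags, a coarse hypothetical character-type function, and a linear SVM trained with Pegasos-style subgradient steps in place of the SVM software used in that work. All function names are hypothetical.

```python
from collections import defaultdict

WINDOW = 2  # "roughly two words" before and after the occurrence position


def char_type(word: str) -> str:
    """Coarse character-type feature (an assumption; Japanese systems would
    distinguish kanji, hiragana, katakana, and so on)."""
    if word.isdigit():
        return "DIGIT"
    if word[:1].isupper():
        return "CAPITALIZED"
    if word.isalpha():
        return "ALPHA"
    return "OTHER"


def context_features(words, pos_tags, i):
    """Binary features for position i: word, part of speech, and character
    type of every token in a +/-WINDOW neighborhood."""
    feats = {}
    for offset in range(-WINDOW, WINDOW + 1):
        j = i + offset
        if 0 <= j < len(words):
            w, p, c = words[j], pos_tags[j], char_type(words[j])
        else:
            w = p = c = "<PAD>"  # position falls outside the sentence
        feats[f"w[{offset:+d}]={w}"] = 1.0
        feats[f"pos[{offset:+d}]={p}"] = 1.0
        feats[f"ct[{offset:+d}]={c}"] = 1.0
    return feats


def score(weights, feats):
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_binary(examples, lam=0.1, epochs=500):
    """Linear SVM (no bias term) trained with Pegasos-style subgradient
    steps, visiting the (features, +1/-1) examples in a fixed order."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in examples:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, x)
            for k in w:  # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: move toward y * x
                for k, v in x.items():
                    w[k] += eta * y * v
    return dict(w)


def train_one_vs_rest(corpus, **kwargs):
    """corpus: list of (words, pos_tags, labels), label "O" = no entity.
    Learns one binary classification rule per named entity category."""
    data = [(context_features(ws, ps, i), lab)
            for ws, ps, labs in corpus for i, lab in enumerate(labs)]
    cats = sorted({lab for _, lab in data if lab != "O"})
    return {c: train_binary([(x, 1 if lab == c else -1) for x, lab in data],
                            **kwargs)
            for c in cats}


def classify(models, feats):
    """Return the highest-scoring category, or None if no rule fires."""
    scores = {c: score(w, feats) for c, w in models.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

Because each category gets its own binary rule, one word can score positively under several rules. `classify` resolves this by taking the highest score, which is one common choice and an assumption here. The learned weight dictionaries also show why such rules are hard for a human to read directly: they are long lists of numeric feature weights rather than patterns like “<organization name> holds meetings”.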
Further, a word classification technology related to the present invention is disclosed in Patent document 1. In this technology, two resources are prepared in advance: a core word dictionary that stores, for each category, a plurality of sets of a core word, being a word representative of the category, and a value indicating the degree to which that core word belongs to the category; and a document database that stores documents. The technology retrieves the classification target word from the documents stored in the document database and extracts the words having a co-occurrence relation with it. It then checks whether each extracted co-occurrence word is stored as a core word in the core word dictionary, forms a ranking determination value for each category from the values of the retrieved core words, and determines the category to which the classification target word belongs. Here, a core word is a word that is peculiar to and representative of a category. For example, for the category “art”, there are core words such as “movie”, “music”, and “director”, each of which is a typical word that expresses “art” well and is also associated with that category.

Non-patent document 1: Hiroyasu Yamada, Taku Kudo, and Yuji Matsumoto, “Japanese Named Entity Extraction Using Support Vector Machines”, Technical Research Report of the Information Processing Society of Japan, Natural Language Processing, Vol. 2001, No. 20, pp. 121-128.

Patent document 1: JP-P2004-334766A
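The retrieval and ranking procedure of Patent document 1 can be sketched as below, under stated assumptions: co-occurrence is taken to mean appearing in the same document, words are whitespace-separated, each co-occurring word is counted once, and the ranking determination value is taken to be the sum of the degrees of the co-occurring core words. Patent document 1, as summarized here, does not fix these details, and `classify_by_core_words` is a hypothetical name.

```python
from collections import Counter


def classify_by_core_words(target, documents, core_dict):
    """Rank categories for `target` from the core words that co-occur with it.

    core_dict maps category -> {core word: degree of belonging to category}.
    Returns (best category or None, Counter of ranking values).
    """
    # Retrieve the documents that contain the target word and collect the
    # other words in them (assumed here: co-occurrence = same document).
    cooccurring = set()
    for doc in documents:
        words = doc.split()  # whitespace tokenization is an assumption
        if target in words:
            cooccurring.update(w for w in words if w != target)

    # Look each co-occurring word up in the core word dictionary and add its
    # degree to that category's ranking value (summing is an assumption).
    ranking = Counter()
    for word in cooccurring:
        for category, core_words in core_dict.items():
            if word in core_words:
                ranking[category] += core_words[word]

    best = ranking.most_common(1)
    return (best[0][0] if best else None), ranking


docs = ["Kurosawa was a film director",
        "the director scored the movie with music",
        "Kurosawa shot the movie"]
core = {"art": {"movie": 0.75, "music": 0.5, "director": 0.5},
        "sports": {"goal": 0.75, "director": 0.25}}
print(classify_by_core_words("Kurosawa", docs, core))
```

Returning the full ranking alongside the winning category makes it easy to inspect runner-up categories, for example when a core word such as “director” belongs to more than one category.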