Field of the Invention
The present invention relates to a system that can generate thesauruses, as required by various operations, to meet users' purposes.
Research on text retrieval and automatic classification has long been actively pursued as part of natural language processing. In these fields, so-called knowledge bases are becoming increasingly important, particularly the thesaurus, a knowledge base in which relations among concepts are defined hierarchically, primarily on the basis of rank relations (relations between higher-level and lower-level concepts).
The approaches to knowledge processing proposed to date are broadly divided into a rule-based approach and an example-based approach.
The rule-based approach performs knowledge processing by mapping the real world onto a combination of defined rules. In a production system, which is a typical rule-based system, problems are solved by an inference engine that references a rule base. This approach is characterized by the modularity and uniformity of its rules and by natural knowledge representation. On the other hand, rules are difficult to create and maintain: creating rules is intrinsically difficult, and efficiency falls as the number of rules grows to handle exceptions.
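As an illustration only, the relationship between an inference engine and a rule base in a production system can be sketched as follows. The sketch assumes simple forward chaining over if-then rules; the rule format, the facts, and the name `forward_chain` are hypothetical and are not taken from any cited system.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical rule format: (conditions, conclusion).
# When every condition is a known fact, the conclusion becomes a known fact.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical rule base for illustration.
RULES: List[Rule] = [
    (frozenset({"is_bird"}), "can_fly"),
    (frozenset({"can_fly", "is_hungry"}), "searches_for_food"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Repeatedly fire rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"is_bird", "is_hungry"}, RULES)
```

Each rule is an independent module with a uniform shape, which corresponds to the modularity and uniformity noted above. Handling an exception, such as a bird that cannot fly, would require further rules or additional conditions on existing ones, which suggests how a rule base grows and becomes harder to maintain.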
On the other hand, the example-based approach constructs a knowledge base based on expressions and cases that actually exist.
Methods for the example-based approach are further broadly divided into a general-purpose method and a specific-use method.
The general-purpose method is intended to describe world knowledge on the basis of actual examples, as typified by the dictionary construction approach of the Japan Electronic Dictionary Research Institute (EDR). Because the method does not depend on specific syntax rules for dictionary description and the like, it has the advantage that non-specialists, given sufficient manpower, can construct a knowledge base holding a large volume of knowledge, and that concepts backed by actual examples are handled as uniformly and accurately as in a rule base. However, because each concept is generally polysemous, accuracy cannot be guaranteed, and reality is difficult to reflect in detail unless the situation at the time of use is the same as that of the actual example. In addition, since the knowledge base grows very large, containing hundreds of thousands of concepts or more, individual pieces of data are easy to input but the knowledge base as a whole is very difficult to maintain.
As inventions relating to such thesaurus construction and generation, the following have been proposed: "Knowledge Structure Creating Method" described in Japanese Published Unexamined Patent Application No. Hei 4-237332, "Thesaurus Generating Device" described in Japanese Published Unexamined Patent Application No. Hei 4-39769, and "Data Classification Device/Method, Data Classification Tree Generating Device/Method, Derivative Extracting Device/Method, Thesaurus Constructing Device/Method, and Data Processing Device" described in Japanese Published Unexamined Patent Application No. Hei 8-16620.
However, all of these inventions merely automate construction by introducing analysis techniques and enjoy only the merits of the example-based approach; they provide no solution to the above-mentioned problems.
The "Thesaurus Automatic Reorganization Device" described in Japanese Published Unexamined Patent Application No. Hei 3-276369, which describes the reconstruction of a thesaurus, assumes thesauruses intended for specific uses as described below, such as program parts and electronic parts, and provides no solution to the problem of polysemy in a general-purpose large-scale thesaurus.
The specific-use method constructs and uses a small-scale knowledge base suited to a particular use. Because of its specific-use nature, the method has the advantage that the knowledge base is easy to maintain while reflecting reality. However, because the situations in which the knowledge base can be used are limited and its existence is unknown to others, applying it to another use, or reusing it after several modifications, requires as much time and effort as constructing a knowledge base from the beginning.
As an invention relating to this method, the "Field-Classified Thesaurus Generating Device" described in Japanese Published Unexamined Patent Application No. Hei 9-6789 has been proposed. That invention relates to a technique that, starting from general-purpose thesauruses, constructs field-classified thesauruses from queries defining specific fields, with inappropriate words unrelated to those fields removed. However, it provides no solution to the difficulty of constructing general-purpose thesauruses, and it provides no effective means for reuse.
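By way of illustration only, the general idea of deriving a field-specific thesaurus from a general-purpose one might be sketched as follows. The sketch is not the method of the cited application; it assumes a toy thesaurus in which each term maps to its broader term, and it assumes that a field is defined by a list of query terms. The data and the names `ancestors` and `prune_to_field` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy thesaurus: each term maps to its broader term (None at the root).
GENERAL: Dict[str, Optional[str]] = {
    "thing": None,
    "device": "thing",
    "animal": "thing",
    "resistor": "device",
    "capacitor": "device",
    "dog": "animal",
}


def ancestors(term: str, broader: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of broader terms from `term` up to the root."""
    chain: List[str] = []
    parent = broader.get(term)
    while parent is not None:
        chain.append(parent)
        parent = broader.get(parent)
    return chain


def prune_to_field(
    broader: Dict[str, Optional[str]], field_terms: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Keep the field terms, their broader terms, and all their narrower terms."""
    seeds = {t for t in field_terms if t in broader}
    keep = set(seeds)
    for term in broader:
        lineage = ancestors(term, broader)
        if term in seeds:
            keep.update(lineage)          # preserve the path up to the root
        elif seeds.intersection(lineage):
            keep.add(term)                # term lies below a field term
    return {t: p for t, p in broader.items() if t in keep}


electronics = prune_to_field(GENERAL, ["device"])
```

In this sketch the result still depends entirely on the general-purpose thesaurus supplied as input, which is consistent with the observation above that such pruning does not ease the construction of the general-purpose thesaurus itself.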
As described above, it has so far been difficult to construct, from existing thesauruses, a new thesaurus that meets a user's purposes and uses.