The present invention relates to a maintenance support method and apparatus for a natural language processing system. More particularly, the present invention relates to a maintenance support method and apparatus suitable for supporting maintenance of the grammar rules and dictionaries that a natural language processing system uses to process sentences written in a natural language.
Generally, a natural language processing system comprises a dictionary and grammar rules for a source language, together with a processing system that processes an input by applying the grammar rules.
Question answering systems using a natural language and machine translation systems have been studied and developed as examples of natural language processing systems. In a machine translation system, for example, the following processing is performed. A sentence in a source language is analyzed by using dictionary information relating to the words of the source language and grammar rules for analysis. Grammar rules for generation are then applied to the result of the analysis, and a sentence (a translation) is generated by using dictionary information on the words of a target language. Translation processing of this type is discussed in JOHO SHORI of the Information Processing Society of Japan, Vol. 26, No. 10 (1985), pp. 1174-1190.
Some machine translation systems perform processing that takes semantic information into account, in addition to processing based on syntactic information, in order to improve the quality of translation. For example, there is a method in which information about co-occurrence relations, which shows the possibility of a certain word appearing in the same text in some relation to other words, is stored as dictionary information, and this stored information is used to reduce any syntactic or semantic ambiguity that may arise in the course of processing a natural language sentence.
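By way of illustration only, the use of co-occurrence information to reduce ambiguity may be sketched as follows. The table contents, the relation labels ("object", "instrument", "with") and the function name choose_reading are assumptions made for this sketch and are not taken from any particular system; each candidate analysis is simply scored by how many of its word-to-word relations are registered in the dictionary.

```python
# Hypothetical co-occurrence dictionary: each entry is a
# (head word, relation, dependent word) triple regarded as plausible.
COOCCURRENCE = {
    ("eat", "object", "pasta"),
    ("eat", "instrument", "fork"),
    ("pasta", "with", "sauce"),
}


def choose_reading(candidates, table=COOCCURRENCE):
    """Return the candidate analysis best supported by the co-occurrence table.

    Each candidate is a list of (head, relation, dependent) triples.
    The score is the number of triples found in the table.
    """
    def score(relations):
        return sum(1 for rel in relations if rel in table)

    return max(candidates, key=score)


# Two candidate analyses of "eat pasta with a fork":
# the phrase "with a fork" modifies either the verb or the noun.
verb_attachment = [("eat", "object", "pasta"), ("eat", "instrument", "fork")]
noun_attachment = [("eat", "object", "pasta"), ("pasta", "with", "fork")]

best = choose_reading([verb_attachment, noun_attachment])
```

In this sketch the verb attachment is selected because both of its relations are registered, whereas only one relation of the noun attachment is. A practical dictionary would require far more entries and finer distinctions than this table suggests.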
However, when sophisticated processing is realized by this method, the amount of information necessary for the processing becomes very large and the information becomes very detailed. As a result, a person who prepares this information is required to have linguistic knowledge and must bear a heavy workload. Accordingly, a method has been needed for improving the quality of grammar rules and dictionary information with a small workload.
To meet this need, there are methods of surveying the linguistic phenomena that appear in actual sentences. One such method uses a list called a KWIC (Key Word In Context) list, in which the sentences that include a specific word are sorted and listed. Based on the result of the survey, maintenance of grammar rules and dictionary information is performed manually. A method of this type for analyzing sentences is discussed in the SIGNAL Notes 54-3 of the Information Processing Society of Japan, Mar. 28, 1986.
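By way of illustration only, the preparation of a KWIC list may be sketched as follows. The function names kwic_rows and format_kwic, the whitespace-based word splitting and the choice of sorting by the context that follows the key word are assumptions made for this sketch; the cited publication may sort and lay out the list differently.

```python
def kwic_rows(sentences, keyword):
    """Collect (left context, key word, right context) for every occurrence.

    Words are split on whitespace and compared without surrounding
    punctuation and without regard to case. The rows are sorted by the
    right context so that similar usages appear next to each other.
    """
    key = keyword.lower()
    rows = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token.strip(".,;:!?").lower() == key:
                left = " ".join(tokens[:i])
                right = " ".join(tokens[i + 1:])
                rows.append((left, token, right))
    rows.sort(key=lambda row: row[2].lower())
    return rows


def format_kwic(rows, width=30):
    """Align the key word in a fixed column, truncating long contexts."""
    return [
        f"{left[-width:]:>{width}}  {token}  {right[:width]}"
        for left, token, right in rows
    ]


sentences = [
    "The bank raised its interest rate.",
    "We walked along the bank of the river.",
    "No relevant word appears here.",
]
rows = kwic_rows(sentences, "bank")
for line in format_kwic(rows):
    print(line)
```

A person maintaining the dictionary can then read down the aligned column and see, for instance, that the same key word is followed by quite different contexts in different sentences.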
There are also attempts to reduce human work by (semi-)automatically obtaining dictionary information, including co-occurrence relations, from actual texts. The extraction of this type of dictionary information is discussed in, for example, Transactions of the Information Processing Society of Japan, Vol. 26, No. 4, pp. 706-714 (1985).
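By way of illustration only, a very simple form of such extraction may be sketched as follows. This is not the method of the cited publication, whose details are not described here; the function name count_cooccurrences, the threshold min_count and the definition of co-occurrence as "appearing in the same sentence" are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sentences, min_count=2):
    """Count pairs of distinct words that appear in the same sentence.

    Returns only the pairs seen in at least `min_count` sentences, as
    candidates to be reviewed before registration in a dictionary.
    """
    counts = Counter()
    for sentence in sentences:
        words = {w.strip(".,;:!?").lower() for w in sentence.split()}
        words.discard("")
        # Sorting gives each pair a canonical order, e.g. ("coffee", "hot").
        counts.update(combinations(sorted(words), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


corpus = ["Drink hot coffee.", "Hot coffee is good.", "Cold tea."]
candidates = count_cooccurrences(corpus)
```

The output is only a list of candidates. Whether each pair reflects a genuine linguistic relation still has to be judged, which is why such approaches are described as semi-automatic.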
There are also many published grammar books and dictionaries intended for human readers, and utilization of these publications has also been considered. However, such information cannot be used for machine processing without encoding. Depending on the purpose of the processing, it is necessary to interpret the information and to encode or structure it. Such information is written on the assumption that its readers are human beings who have world knowledge and experience in addition to linguistic knowledge, and linguistic phenomena are explained by fragmentary sentences only. Accordingly, it is necessary to supplement the knowledge for missing and ambiguous parts in order to prepare a dictionary for machine processing. In actual documents, a variety of linguistic phenomena exist depending on the kind of document, and there is no way to obtain these detailed phenomena other than to survey the actual source documents.
The inventors of the present invention have already filed an application (JP-A-1-70871) related to the present invention. JP-A-62-99865 is also related art. Neither of these, however, bears on the substance of the present invention.