The present invention relates to a natural language processing technology for a machine translation system and the like, and in particular, to a method of and an apparatus for automatically generating and/or updating a cooccurrence relation dictionary for a source language.
In a natural language processing system, cooccurrence refers to two words, primarily a verb and a noun, that are linked to each other through a particular case relation, such as agent or object, and that appear together in the same sentence. Two words between which such cooccurrence takes place are said to have a cooccurrence relation. Knowledge about cooccurrence relations is highly effective in improving the quality of the processing results of natural language processing systems such as machine translation systems and word processors. This knowledge is ordinarily supplied to the system in the form of a cooccurrence relation dictionary, which stores combinations of words associated with each other through a cooccurrence relation. Heretofore, the cooccurrence relation dictionary has been prepared manually, and updating operations such as addition, deletion, and revision have likewise been carried out by hand.
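As an illustration only, a cooccurrence relation dictionary of the kind described above might be modeled as a mapping from each verb to the (case, noun) pairs with which it cooccurs. The following is a minimal sketch under that assumption. The class name, the method names, and the case labels such as "agent" and "object" are hypothetical and are not taken from the invention. The add, delete, and revise methods correspond to the manual updating operations mentioned above.

```python
from collections import defaultdict


class CooccurrenceDictionary:
    """Hypothetical store of (verb, case, noun) cooccurrence relations."""

    def __init__(self):
        # Map each verb to the set of (case, noun) pairs it cooccurs with.
        self._entries = defaultdict(set)

    def add(self, verb, case, noun):
        # Register a cooccurrence relation, e.g. ("read", "object", "book").
        self._entries[verb].add((case, noun))

    def delete(self, verb, case, noun):
        # Remove a relation if present; missing entries are ignored.
        self._entries[verb].discard((case, noun))

    def revise(self, verb, case, old_noun, new_noun):
        # Revision modeled as deleting the old entry and adding the new one.
        self.delete(verb, case, old_noun)
        self.add(verb, case, new_noun)

    def has_relation(self, verb, case, noun):
        # True if the verb and noun are recorded as cooccurring via this case.
        return (case, noun) in self._entries.get(verb, ())

    def nouns_for(self, verb, case):
        # Sorted list of nouns recorded for the given verb and case slot.
        return sorted(n for c, n in self._entries.get(verb, ()) if c == case)


if __name__ == "__main__":
    d = CooccurrenceDictionary()
    d.add("read", "object", "book")
    d.add("read", "agent", "student")
    print(d.has_relation("read", "object", "book"))
    print(d.nouns_for("read", "object"))
```

Keying entries by verb reflects the observation that cooccurrence relations are primarily between a verb and a noun; a real system would also need the large-scale collection and field-dependent updating discussed in the following paragraph, which this sketch does not attempt.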
There exists a vast number of word combinations having cooccurrence relations, and in addition, the necessary range of cooccurrence knowledge is difficult to determine. Moreover, the required range varies depending on the field to which the texts primarily processed by each system belong. Consequently, it is necessary to collect a large volume of cooccurrence knowledge efficiently and, after the system has been established, to update the collected knowledge appropriately. The manual collection and updating carried out in the conventional system are quite inefficient, and it is difficult to cope appropriately with discrepancies among cooccurrence knowledge items that arise from differences between fields. A technology for generating and updating a cooccurrence relation dictionary for a natural language has been described in U.S. application Ser. No. 922,889, filed on Oct. 24, 1986, now U.S. Pat. No. 4,942,564, and assigned to the present assignee, the disclosure of which is incorporated herein by reference.