1. Field of the Invention
The present invention relates to a method of processing cooccurrence of words in a first language and of the equivalents of those words in a second language. More particularly, it relates to a special cooccurrence processing method for processing two special types of cooccurrence: one in which two or more words in the first language that have a dependency relation between them would, when they cooccur, be translated into special equivalents in the second language; and one in which two or more cooccurring words in the first language that have a dependency relation between them would, when translated into the second language, cause a cooccurrence of their equivalents to be generated in the second language. The invention also relates to systems that use the special cooccurrence processing method, such as electronic dictionaries, machine translation systems, and information retrieval systems.
2. Description of the Prior Art
Conventionally, there have been available electronic dictionaries (hereinafter referred to simply as dictionaries) which store various types of information such as entry words, parts of speech of the entry words, morphological information, equivalent words, parts of speech of the equivalent words, and information on case elements. There have also been information retrieval systems which have such dictionaries or similar information storage and which, upon receipt of a particular key word, output information corresponding to the key word. Further, there have been machine translation systems which have such dictionaries or similar information storage and which, upon input of text in a first language, convert the text into a second language and output the conversion result.
In the information retrieval systems or machine translation systems described above, the following problem may occur. When a first language is converted into a second language and the cooccurrence between words in the first language is not properly taken into account, the equivalent with the highest frequency of use may simply be adopted for a word even though that word has a plurality of equivalents in the second language. The resulting information may then be meaningless or unnatural.
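The problem can be made concrete with a toy sketch. It assumes a hypothetical English-to-French dictionary in which "bank" has two equivalents, "banque" (financial institution) and "rive" (bank of a river), with invented usage frequencies. The names `Equivalent`, `DICTIONARY`, `most_frequent` and `with_cooccurrence` are illustrative only. The cooccurrence check is a simple test of whether a listed word appears anywhere in the sentence; it is not the dependency-based processing of the present invention.

```python
from dataclasses import dataclass, field


@dataclass
class Equivalent:
    """One second-language equivalent of a first-language entry word."""
    word: str
    frequency: int  # hypothetical usage count, invented for illustration
    # Hypothetical: first-language words whose presence selects this equivalent.
    cooccurring_words: set = field(default_factory=set)


# Toy English-to-French dictionary (hypothetical data).
DICTIONARY = {
    "bank": [
        Equivalent("banque", frequency=90),
        Equivalent("rive", frequency=10, cooccurring_words={"river"}),
    ],
}


def most_frequent(entry: str) -> str:
    """Naive choice: ignore context and take the most frequent equivalent."""
    return max(DICTIONARY[entry], key=lambda e: e.frequency).word


def with_cooccurrence(entry: str, sentence_words: set) -> str:
    """Prefer an equivalent whose cooccurring word appears in the sentence,
    falling back to the most frequent equivalent otherwise."""
    candidates = DICTIONARY[entry]
    for eq in candidates:
        if eq.cooccurring_words & sentence_words:
            return eq.word
    return max(candidates, key=lambda e: e.frequency).word


if __name__ == "__main__":
    words = {"the", "river", "bank"}
    print(most_frequent("bank"))             # banque (unnatural in this sentence)
    print(with_cooccurrence("bank", words))  # rive
```

For "the river bank", the frequency-only choice yields "banque", which is exactly the kind of unnatural result described above, while even a crude cooccurrence check selects "rive".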
Thus, in the information retrieval systems or machine translation systems as described above, the following cooccurrence processing methods are generally used to process the cooccurrence among words:
(1) For dealing with semantic cooccurrence of words, use is made of "semantic attributes", that is, semantic concepts of various words classified systematically. A "semantic attribute of a cooccurring word" is specified as information on the counterpart of a particular pair of entry and equivalent words registered in the dictionary or other similar information storage;
(2) For dealing with the cooccurrence relation between one particular word and another, a series of cooccurring words is registered in the dictionary or other similar information storage as a "composite entry word"; and
(3) For dealing with the cooccurrence relation between one particular word and another, an "entry word of a cooccurring word" is specified as information on the counterpart of a particular pair of entry and equivalent words registered in the dictionary or other similar information storage.
However, the above conventional cooccurrence processing methods have the following problems:
(a) In the method of paragraph (1), which utilizes "semantic attributes", it is very difficult to construct a semantic concept system in which the semantic concepts of all words are classified completely and without inconsistency. Moreover, since the individual "semantic attributes" are assigned by a human, the way the semantic concepts are grasped may vary depending on the person who constructs the system.
(b) In the method of paragraph (2), which involves registration of "composite entry words", treating every possible combination of cooccurring words in the first language as an individual "composite entry word" and registering it with information such as equivalent words places a considerable burden on the worker. Maintenance of the information is also difficult, and computational resources may be wasted.
(c) In the method of paragraph (3), in which the "entry word of a cooccurring word" is specified as information on the counterpart of a specific pair of entry and equivalent words, specifying only the "entry word of a cooccurring word" cannot ensure that an adequate second-language equivalent is obtained for the word that cooccurs with the specific word in the first language. The method is therefore insufficient for processing the cooccurrence relation.
Accordingly, although semantic cooccurrence processing using "semantic attributes" is indeed effective for processing common semantic cooccurrences in general, it is still insufficient for dealing with the special cooccurrences of words defined above without adversely affecting other cooccurrence relations involving those words.
Further, treating a set of cooccurring words as a "composite entry word" fixes the word sequence, so that cooccurrence processing fails when a modifier is placed between the cooccurring words. Likewise, for "composite entry words" that are originally verb phrases, the expected result cannot be obtained if the voice is changed (for example, from active to passive).
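The fragility of fixed word sequences can be shown with a toy sketch that contrasts a contiguous "composite entry word" lookup with a lookup keyed on (head, dependent) pairs, the kind of dependency relation mentioned in the field of the invention. This is an illustration under stated assumptions, not the claimed method: the entries and function names are hypothetical, tokens are assumed to be already lemmatized, and the dependency arcs are written by hand rather than produced by a parser.

```python
from typing import List, Optional, Tuple

# Hypothetical composite entry: a fixed word sequence with one equivalent.
COMPOSITE_ENTRIES = {("pay", "attention"): "faire attention"}

# Hypothetical dependency-keyed entry: (head lemma, dependent lemma) -> equivalent.
DEPENDENCY_ENTRIES = {("pay", "attention"): "faire attention"}


def match_composite(tokens: List[str]) -> Optional[str]:
    """Find a composite entry only as a contiguous run of lemmatized tokens."""
    for seq, equivalent in COMPOSITE_ENTRIES.items():
        n = len(seq)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == seq:
                return equivalent
    return None


def match_dependency(arcs: List[Tuple[str, str]]) -> Optional[str]:
    """Find an entry among (head, dependent) arcs; word order is irrelevant."""
    for arc in arcs:
        if arc in DEPENDENCY_ENTRIES:
            return DEPENDENCY_ENTRIES[arc]
    return None


if __name__ == "__main__":
    # "She paid attention": contiguous, so both lookups succeed.
    print(match_composite(["she", "pay", "attention"]))           # faire attention
    # "She paid close attention": a modifier breaks the fixed sequence.
    print(match_composite(["she", "pay", "close", "attention"]))  # None
    # "Attention was paid": the passive voice reverses the order.
    print(match_composite(["attention", "be", "pay"]))            # None
    # Hand-written dependency arcs for the same two sentences.
    print(match_dependency([("pay", "she"), ("pay", "attention"),
                            ("attention", "close")]))              # faire attention
    print(match_dependency([("pay", "attention"), ("pay", "be")]))  # faire attention
```

Because the dependency-keyed lookup ignores surface position, the inserted modifier and the change of voice do not prevent a match, whereas the composite entry matches only the exact contiguous sequence.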