The present invention relates to a dictionary memory for natural language processing, and more particularly to a dictionary memory capable of storing and updating degrees of association between words and information on word occurrence in a text.
Meaningful natural-language text is made up of semantically related words, so the words that can co-occur in a text are restricted. It is well known that this fact can be utilized to improve text processing accuracy. For example, JP-A-63-132379 discloses a method in which a machine translation system utilizes word co-occurrence information to select an appropriate word in a target language. The same concept is used in homophone selection in Kana-to-Kanji conversion, word recognition in speech recognition, correction of spelling errors in a text and the like. Text processing utilizing word co-occurrence information is realized by preparing a co-occurrence dictionary that stores pairs of words that may co-occur and, when there is a plurality of candidates for a particular word, checking for each candidate whether the co-occurrence dictionary contains the pair consisting of that candidate and a word occurring near the particular word. A problem with this type of conventional method is that examining a wide context requires checking many pairs of words, which increases the processing time.
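The conventional lookup described above can be illustrated by the following minimal sketch. It is not taken from the cited publication: the dictionary contents, the function name `select_candidate`, and the rule of preferring the candidate with the most matching pairs are all assumptions made for illustration. The sketch also counts the dictionary lookups, showing that their number grows as the number of candidates multiplied by the number of context words examined.

```python
# Hypothetical co-occurrence dictionary: unordered pairs of words that may co-occur.
COOCCURRENCE = {
    frozenset(("shore", "river")),
    frozenset(("bank", "money")),
    frozenset(("bank", "account")),
}


def select_candidate(candidates, context, dictionary=COOCCURRENCE):
    """Pick the candidate that forms the most dictionary pairs with the context.

    Returns (best_candidate, number_of_lookups). Ties, including the case
    where no candidate matches anything, go to the earliest candidate
    (an assumption of this sketch).
    """
    lookups = 0
    best, best_hits = None, -1
    for cand in candidates:
        hits = 0
        for word in context:
            lookups += 1  # one dictionary check per (candidate, context word) pair
            if frozenset((cand, word)) in dictionary:
                hits += 1
        if hits > best_hits:
            best, best_hits = cand, hits
    return best, lookups


best, lookups = select_candidate(["shore", "bank"], ["money", "account", "open"])
print(best, lookups)
```

With two candidates and three context words, six lookups are performed; widening the context window increases this count proportionally, which is the processing-time problem noted above.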
With respect to the utilization of word co-occurrence information, another problem concerns how to acquire the knowledge of word co-occurrence. In this regard, JP-A-2-42572, for example, discloses a method of analyzing the syntax of a sentence and registering, in a co-occurrence dictionary, pairs of words having a dependency relationship. This method permits the knowledge of word co-occurrence to be acquired from a text automatically. However, the pairs of words that can be acquired by this method are restricted to words between which the strong relationship of dependency holds, and the sentence must additionally be syntactically unambiguous. This gives rise to the drawback of low efficiency of the knowledge acquisition process.
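The acquisition scheme and its drawback can be sketched as follows, under stated assumptions. The syntactic analyzer itself is not shown; each sentence is assumed to arrive as a list of alternative parses, each parse being a list of hypothetical (head, dependent) word pairs. Treating a sentence as unambiguous only when it has exactly one parse is one reading of the condition described above, not a detail confirmed by the cited publication.

```python
def acquire_pairs(parsed_sentences):
    """Register dependency pairs from syntactically unambiguous sentences only.

    parsed_sentences: list of sentences; each sentence is a list of
    alternative parses; each parse is a list of (head, dependent) pairs.
    Returns (dictionary, number_of_skipped_sentences).
    """
    dictionary = set()
    skipped = 0
    for parses in parsed_sentences:
        if len(parses) != 1:
            # Ambiguous (or unparsable) sentence: nothing is learned from it.
            skipped += 1
            continue
        for head, dependent in parses[0]:
            dictionary.add(frozenset((head, dependent)))
    return dictionary, skipped


sentences = [
    # One parse: unambiguous, so its pairs are registered.
    [[("drink", "water"), ("drink", "I")]],
    # Two competing parses: ambiguous, so the sentence is skipped.
    [
        [("saw", "man"), ("saw", "telescope")],
        [("saw", "man"), ("man", "telescope")],
    ],
]
dictionary, skipped = acquire_pairs(sentences)
print(len(dictionary), skipped)
```

In this illustration, half of the input sentences contribute nothing to the dictionary, which reflects the low acquisition efficiency described above.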