The present invention generally relates to a system for disambiguating words which can function as multiple parts of speech. More particularly, the invention concerns a processing method and apparatus for parsing a sentence by determining which part of speech each word capable of functioning as multiple parts of speech takes in the sentence, the determination being made by applying disambiguating rules while taking into consideration the rates or frequencies at which the respective parts of speech available to the word appear.
For a machine translation system to be realized for practical applications, it is necessary to prepare an abundance of usable dictionaries so that the translation processing can be performed in an environment approximating the actual situation in which a human translator works by consulting dictionaries. In the processing of sentences written in a natural language, a sentence containing words which can function as multiple parts of speech (each also referred to as a multiple-parts-of-speech word) is first subjected to parsing or syntax analysis. As an example of such multiple-parts-of-speech words, there may be mentioned a word which can function either as a noun or as a verb. The part of speech which a word takes in a given sentence has heretofore been determined by checking the parts of speech of the words preceding and succeeding the given word and applying multiple-parts-of-speech disambiguating rules prepared previously. A typical example of the processing in accordance with the multiple-parts-of-speech disambiguating rules is disclosed in Japanese Patent Unexamined Publication No. 56-138586. The part-of-speech disambiguating rules are usually prepared on the basis of grammatical constraints and statistical probability. A variety of rules have heretofore been proposed. Among them, there may be mentioned the rules disclosed in Takeshi Kiyono et al.'s article titled "Machine Translation", Periodical of the Institute of Electronics and Communication Engineers of Japan, Vol. 46, No. 11 (November 1963).
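By way of illustration only, the conventional neighbor-based approach described above may be sketched as follows. The sketch is not taken from the cited publications; the lexicon, the tag names, the rule table and the function name disambiguate are all hypothetical, and the rules are a toy set chosen merely to show how the parts of speech of the preceding and succeeding words can select one candidate.

```python
# Hypothetical lexicon: each word maps to the parts of speech it can take.
LEXICON = {
    "the": ["DET"],
    "they": ["PRON"],
    "record": ["NOUN", "VERB"],
    "results": ["NOUN", "VERB"],
    "is": ["VERB"],
    "old": ["ADJ"],
}

# Hypothetical rules: (POS of preceding word, POS of succeeding word, choice).
# None acts as a wildcard for that position.
RULES = [
    ("DET", None, "NOUN"),   # after a determiner, prefer the noun reading
    ("PRON", None, "VERB"),  # after a pronoun, prefer the verb reading
    (None, "DET", "VERB"),   # before a determiner, prefer the verb reading
]


def disambiguate(words):
    """Return one tag per word, or None where no rule applies."""
    candidates = [LEXICON[w] for w in words]
    tags = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            tags.append(cands[0])  # unambiguous word
            continue
        # The preceding tag is whatever was already decided (possibly None).
        prev_pos = tags[i - 1] if i > 0 else None
        # The succeeding tag is known only if that word is unambiguous.
        following = candidates[i + 1] if i + 1 < len(words) else []
        next_pos = following[0] if len(following) == 1 else None
        chosen = None
        for rule_prev, rule_next, result in RULES:
            if result not in cands:
                continue
            if rule_prev is not None and rule_prev != prev_pos:
                continue
            if rule_next is not None and rule_next != next_pos:
                continue
            chosen = result  # first matching rule wins
            break
        tags.append(chosen)
    return tags


print(disambiguate(["the", "record", "is", "old"]))
print(disambiguate(["they", "record", "the", "results"]))
```

In this sketch the word "record" is resolved as a noun in the first sentence and as a verb in the second, solely from the parts of speech of its neighbors.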
However, when a dictionary abundant in content is used, there often arises a situation in which multiple-parts-of-speech words appear successively in a sentence, making it difficult for the hitherto known processing system, which relies on the sequence of the parts of speech, to disambiguate deterministically the parts of speech which the words take in the sentence. Further, when the multiple-parts-of-speech disambiguating rules are fixedly established, there may arise a case in which a part of speech is selected which the word will scarcely take in a certain field of literature. Under these circumstances, restricting the range of sentences to be translated cannot effectively contribute to the deterministic disambiguation of the parts of speech. In the present state of the technology, it is very difficult to perform the syntax analysis and translation processing with reasonable accuracy.
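The difficulty caused by successive multiple-parts-of-speech words may be illustrated by the following minimal sketch. The candidate lists, the tag names and the function names reading_count and longest_ambiguous_run are hypothetical and are given only as an example of how a richer dictionary multiplies the number of part-of-speech sequences to be considered.

```python
from math import prod

# Hypothetical candidate lists for the sentence "time flies like an arrow".
SENTENCE = [
    ("time", ["NOUN", "VERB"]),
    ("flies", ["NOUN", "VERB"]),
    ("like", ["VERB", "PREP"]),
    ("an", ["DET"]),
    ("arrow", ["NOUN"]),
]


def reading_count(sentence):
    """Number of distinct part-of-speech sequences for the sentence."""
    return prod(len(cands) for _, cands in sentence)


def longest_ambiguous_run(sentence):
    """Length of the longest run of consecutive ambiguous words."""
    longest = current = 0
    for _, cands in sentence:
        current = current + 1 if len(cands) > 1 else 0
        longest = max(longest, current)
    return longest


print(reading_count(SENTENCE))
print(longest_ambiguous_run(SENTENCE))
```

Here three ambiguous words stand in succession, so that a rule relying on an already determined neighboring part of speech finds no fixed neighbor for the middle word, and eight candidate sequences remain.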