The present invention relates to a method for automatic translation between natural languages, and more particularly to a method for automatically translating English sentences into Japanese sentences.
The present method is applicable not only to English-to-Japanese translation but also to translation between any two natural languages, and further to translation between different representations of the same language, for example, the translation of a sentence written in Kana characters into a sentence written in Kana and Kanji characters. In the following description, for the sake of convenience, it is assumed that the input language is English and the output language is Japanese, although the present method is not limited thereto.
A method for automatically translating a sentence expressed in one natural language into a sentence expressed in another natural language is disclosed in the Journal of Institute of Electrical Engineering and Communication of Japan, Vol. 46, No. 11, pp. 1730-1739.
The prior art method disclosed therein is briefly explained below. When an English text is input, a lexicon is referenced to convert a sentence consisting of a sequence of words into a string of parts of speech. In many cases, however, the part of speech of a word is not uniquely determined. For example, the word "study" is used either as a noun or as a verb. In such a case, words whose part of speech can be uniquely determined are selected first, and the parts of speech of the words before and after each such word are determined next. For a word whose part of speech still cannot be determined, the possible parts of speech are registered as candidates and one of them is tentatively selected. A string of parts of speech equal to a preregistered pattern of a string of parts of speech for a phrase or a clause is then searched for. If an equal one is found, the phrase or clause is replaced by a part-of-speech symbol.
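The lexicon lookup and tentative selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited article: the lexicon contents, the tag names (N, V, DET) and the neighbour rule (an ambiguous word directly after a determiner is taken as a noun) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word maps to its possible parts of speech.
LEXICON: Dict[str, List[str]] = {
    "students": ["N"],
    "study": ["N", "V"],
    "the": ["DET"],
    "a": ["DET"],
    "language": ["N"],
    "lot": ["N"],
}


def lookup(words: List[str]) -> List[List[str]]:
    """Convert a sequence of words into a sequence of part-of-speech candidate lists."""
    return [list(LEXICON[w.lower()]) for w in words]


def narrow_by_neighbours(cands: List[List[str]]) -> List[List[str]]:
    """Fix ambiguous words from a uniquely determined left neighbour.

    Hypothetical rule: an ambiguous word directly after a determiner becomes a noun.
    """
    result = [list(c) for c in cands]
    for i in range(1, len(result)):
        if len(result[i]) > 1 and result[i - 1] == ["DET"] and "N" in result[i]:
            result[i] = ["N"]
    return result


def tentative_string(cands: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Tentatively pick the first candidate for each word.

    Returns the part-of-speech string and the positions whose choice is only tentative.
    """
    pos_string = [c[0] for c in cands]
    ambiguous = [i for i, c in enumerate(cands) if len(c) > 1]
    return pos_string, ambiguous


if __name__ == "__main__":
    cands = narrow_by_neighbours(lookup(["Students", "study", "a", "lot"]))
    print(tentative_string(cands))  # (['N', 'N', 'DET', 'N'], [1])
```

Here "study" stays ambiguous because its left neighbour is a noun, so the noun reading is selected only tentatively (and wrongly), which is exactly the situation the feedback loop must later resolve.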
For example, for the fragment ". . . a pulse of known rate of rise" shown in FIG. 1A, "a pulse" and "known rate" are each determined to be a noun phrase (NP). Then "of + rise" is replaced by an adjective phrase (AP), "known rate + of + rise" is replaced by a noun phrase, and "of + known rate + of + rise" is replaced by an adjective phrase. In this manner, one sentence is converted into a simple pattern of a string of parts of speech. The converted pattern is compared with a preregistered standard pattern of a string of parts of speech that constitutes a sentence. If they are equal, the sentence is determined to be translatable and the sequence of words is transformed in accordance with a predetermined rule.
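The phrase replacement for the example fragment ". . . a pulse of known rate of rise" can be reproduced with a small rewrite loop. The sketch below assumes the parts of speech are already determined and uses a hypothetical set of registered patterns (DET N and ADJ N give NP; PREP N and PREP NP give AP; NP AP gives NP). Noun phrases are formed first; the remaining patterns are then applied to the rightmost match, so that segmentation proceeds from the end of the fragment.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # (part-of-speech symbol, words it covers)
Pattern = Tuple[Tuple[str, ...], str]  # (symbol sequence, replacement symbol)

# Hypothetical registered patterns, not those of the cited prior art.
NOUN_PHRASE_PATTERNS: List[Pattern] = [(("DET", "N"), "NP"), (("ADJ", "N"), "NP")]
PHRASE_PATTERNS: List[Pattern] = [
    (("PREP", "N"), "AP"),
    (("PREP", "NP"), "AP"),
    (("NP", "AP"), "NP"),
]


def reduce_once(
    nodes: List[Node], patterns: List[Pattern], from_end: bool
) -> Optional[Tuple[List[Node], Node]]:
    """Replace the first match found scanning from the end (or start).

    Returns the new node list and the merged node, or None if nothing matches.
    """
    starts = range(len(nodes) - 1, -1, -1) if from_end else range(len(nodes))
    for i in starts:
        for pattern, symbol in patterns:
            window = nodes[i:i + len(pattern)]
            if tuple(sym for sym, _ in window) == pattern:
                merged = (symbol, " ".join(text for _, text in window))
                return nodes[:i] + [merged] + nodes[i + len(pattern):], merged
    return None


def reduce_all(
    nodes: List[Node], patterns: List[Pattern], from_end: bool, trace: List[Node]
) -> List[Node]:
    """Apply reduce_once until no pattern matches, logging each new phrase in trace."""
    while True:
        step = reduce_once(nodes, patterns, from_end)
        if step is None:
            return nodes
        nodes, merged = step
        trace.append(merged)


def analyse(tagged: List[Node]) -> Tuple[List[Node], List[Node]]:
    """Form noun phrases left to right, then segment the rest from the end."""
    trace: List[Node] = []
    nodes = reduce_all(list(tagged), NOUN_PHRASE_PATTERNS, from_end=False, trace=trace)
    nodes = reduce_all(nodes, PHRASE_PATTERNS, from_end=True, trace=trace)
    return nodes, trace


FRAGMENT: List[Node] = [
    ("DET", "a"), ("N", "pulse"), ("PREP", "of"), ("ADJ", "known"),
    ("N", "rate"), ("PREP", "of"), ("N", "rise"),
]

if __name__ == "__main__":
    result, steps = analyse(FRAGMENT)
    for symbol, text in steps:
        print(symbol, text)
```

The trace lists the phrases in the order the text gives them: "a pulse" and "known rate" (NP), "of rise" (AP), "known rate of rise" (NP), "of known rate of rise" (AP), and finally the whole fragment as a single NP.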
If the converted pattern is not equal to the registered standard pattern, the sequence of parts of speech assigned to the words is judged to be inappropriate, the part of speech of a word having multiple parts of speech is replaced by another of its registered candidates, and the above process is repeated. Thus, when a word has multiple parts of speech, a feedback loop is used to determine its part of speech.
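The feedback loop can be sketched as an exhaustive trial over the candidate parts of speech. In this minimal sketch, assuming a hypothetical phrase-pattern table and a single hypothetical standard sentence pattern NP V NP, each tentative string is reduced and compared with the standard patterns, and the next combination is tried on failure.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical registered phrase patterns, tried in this order at each position.
PHRASE_PATTERNS: List[Tuple[Tuple[str, ...], str]] = [
    (("DET", "N"), "NP"),
    (("N", "N"), "NP"),  # compound noun
    (("N",), "NP"),      # bare noun
]

# Hypothetical registered standard sentence patterns.
STANDARD_PATTERNS = {("NP", "V", "NP")}


def reduce_pattern(symbols: Sequence[str]) -> Tuple[str, ...]:
    """Replace phrase patterns (leftmost match first) until none applies."""
    seq = list(symbols)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq)):
            for pattern, symbol in PHRASE_PATTERNS:
                if tuple(seq[i:i + len(pattern)]) == pattern:
                    seq[i:i + len(pattern)] = [symbol]
                    changed = True
                    break
            if changed:
                break
    return tuple(seq)


def feedback_loop(candidates: List[List[str]]) -> Tuple[Optional[Tuple[str, ...]], int]:
    """Try candidate part-of-speech strings until one reduces to a standard pattern.

    Returns the accepted part-of-speech string (or None) and the number of attempts.
    """
    attempts = 0
    for choice in product(*candidates):
        attempts += 1
        if reduce_pattern(choice) in STANDARD_PATTERNS:
            return choice, attempts
    return None, attempts


if __name__ == "__main__":
    # "Students study the language": only "study" is ambiguous (noun or verb).
    print(feedback_loop([["N"], ["N", "V"], ["DET"], ["N"]]))
    # (('N', 'V', 'DET', 'N'), 2)
```

The first tentative choice reads "study" as a noun, reduces to NP NP and is rejected; the second reads it as a verb and is accepted. Because `itertools.product` enumerates every combination, the worst-case number of attempts is the product of the candidate counts.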
Finally, Japanese equivalents are assigned to English words arranged in a transformed sequence so that a Japanese version of the English text is produced.
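The final two steps, transforming the word sequence and assigning equivalents, can be illustrated for one hypothetical rule. The sketch assumes the sentence has already been matched to the standard pattern NP V NP; the rule (reorder subject-verb-object into subject-object-verb and attach the particles は and を) and the small dictionary are illustrative assumptions, not the rules of the prior art system.

```python
from typing import Dict, List, Tuple

# Hypothetical word dictionary; the English article has no Japanese equivalent.
EQUIVALENTS: Dict[str, str] = {
    "students": "学生",
    "study": "勉強する",
    "the": "",
    "language": "言語",
}

Chunk = Tuple[List[str], str]  # (English words, Japanese particle to append)


def transform_np_v_np(subject: List[str], verb: List[str], obj: List[str]) -> List[Chunk]:
    """Hypothetical rule for NP V NP: reorder to subject, object, verb and attach particles."""
    return [(subject, "は"), (obj, "を"), (verb, "")]


def assign_equivalents(chunks: List[Chunk]) -> str:
    """Replace each English word by its Japanese equivalent and join the result."""
    return "".join(
        "".join(EQUIVALENTS[w.lower()] for w in words) + particle
        for words, particle in chunks
    )


if __name__ == "__main__":
    chunks = transform_np_v_np(["Students"], ["study"], ["the", "language"])
    print(assign_equivalents(chunks))  # 学生は言語を勉強する
```

Word-by-word assignment of this kind only works because the transformation rule has already fixed the Japanese word order; each additional sentence pattern needs its own rule.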
However, the prior art automatic translation method described above has many problems as discussed below.
Firstly, determining the part of speech of a word having multiple parts of speech is very complex, and hence the chance of successful translation tends to be low. In the prior art method, when one word has multiple parts of speech, one of them is tentatively selected and the structure of the sentence is analyzed using a pattern dictionary; if this does not succeed, another part of speech is selected and the process is repeated. In practice, however, many words have multiple parts of speech, and when a sentence is complicated, the number of possible strings of parts of speech for the sentence becomes huge. Repeating the same process for these strings many times reduces the translation speed. In addition, if a wrong part of speech is tentatively selected and the resulting string of parts of speech happens to equal a pattern registered in the dictionary, a wrong translation will be carried out.
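The growth is multiplicative: if word i has k_i candidate parts of speech, the feedback loop may have to analyse up to k_1 × k_2 × ... × k_n strings. The sketch below computes this bound; the word counts in the example are illustrative assumptions, not measurements (`math.prod` needs Python 3.8 or later).

```python
from math import prod
from typing import List


def number_of_pos_strings(candidate_counts: List[int]) -> int:
    """Upper bound on the part-of-speech strings the feedback loop may have to analyse.

    candidate_counts[i] is the number of possible parts of speech of word i.
    """
    return prod(candidate_counts)


# Illustrative twelve-word sentence: five words with two candidates,
# two with three, and five unambiguous words.
counts = [2] * 5 + [3] * 2 + [1] * 5
print(number_of_pos_strings(counts))  # 288
```

With ten words that each have two candidates the bound is already 1024, and every rejected string costs a full structure analysis.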
Accordingly, the more complex the sentence, the lower the chance of correct, that is, successful, translation.
Secondly, even if the part of speech of each word is correctly determined, mistranslation may occur, because when phrases or clauses are replaced by parts of speech, they are segmented sequentially from the beginning or the end of the sentence without analyzing the relationship between the phrases or clauses and the words they modify or relate to, that is, the dependency and modifying relations. For example, when the sentence ". . . take a bus in a city" shown in FIG. 1B is analyzed in the same manner as shown in FIG. 1A, "a bus in a city" is recognized as one noun phrase; as a result, the translated sentence will mean "take a [bus in a city]". This mistranslation arises because "in a city", which is an adverbial phrase modifying "take", is recognized as an adjective phrase modifying "a bus" when the phrases are segmented in sequence from the end of the sentence. Thus, when phrases are segmented in sequence from the beginning or the end of the sentence, the part of speech of a phrase may not be uniquely determined. If a sentence has a hierarchical structure, that is, if it includes complex modifying words, phrases or clauses, it cannot be translated correctly.
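The misattachment in ". . . take a bus in a city" can be reproduced with a right-to-left reducer. This is a minimal sketch assuming a hypothetical pattern table in the style described: a prepositional phrase is always reduced to an adjective phrase (AP), and an AP following a noun phrase is always absorbed into it, with no check of which word the phrase actually modifies.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (part-of-speech symbol, words covered)

# Hypothetical registered patterns; PREP NP always becomes an adjective phrase.
PATTERNS: List[Tuple[Tuple[str, ...], str]] = [
    (("DET", "N"), "NP"),
    (("PREP", "NP"), "AP"),
    (("NP", "AP"), "NP"),
]


def reduce_from_end(nodes: List[Node]) -> List[Node]:
    """Repeatedly replace the rightmost pattern match until no pattern applies."""
    while True:
        for i in range(len(nodes) - 1, -1, -1):
            match = next(
                ((p, s) for p, s in PATTERNS
                 if tuple(sym for sym, _ in nodes[i:i + len(p)]) == p),
                None,
            )
            if match is not None:
                pattern, symbol = match
                words = " ".join(text for _, text in nodes[i:i + len(pattern)])
                nodes = nodes[:i] + [(symbol, words)] + nodes[i + len(pattern):]
                break  # rescan from the end after every replacement
        else:
            return nodes  # no pattern matched anywhere


SENTENCE: List[Node] = [
    ("V", "take"), ("DET", "a"), ("N", "bus"),
    ("PREP", "in"), ("DET", "a"), ("N", "city"),
]

if __name__ == "__main__":
    print(reduce_from_end(SENTENCE))
    # [('V', 'take'), ('NP', 'a bus in a city')]
```

The result pairs "take" with the single noun phrase "a bus in a city", i.e. the reading "take a [bus in a city]"; nothing in the pattern table can express that "in a city" should instead modify the verb.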
Thirdly, in the prior art system, the entire program of the processing system must be modified when the kinds of sentences to be translated are to be increased. Thus, once the system has been completed, it is very difficult to increase the kinds of sentences it can handle. In the prior art system, feedback loops are used to determine the part of speech of a word having multiple parts of speech, and a structure analysis routine using the pattern dictionary is included in the feedback loops. Accordingly, when registered patterns in the pattern dictionary are to be added or modified, the processing algorithm must be modified so as not to cause a discrepancy in the overall operation of the feedback loops. Normally, the probability of success in automatic translation depends largely on the structure of the registered patterns for syntactic analysis, so patterns must be added and modified by trial and error. Modifying the processing algorithm for each such addition or modification is therefore a heavy burden.