1. Field of the Invention
The present invention relates to a machine translation system for translating character strings written in a first language into character strings written in a second language and, more particularly, to a machine translation system which can improve its translation ability by learning translation examples.
2. Description of the Related Art
With recent advances in computer technology, machine translation systems for performing automatic translation have been developed. Many machine translation systems on the market use a translation scheme called a sentence-structure conversion scheme. An outline of processing performed by a machine translation system using the sentence-structure conversion scheme will be described below with reference to FIG. 1.
Assume that an English sentence "Development of computer science and linguistics opened the way to machine translation." is input to the machine translation system. In this case, the input English sentence is decomposed into words, and the parts of speech (e.g., noun (n) and transitive verb (vt)) of the decomposed words are determined (morphological analysis). Thereafter, the structure of the input English sentence is analyzed on the basis of predetermined grammatical rules. As a result, the input English sentence is decomposed into a noun phrase (NP), a verb phrase (VP), and the like (sentence-structure analysis). The sentence structure obtained by this analysis is converted into a sentence structure of another language (e.g., Japanese), and morphemes are generated, thereby generating a Japanese sentence "<KEISANKIKAGAKU TO GENGOGAKU NO HATTEN WA KIKAIHONYAKU NI TAISHITE MICHIWO AKETA>". Although the sentence in the quotation marks would be written in Japanese, it is here written in Roman letters within the marks "<>" for ease of understanding.
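The first three stages described above (morphological analysis, sentence-structure analysis, and sentence-structure conversion) can be illustrated by the following minimal sketch. It is not the processing of any actual product: the lexicon, the single rule "S -> NP VP", and all function names are assumptions introduced only for illustration, and the generation of Japanese morphemes is omitted.

```python
# Hypothetical toy lexicon: word -> part of speech (n = noun, vt = transitive verb).
LEXICON = {
    "development": "n", "of": "prep", "computer": "n", "science": "n",
    "and": "conj", "linguistics": "n", "opened": "vt", "the": "det",
    "way": "n", "to": "prep", "machine": "n", "translation": "n",
}


def morphological_analysis(sentence):
    """Decompose the sentence into words and look up each part of speech."""
    words = sentence.rstrip(".").lower().split()
    return [(word, LEXICON[word]) for word in words]


def structure_analysis(tagged):
    """Apply one assumed rule, S -> NP VP, splitting at the first transitive verb."""
    verb_index = next(i for i, (_, pos) in enumerate(tagged) if pos == "vt")
    return {"NP": tagged[:verb_index], "VP": tagged[verb_index:]}


def structure_conversion(tree):
    """Reorder subject-verb-object into a subject-object-verb order."""
    verb, rest = tree["VP"][0], tree["VP"][1:]
    return tree["NP"] + rest + [verb]


SENTENCE = ("Development of computer science and linguistics "
            "opened the way to machine translation.")
tagged = morphological_analysis(SENTENCE)
tree = structure_analysis(tagged)
converted = structure_conversion(tree)
```

Because every step is driven by the fixed lexicon and rule, the output can contain only what those rules express, which is the limitation discussed next.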
Since a machine translation system of the sentence-structure conversion scheme performs translation on the basis only of grammatical rules, the system can express nothing outside the grammatical rules. As a result, an unnatural Japanese translation is output. For example, in the Japanese sentence generated above, the abstract noun "development" becomes the subject, and the active voice is employed. However, such a Japanese sentence is unnatural.
In order to solve this problem and improve the translation quality, grammatical rules must be added. As a result, the number of rules increases, and grammatical rules may interfere with each other, causing a deterioration in translation quality.
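Interference between rules can be illustrated with a deliberately simplified sketch. Plain string rewriting stands in for grammatical rules here, and the rule contents, the placeholder outputs OPEN_WAY and ROUTE_TO, and the function name are all hypothetical.

```python
def apply_rules(rules, text):
    """Apply each (pattern, replacement) rule in order, in the style of a rewrite system."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


SENTENCE = "science opened the way to translation"

# Original rule set: one rule handling the expression "opened the way".
BASE_RULES = [("opened the way", "OPEN_WAY")]

# A rule added later with higher priority consumes part of the same text,
# so the original rule no longer matches.
EXTENDED_RULES = [("the way to", "ROUTE_TO")] + BASE_RULES

before = apply_rules(BASE_RULES, SENTENCE)
after = apply_rules(EXTENDED_RULES, SENTENCE)
```

In this sketch the added rule silently disables the earlier one, so a sentence that was handled before the addition is handled differently afterwards.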
Under these circumstances, a machine translation system with an EBMT (Example-Based Machine Translation) scheme of performing translation on the basis of actual translations (translation examples) has recently been proposed (Nagao, M., "A Framework of a Mechanical Translation between Japanese and English by Analogy Principle", in ARTIFICIAL AND HUMAN INTELLIGENCE [Elithorn & Banerji, Eds.], Elsevier Science Publications, pp. 173-180, 1984). This machine translation system with an EBMT scheme retrieves the translation example which is most similar to the original sentence as a translation target, and performs translation on the basis of the translation example. Although practical means for realizing the machine translation system of the EBMT scheme have not been proposed yet, it is expected that such a system will perform processing like that shown in FIG. 2.
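The retrieval step can be pictured with the following minimal sketch. The cited proposal does not fix a similarity measure, so the word-overlap (Jaccard) score used here is an assumption, as are the function names and the second, hypothetical translation example. Adapting the retrieved translation to the new sentence is not shown.

```python
# Translation examples as (original, translation) pairs. The first pair is the
# example used in this description; the second is hypothetical.
EXAMPLES = [
    ("Development of computer science and linguistics opened the way to machine translation.",
     "KEISANKIKAGAKU TO GENGOGAKU NO HATTEN NIYORI KIKAIHONYAKU HENO MICHIGA HIRAKETA"),
    ("This is a pen.", "KORE WA PEN DESU"),
]


def tokenize(sentence):
    """Lower-case the sentence and return its set of words."""
    return set(sentence.lower().rstrip(".").split())


def similarity(a, b):
    """Assumed measure: Jaccard overlap between the two word sets."""
    words_a, words_b = tokenize(a), tokenize(b)
    return len(words_a & words_b) / len(words_a | words_b)


def retrieve(examples, original):
    """Return the translation example whose original is most similar to the input."""
    return max(examples, key=lambda pair: similarity(pair[0], original))


QUERY = "Development of linguistics opened the way to machine translation."
best_original, best_translation = retrieve(EXAMPLES, QUERY)
```

For the query above, the first example shares nine of twelve distinct words with the input and is therefore selected over the second, which shares none.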
According to the EBMT scheme machine translation system, in sentence-structure analysis processing, sentence-structure analysis (NP, VP, and the like) of an original sentence is performed on the basis of a past translation example, and sentence-structure conversion is performed on the basis of this sentence-structure analysis result, thereby generating a translation of the original sentence. A method of generating a Japanese sentence on the basis of sentence-structure analysis in this manner is disclosed in, for example, chapter 4 of "Example-Based Machine Translation," published as a doctoral thesis by Satoshi Sato (Kyoto University) in September 1991.
Various problems, however, are posed by the above conventional machine translation system, as follows.
First, it is difficult for the user to improve the translation ability of the conventional machine translation system. In a machine translation system using the sentence-structure conversion scheme, grammatical rules and sentence-structure rules must be revised in order to improve the translation ability. Since grammatical rules and sentence-structure rules are incorporated, as programs, in the machine translation system, only the system developer can revise the rules. Therefore, the user cannot improve the translation quality, and hence cannot make the machine translation system perform the desired translation. Assume that an undesired translation result is output. In this case, even if the user corrects the translation result, the undesired translation result is repeatedly output with respect to the same original sentence. For this reason, an excessive load of correction work is imposed on the user.
As described above, although a machine translation system with an EBMT scheme has not been put into practice yet, the translation ability can be theoretically improved by adding/recording translation examples. In adding/recording translation examples, however, the operator must perform grammatical analysis (NP, VP, and the like) of translation examples to be added/recorded. For this reason, the work load on the operator increases.
Second, in conventional machine translation systems, improvement in translation quality is limited. Assume that in the machine translation system of the sentence-structure conversion scheme, grammatical rules and sentence-structure rules are added to improve the translation quality. In this case, since the number of rules increases, rules tend to interfere with each other. For this reason, the improvement in translation quality is limited. Indeed, adding rules may even cause the translation quality to deteriorate.
In the machine translation system of the EBMT scheme, although the translation quality can be theoretically improved by adding translation examples, practical means for realizing this system have not been proposed yet.
Third, it is difficult to make natural translations by using conventional machine translation systems. Actual sentences are not necessarily generated on the basis of only grammatical rules. However, in the machine translation system of the sentence-structure conversion scheme, translation is performed on the basis only of grammatical rules and sentence-structure rules. Therefore, translation results tend to be unnatural. For example, the above English sentence "Development of computer science and linguistics opened the way to machine translation." should be translated into "<KEISANKIKAGAKU TO GENGOGAKU NO HATTEN NIYORI KIKAIHONYAKU HENO MICHIGA HIRAKETA>". However, the translation result is the unnatural translation "<KEISANKIKAGAKU TO GENGOGAKU NO HATTEN WA KIKAIHONYAKU NI TAISHITE MICHIWO AKETA>".
Since a machine translation system with an EBMT scheme performs translation on the basis of past translation examples, a relatively natural translation can be output. However, since grammatical analysis is performed on the basis of past translation examples, a natural translation may not be output with respect to an idiomatic expression which greatly deviates from grammar.
Fourth, learning results obtained by other machine translation systems cannot be effectively used. In a machine translation system using a sentence-structure conversion scheme, translation examples cannot be learned. For this reason, as is apparent, a database on which translation examples are learned/recorded cannot be used in other machine translation systems. In a machine translation system using the EBMT scheme, translation examples can be recorded in a database. However, translation examples are independently learned/recorded in the respective machine translation systems. For this reason, when a plurality of machine translation systems are used, learning operations may be redundantly performed, resulting in an increase in the work load on the user who performs learning/recording processing.