1. Field of the Invention
The present invention relates to a machine translation apparatus and, more specifically, to a statistical machine translation apparatus capable of highly accurate translation by taking advantage of example-based translation.
2. Description of the Background Art
The framework of statistical machine translation formulates the problem of translating a sentence J in one language into a sentence E in another language as the maximization of the following conditional probability:
$$\hat{E} = \arg\max_{E} P(E \mid J)$$
By Bayes' rule, Ê may be written as:
$$\hat{E} = \arg\max_{E} \frac{P(E)\,P(J \mid E)}{P(J)}$$
Because P(J) does not depend on E, it does not affect the maximization and can be dropped. Therefore,
$$\hat{E} = \arg\max_{E} P(E)\,P(J \mid E)$$
The first term on the right side, P(E), is called the language model and represents the likelihood of sentence E. The second term, P(J|E), is called the translation model and represents the probability of generating sentence J from sentence E.
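The decision rule can be checked on a toy example. The following Python sketch ranks three candidate translations both by the full posterior P(E|J) and by the noisy-channel score log P(E) + log P(J|E). The candidate names and all probabilities are hypothetical placeholders, not values from any real model, and P(J) is approximated by summing over the toy candidate set only; the point is that dividing by P(J) rescales every candidate equally and therefore cannot change the argmax.

```python
import math

def decide(candidates, lm_prob, tm_prob):
    """Return the best candidate under P(E|J) and under P(E)P(J|E)."""
    # Over this toy candidate set, P(J) = sum_E P(E) P(J|E); it is the same for all candidates.
    p_j = sum(lm_prob[e] * tm_prob[e] for e in candidates)

    def posterior(e):
        return lm_prob[e] * tm_prob[e] / p_j  # P(E|J) by Bayes' rule

    def noisy_channel(e):
        return math.log(lm_prob[e]) + math.log(tm_prob[e])  # log P(E) + log P(J|E)

    return max(candidates, key=posterior), max(candidates, key=noisy_channel)

# Hypothetical probabilities for three candidate translations of one input J.
lm = {"E_a": 1e-4, "E_b": 5e-5, "E_c": 2e-6}  # language model P(E)
tm = {"E_a": 1e-3, "E_b": 4e-3, "E_c": 1e-2}  # translation model P(J|E)
print(decide(list(lm), lm, tm))  # ('E_b', 'E_b')
```

Note that neither the candidate with the highest P(E) (E_a) nor the one with the highest P(J|E) (E_c) wins; the product of the two models selects E_b.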
Under this framework, a translation model has been proposed in which a sentence of a first language (referred to as a channel target sentence) is mapped to a sentence of a second language (referred to as a channel source sentence) by means of word alignment, that is, by finding the correspondence between their words. This translation model has been successfully applied to similar language pairs, such as French-English and German-English.
The translation model, however, achieved little success when applied to structurally very different language pairs, such as Japanese-English. The problem lies in the huge search space that arises when mapping between languages of different structures, caused by frequent insertions and deletions of words, large fertility values (the number of words generated by a single word), and complicated word alignments. Because of this search complexity, a beam search decoding algorithm can find only sub-optimal (local) solutions.
Word-alignment-based statistical translation expresses bilingual correspondence through a word alignment A, which allows one-to-many correspondence between words. Word alignment A is an array with one entry per word of the channel target sentence, arranged in the word order of the channel target sentence; each entry is the index of the word of the channel source sentence to which that channel target word corresponds.
FIG. 7 shows Example A of word alignment between English (E) and Japanese (J). Referring to FIG. 7, words 1 to 7 of a sentence 110 of the second language (in this example, English, E) are aligned with words 1 to 6 of a sentence 114 of the first language (in this example, Japanese, J). The alignment is represented by lines 112 connecting the words of channel source sentence 110 to the words of channel target sentence 114. For example, the word “show1” of channel source sentence 110 generates two words, “mise5” and “tekudasai6”, of channel target sentence 114. The two words “no2” and “o4” of channel target sentence 114 have no corresponding words in channel source sentence 110; therefore, a word “NULL0” is placed at the head of channel source sentence 110, and these two words are aligned with it. In this case, alignment A is “7, 0, 4, 0, 1, 1.”
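The alignment array of FIG. 7 already encodes the fertility of each English word and the NULL-aligned Japanese words. The following Python sketch reads them off the array A = 7, 0, 4, 0, 1, 1 given above; only the array and the sentence lengths come from the text, and the function names are hypothetical.

```python
from collections import Counter

# Alignment A from FIG. 7: entry j (1-based) is the index of the English
# (channel source) word aligned with Japanese word j; 0 denotes NULL.
A = [7, 0, 4, 0, 1, 1]
SOURCE_LEN = 7  # English words 1..7; position 0 is the NULL word

def fertilities(alignment, source_len):
    """Number of channel target words aligned with each source position 0..source_len."""
    counts = Counter(alignment)
    return [counts[i] for i in range(source_len + 1)]

def null_aligned_targets(alignment):
    """1-based positions of channel target words aligned with NULL."""
    return [j for j, i in enumerate(alignment, start=1) if i == 0]

def unaligned_sources(alignment, source_len):
    """1-based source positions with fertility 0 (no aligned target word)."""
    phi = fertilities(alignment, source_len)
    return [i for i in range(1, source_len + 1) if phi[i] == 0]

print(fertilities(A, SOURCE_LEN))        # [2, 2, 0, 0, 1, 0, 0, 1]
print(null_aligned_targets(A))           # [2, 4]
print(unaligned_sources(A, SOURCE_LEN))  # [2, 3, 5, 6]
```

Here “show1” has fertility 2, NULL has fertility 2 (“no2” and “o4”), and four English words, including “the3” and “the6”, have fertility 0.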
Under this word alignment assumption, the translation model P(J|E) can be further decomposed as a sum over all possible alignments A:
$$P(J \mid E) = \sum_{A} P(J, A \mid E)$$
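The sum over alignments can be made concrete with the simplest alignment-based model, IBM Model 1, which is swapped in here purely for illustration and is not the four-component model described below. With l source words plus NULL and m target words there are (l+1)^m alignments; for FIG. 7 that is 8^6 = 262,144. The sketch enumerates them by brute force and compares the result with Model 1's per-word factorization. The lexicon values are hypothetical, and the sentence-length term is omitted.

```python
import itertools
import math

def joint_model1(j_words, e_words, alignment, t):
    """P(J, A | E) under simplified IBM Model 1: uniform 1/(l+1) per target
    word times the lexicon probability t(j | e). e_words[0] is NULL."""
    l = len(e_words) - 1
    p = 1.0
    for j_word, a_j in zip(j_words, alignment):
        p *= t.get((j_word, e_words[a_j]), 0.0) / (l + 1)
    return p

def marginal_brute_force(j_words, e_words, t):
    """P(J | E) = sum over all (l+1)^m alignments A of P(J, A | E)."""
    return sum(
        joint_model1(j_words, e_words, a, t)
        for a in itertools.product(range(len(e_words)), repeat=len(j_words))
    )

def marginal_factored(j_words, e_words, t):
    """Same sum, factorized per target word; valid only because Model 1
    chooses each a_j independently."""
    l = len(e_words) - 1
    return math.prod(
        sum(t.get((j, e), 0.0) for e in e_words) / (l + 1) for j in j_words
    )

# Hypothetical lexicon probabilities t(j | e), for illustration only.
t = {("mise", "show"): 0.5, ("tekudasai", "show"): 0.3, ("o", "show"): 0.05,
     ("mise", "NULL"): 0.01, ("tekudasai", "NULL"): 0.02, ("o", "NULL"): 0.4}
E = ["NULL", "show"]
J = ["mise", "tekudasai", "o"]
print(marginal_brute_force(J, E, t), marginal_factored(J, E, t))
```

Fertility-based models do not admit this simple factorization, which is one reason the exact sum and the search over alignments become expensive for them.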
The term P(J,A|E) on the right side is further decomposed into four components, which constitute the prior-art process of transforming a channel source sentence E into a channel target sentence J with alignment A. The four components are as follows.
(1) Choose the number of words to generate for each word of the channel source sentence according to the Fertility Model. One word may generate two translation words, or no translation word at all.
(2) Insert NULLs at appropriate positions of the channel source sentence according to the NULL Generation Model.
(3) Translate each generated word by looking it up in the Lexicon Model.
(4) Reorder the translated words according to the Distortion Model. The position of each word is determined by the alignment of the preceding word, which captures phrasal constraints.
In this manner, a translation model based on the idea of word alignment is obtained.
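As a rough illustration of the decomposition, the following Python sketch scores one (J, A, E) triple as a product of one factor per component. It is a simplified sketch under stated assumptions, not the exact prior-art model: all tables are hypothetical, the combinatorial terms of the full fertility-based models are omitted, and the distortion factor is a first-order jump model (each non-NULL word's probability depends on the source position aligned with the preceding word) standing in for the Distortion Model. The toy data is a three-word fragment of FIG. 7 (“mise”, “tekudasai” aligned with “show”; “o” aligned with NULL).

```python
import math

def score_joint(j_words, e_words, alignment, n, p_null, t, d):
    """Simplified P(J, A | E) as a product of four component factors.
    e_words[0] is NULL; alignment[j] is the source index of target word j+1.
    Hypothetical tables:
      n[(e, phi)]   fertility probability of source word e generating phi words
      p_null[phi0]  probability of phi0 NULL-aligned target words
      t[(j, e)]     lexicon probability t(j | e)
      d[jump]       probability of the jump from the previous aligned source position
    """
    l = len(e_words) - 1
    phi = [alignment.count(i) for i in range(l + 1)]
    # (1) Fertility Model
    p = math.prod(n.get((e_words[i], phi[i]), 0.0) for i in range(1, l + 1))
    # (2) NULL Generation Model
    p *= p_null.get(phi[0], 0.0)
    # (3) Lexicon Model
    p *= math.prod(t.get((j, e_words[a]), 0.0) for j, a in zip(j_words, alignment))
    # (4) Distortion (simplified): condition each word on the previous alignment
    prev = 0
    for a in alignment:
        if a == 0:
            continue  # NULL-aligned words get no distortion factor in this sketch
        p *= d.get(a - prev, 0.0)
        prev = a
    return p

E = ["NULL", "show"]
J = ["mise", "tekudasai", "o"]
A = [1, 1, 0]
n = {("show", 2): 0.6}
p_null = {1: 0.3}
t = {("mise", "show"): 0.5, ("tekudasai", "show"): 0.3, ("o", "NULL"): 0.4}
d = {0: 0.4, 1: 0.5}
print(score_joint(J, E, A, n, p_null, t, d))
```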
A method has been proposed in which each word of a channel target sentence is translated into the channel source language, the resulting words are arranged in the order of the channel target sentence, and various operators are applied to the resulting sentence to generate a number of candidate sentences (Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada, “Fast decoding and optimal decoding for machine translation,” in Proc. of ACL 2001, Toulouse, France, 2001). In this method, the sentence with the highest likelihood among the generated sentences is selected as the translation.
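The operator-based search described above is a form of greedy hill climbing. The following Python sketch shows the control loop under simplifying assumptions: the two operators (swap two adjacent words, replace a word with an alternative translation) are a hypothetical subset chosen for illustration, not the operator set of Germann et al., and the model score is a hypothetical lookup table over orderings of three placeholder words.

```python
def greedy_decode(gloss, alternatives, score):
    """Hill climbing: start from a word-by-word gloss and repeatedly apply
    the single operator application that most improves score()."""
    current = list(gloss)
    current_score = score(current)
    while True:
        neighbours = []
        # Operator 1 (hypothetical): swap two adjacent words.
        for k in range(len(current) - 1):
            cand = current[:]
            cand[k], cand[k + 1] = cand[k + 1], cand[k]
            neighbours.append(cand)
        # Operator 2 (hypothetical): replace one word by an alternative translation.
        for k, word in enumerate(current):
            for alt in alternatives.get(word, []):
                cand = current[:]
                cand[k] = alt
                neighbours.append(cand)
        if not neighbours:
            return current, current_score
        best = max(neighbours, key=score)
        best_score = score(best)
        if best_score <= current_score:
            return current, current_score  # no operator improves: a local optimum
        current, current_score = best, best_score

# Hypothetical model scores for the orderings of three target words.
SCORES = {("y", "x", "z"): 1, ("x", "y", "z"): 2, ("y", "z", "x"): 0,
          ("x", "z", "y"): 0, ("z", "x", "y"): 0, ("z", "y", "x"): 5}

def score(words):
    return SCORES.get(tuple(words), 0)

print(greedy_decode(["y", "x", "z"], {}, score))  # (['x', 'y', 'z'], 2)
print(max(SCORES, key=SCORES.get))                # ('z', 'y', 'x')
```

Starting from the gloss “y x z”, the search climbs to “x y z” (score 2) and stops, because every single swap lowers the score, even though “z y x” scores 5. This is the local-optimum behavior that makes such a search unstable.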
The word-alignment-based statistical translation model was originally intended for similar language pairs, such as French and English. When applied to structurally very different languages such as Japanese and English, this model yields very complicated word alignments, as seen in FIG. 7. This complexity directly reflects the structural differences: for example, English takes the SVO word order, while Japanese usually takes the SOV order. In addition, as is apparent from the example shown in FIG. 7, insertions and deletions occur very frequently. For instance, there are no Japanese morphemes corresponding to “the3” and “the6” of FIG. 7, so these words must be inserted when the Japanese sentence is translated into English. Similarly, the Japanese morphemes “no2” and “o4” must be deleted.
Both the intricate alignments and the insertion and deletion of words make word-by-word beam search computationally expensive. Pruning strategies must be introduced so that the search system can output results in a reasonable time, but search errors then become inevitable in the restricted search space. Although translation quality correlates to some extent with the probabilities assigned by the translation model, the beam search was often unable to find good translations.
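How pruning causes search errors can be seen in a minimal monotone beam search. The following Python sketch is a simplification under stated assumptions: it translates word by word in source order without reordering, insertion, or deletion, and combines hypothetical lexicon probabilities with a hypothetical bigram language model over placeholder words.

```python
import heapq

def beam_search(options, lm, beam_width):
    """Monotone word-by-word beam search.
    options[k]: list of (target_word, lexicon_prob) for source word k.
    lm[(prev, word)]: hypothetical bigram probability; "<s>" marks the start."""
    beam = [(1.0, ["<s>"])]
    for choices in options:
        expanded = [
            (p * t_prob * lm.get((hyp[-1], word), 0.0), hyp + [word])
            for p, hyp in beam
            for word, t_prob in choices
        ]
        # Pruning: keep only the beam_width most probable partial hypotheses.
        beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
    p, hyp = max(beam, key=lambda h: h[0])
    return hyp[1:], p

OPTIONS = [[("a", 0.6), ("b", 0.4)],  # candidate translations of source word 1
           [("c", 1.0)]]              # candidate translations of source word 2
LM = {("<s>", "a"): 0.5, ("<s>", "b"): 0.5, ("a", "c"): 0.1, ("b", "c"): 0.9}

print(beam_search(OPTIONS, LM, beam_width=1))  # keeps only "a": ['a', 'c'], p about 0.03
print(beam_search(OPTIONS, LM, beam_width=2))  # finds ['b', 'c'], p about 0.18
```

With a beam width of 1, the partial hypothesis “a” (0.3) beats “b” (0.2) and “b” is pruned, so the search returns “a c” (0.03) although “b c” (0.18) is better. Real decoders face the same trade-off over a vastly larger space.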
The method proposed by Germann et al. is also problematic: the search often ends in a local optimum, so highly accurate solutions are not obtained stably.