The invention relates to computerized language translation, such as computerized translation of a French sentence into an English sentence.
In U.S. patent application Ser. No. 07/736,278, filed Jul. 25, 1991, now pending, entitled "Method and System for Natural Language Translation" by Peter F. Brown et al (the entire content of which is incorporated herein by reference), there is described a computerized language translation system for translating a text F in a source language to a text E in a target language. The system described therein evaluates, for each of a number of hypothesized target language texts E, the conditional probability P(E|F) of the target language text E given the source language text F. The hypothesized target language text E having the highest conditional probability P(E|F) is selected as the translation of the source language text F.
Using Bayes' theorem, the conditional probability P(E|F) of the target language text E given the source language text F can be written as

P(E|F) = P(F|E) P(E) / P(F)    (Equation 1)
Since the probability P(F) of the source language text F in the denominator of Equation 1 is independent of the target language text E, the target language text E having the highest conditional probability P(E|F) will also have the highest product P(F|E) P(E). We therefore arrive at

Ê = argmax over E of P(F|E) P(E)    (Equation 2)

In Equation 2, the probability P(E) of the target language text E is a language model match score and may be estimated from a target language model. While any known language model may be used to estimate the probability P(E) of the target language text E, Brown et al describe an n-gram language model comprising a 1-gram model, a 2-gram model, and a 3-gram model combined by parameters whose values are obtained by interpolated estimation.
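As a rough sketch of such an interpolated n-gram language model (not Brown et al's implementation: the toy corpus in the test and the fixed interpolation weights below are illustrative stand-ins for weights that would be obtained by interpolated estimation on held-out data):

```python
from collections import Counter

def train_counts(corpus):
    """Collect 1-, 2-, and 3-gram counts from a list of tokenized sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(toks)):
            uni[toks[i]] += 1
            bi[(toks[i - 1], toks[i])] += 1
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
    return uni, bi, tri

def p_interp(w, u, v, uni, bi, tri, lams=(0.1, 0.3, 0.6)):
    """Interpolated P(w | u, v): weighted sum of 1-, 2-, and 3-gram estimates.

    The weights lams are hypothetical fixed values for illustration."""
    total = sum(uni.values())
    p1 = uni[w] / total if total else 0.0
    p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    l1, l2, l3 = lams
    return l1 * p1 + l2 * p2 + l3 * p3

def lm_score(sent, uni, bi, tri):
    """Language model match score P(E): product of interpolated probabilities."""
    toks = ["<s>", "<s>"] + sent + ["</s>"]
    p = 1.0
    for i in range(2, len(toks)):
        p *= p_interp(toks[i], toks[i - 2], toks[i - 1], uni, bi, tri)
    return p
```

In practice the interpolation weights are tuned on held-out data rather than fixed, and probabilities are accumulated in log space to avoid underflow.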
The conditional probability P(F|E) in Equation 2 is a translation match score. As described by Brown et al, the translation match score P(F|E) for a source text F comprising a series of source words, given a target hypothesis E comprising a series of target words, may be estimated by finding all possible alignments connecting the source words in the source text F with the target words in the target text E, including alignments in which one or more source words are not connected to any target word, but not including alignments in which a source word is connected to more than one target word. For each alignment and each target word e in the target text E connected to φ source words in the source text F, there is estimated the fertility probability n(φ|e) that the target word e is connected to φ source words in the alignment. There is also estimated, for each source word f in the source text F and each target word e in the target text E connected to the source word f by the alignment, the lexical probability t(f|e) that the source word f would occur given the occurrence of the connected target word e.
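The permitted alignments can be pictured concretely: because each source word connects to at most one target word, an alignment is just a list a in which a[j] names the target position connected to source position j (with 0 meaning no connection), and each target word's fertility φ is the number of source positions pointing at it. A minimal sketch of this bookkeeping (the French/English example words are invented for illustration):

```python
def fertilities(a, l):
    """Given alignment list a (1-based target positions, 0 = unconnected)
    and target length l, return phi[0..l], where phi[i] is the number of
    source positions connected to target position i; phi[0] counts source
    words connected to no target word."""
    phi = [0] * (l + 1)
    for i in a:
        phi[i] += 1
    return phi

# Toy example: source "le chien dort" aligned to target "the dog sleeps".
a = [1, 2, 3]                  # le -> the, chien -> dog, dort -> sleeps
phi = fertilities(a, l=3)      # each target word has fertility 1
```

Representing the alignment as a function from source positions to target positions enforces, by construction, the rule that no source word is connected to more than one target word.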
For each alignment and each source word f, Brown et al further estimate the distortion probability a(j|a_j, m) that the source word f is located in position j of the source text F, given that the target word e connected to the source word f is located in position a_j in the target text E, and given that there are m words in the source text F.
By combining the fertility probabilities for an alignment and for all target words e in the target text E, and multiplying the result by the probability

n_0(φ_0 | φ_1 + ... + φ_l) = C(φ_1 + ... + φ_l, φ_0) p_0^((φ_1 + ... + φ_l) - φ_0) p_1^(φ_0)

of the number φ_0 of source words not connected with any target word in the alignment, given the sum of the fertilities φ_i of all of the target words in the target text E in the alignment (here l is the number of words in the target text E, C(n, k) is the binomial coefficient, and p_0 and p_1 = 1 - p_0 are parameters of the model), a fertility score for the target text E and the alignment is obtained.
By combining the lexical probabilities for an alignment and for all source words in the source text F, a lexical score for the alignment is obtained.
By combining the distortion probabilities for an alignment and for all source words in the source text F which are connected to a target word in the alignment, and by multiplying the result by

1 / (φ_0!)

(where φ_0 is the number of source words in the source text F that are not connected with any target word), a distortion score for the alignment is obtained.
Finally, by combining the fertility, lexical, and distortion scores for the alignment, and multiplying the result by the combinatorial factor

φ_1! φ_2! ... φ_l!

a translation match score for the alignment is obtained. (See Brown et al, Section 8.2.)
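Taken together, the steps above amount to multiplying, for one alignment, the null-word term, the fertility probabilities with their factorials, the lexical probabilities, the distortion probabilities, and the 1/φ_0! factor. The sketch below assumes this overall structure; all of the probability tables (t, n, d) and the parameters p_0, p_1 are toy stand-ins supplied by the caller, not trained values:

```python
from math import comb, factorial

def alignment_score(f, e, a, t, n, d, p0, p1):
    """Translation match score for one alignment, in the style described
    above.  f: source words; e: target words; a[j] = 1-based target
    position for source position j+1 (0 = unconnected);
    t[(fw, ew)]: lexical probs; n[(phi, ew)]: fertility probs;
    d[(j, i, m)]: distortion probs; p0, p1: null-word parameters."""
    m, l = len(f), len(e)
    phi = [0] * (l + 1)
    for i in a:
        phi[i] += 1
    connected = m - phi[0]          # sum of fertilities phi_1 .. phi_l

    # Null-word term: probability of phi_0 unconnected source words.
    score = comb(connected, phi[0]) * p0 ** (connected - phi[0]) * p1 ** phi[0]

    # Fertility probabilities and the combinatorial factor phi_i!.
    for i in range(1, l + 1):
        score *= n[(phi[i], e[i - 1])] * factorial(phi[i])

    # Lexical and distortion probabilities over source positions j = 1..m.
    for j, i in enumerate(a, start=1):
        ew = e[i - 1] if i > 0 else "<null>"
        score *= t[(f[j - 1], ew)]
        if i > 0:
            score *= d[(j, i, m)]

    # Unconnected source words are placed uniformly: factor 1/phi_0!.
    return score / factorial(phi[0])
```

A real system would score in log space and use smoothed, trained tables; in this sketch a missing table entry simply raises KeyError.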
The translation match score P(F|E) for the source text F and the target hypothesis E may be the sum of the translation match scores for all permitted alignments between the source text F and the target hypothesis E. Preferably, the translation match score P(F|E) for the source text F and the target hypothesis E is the translation match score for the alignment estimated to be most probable.
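The difference between the two estimates (summing over all permitted alignments versus keeping only the most probable one) can be made concrete by brute-force enumeration. There are (l+1)^m permitted alignments for an m-word source text and an l-word target text, so this is feasible only for toy examples; score_fn stands for any per-alignment match score:

```python
from itertools import product

def all_alignments(m, l):
    """Every alignment list a with a[j] in {0..l}: each source word is
    connected to at most one target word (0 = no connection)."""
    return product(range(l + 1), repeat=m)

def total_and_best(score_fn, m, l):
    """Return the sum of per-alignment scores, and the single best
    alignment together with its score."""
    total, best, best_a = 0.0, -1.0, None
    for a in all_alignments(m, l):
        s = score_fn(list(a))
        total += s
        if s > best:
            best, best_a = s, list(a)
    return total, best, best_a
```

Practical decoders avoid this enumeration and instead search for the most probable alignment directly, which is why the single-best estimate is the preferred one.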
Equation 2 may be used to directly estimate the target hypothesis match score P(F|E)P(E) for a hypothesized target language text E and a source language text F. However, in order to simplify the language model P(E) and the translation model P(F|E), and in order to estimate the parameters of these models from a manageable amount of training data, Brown et al estimate the target hypothesis match score P(F'|E')P(E') for simplified intermediate forms E' and F' of the target language text E and the source language text F, respectively. Each intermediate target language word e' represents a class of related target language words. Each intermediate source language word f' represents a class of related source language words. A source language transducer converts the source language text F to the intermediate form F'. The hypothesized intermediate form target language text E' having the highest hypothesis match score P(F'|E')P(E') is estimated from Equation 2. A target language transducer converts the best matched intermediate target language text E' to the target language text E.
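The source-side half of this arrangement can be pictured as a simple relabeling: the source transducer replaces each word with the label of its class before matching. The class assignments below are invented for illustration (Brown et al derive their classes automatically from data), and the target transducer must additionally choose one concrete word from each class, which this sketch does not attempt:

```python
# Hypothetical source-language word classes (real classes are learned,
# not hand-written).
source_classes = {"le": "ART", "la": "ART", "chien": "ANIMAL", "chat": "ANIMAL"}

def to_intermediate(words, classes):
    """Source language transducer: convert text F to intermediate form F'."""
    return [classes.get(w, w) for w in words]   # unknown words pass through
```

Grouping words into classes shrinks both vocabularies, so the models P(E') and P(F'|E') have far fewer parameters to estimate from the same training data.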
In their language translation system, Brown et al estimate the lexical probability of each source word f as the conditional probability t(f|e) of each source word f given solely the target word e connected to the source word in an alignment. Consequently, the lexical probability provides only a coarse estimate of the probability of the source word f.