Machine translation is a process by which a textual input in a first language is automatically translated, using a computerized machine translation system, into a textual output in a second language. Some such systems operate using word-based translation. In those systems, each word in the input text, in the first language, is translated into some number of corresponding words in the output text, in the second language. Better-performing systems, however, are generally phrase-based translation systems. One example of those systems is set out in Koehn et al., Statistical Phrase-Based Translation, Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL) 127-133, Edmonton, Alberta, Canada (2003).
In order to train either of these two types of systems (and many other machine translation systems), current training systems often access a parallel bilingual corpus; that is, a text in one language and its translation into another language. The training systems first align text fragments in the bilingual corpus such that a text fragment (e.g., a sentence) in the first language is aligned with a text fragment (e.g., a sentence) in the second language that is the translation of the text fragment in the first language. When the text fragments are aligned sentences, this is referred to as a bilingual sentence-aligned data corpus.
In order to train the machine translation system, the training system must also know the individual word alignments within the aligned sentences. In other words, even though sentences have been identified as translations of one another in the bilingual, sentence-aligned corpus, the machine translation training system must also know which words in each sentence of the first language translate to which words in the aligned sentence in the second language.
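As an illustration of what a word alignment within an aligned sentence pair might look like, the following minimal sketch represents the alignment as a set of index pairs. The sentence pair, the chosen links, and the helper name aligned_word_pairs are hypothetical and are not taken from any of the systems discussed here.

```python
# Hypothetical sentence pair; the words and links are illustrative only.
source = ["I", "do", "not", "go"]        # sentence in the first language
target = ["je", "ne", "vais", "pas"]     # aligned sentence in the second language

# A word alignment stored as a set of (source_index, target_index) links.
# "not" links to two target words, and "do" is left unaligned.
alignment = {(0, 0), (2, 1), (2, 3), (3, 2)}


def aligned_word_pairs(source, target, alignment):
    """Return the word pairs named by each alignment link, in sorted link order."""
    return [(source[i], target[j]) for i, j in sorted(alignment)]


pairs = aligned_word_pairs(source, target, alignment)
```

The set-of-links representation is one common choice because it allows one-to-many links and unaligned words without any special cases.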
One current approach to word alignment makes use of five translation models and is discussed in Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 19(2): 263-311 (1993). This approach to word alignment is sometimes augmented by a Hidden Markov Model (HMM)-based model, or by a combination of an HMM-based model and Brown et al.'s fourth model, which has been called “Model 6”. These latter models are discussed in F. Och and H. Ney, A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics 29(1):19-51 (2003).
These word alignment models are less than ideal, however, in a number of different ways. For instance, although the standard models can theoretically be trained without supervision, in practice various parameters are introduced that should be optimized using annotated data. In the models discussed by Och and Ney, supervised optimization of a number of parameters is suggested, including the probability of jumping to the empty word in the HMM model, as well as smoothing parameters for the distortion probabilities and fertility probabilities of the more complex models. Since the values of these parameters affect the values of the translation, alignment, and fertility probabilities trained by the expectation maximization (EM) algorithm, there is no effective way to optimize them other than to run the training procedure with a particular combination of values and to evaluate the accuracy of the resulting alignments. Since evaluating each combination of parameter values in this way can take hours to days on a large training corpus, it is likely that these parameters are rarely, if ever, truly jointly optimized for a particular alignment task.
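The cost of joint optimization described above can be sketched as an exhaustive search in which every combination of parameter values requires one complete training run. In this minimal sketch, train_and_evaluate is a hypothetical stand-in for EM training followed by evaluation against annotated data; its formula, the parameter names, and the grid values are all assumptions made for illustration.

```python
import itertools


def train_and_evaluate(empty_word_prob, smoothing):
    """Hypothetical stand-in for a full training run plus evaluation.

    Returns an error score (lower is better). The formula is invented so
    the sketch runs instantly; a real run could take hours to days.
    """
    return abs(empty_word_prob - 0.2) + abs(smoothing - 0.4)


def grid_search(grid):
    """Try every combination of values and keep the lowest-error one."""
    names = sorted(grid)
    best_params, best_error, runs = None, float("inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = train_and_evaluate(**params)  # one complete training run
        runs += 1
        if error < best_error:
            best_params, best_error = params, error
    return best_params, runs


# Assumed candidate values; three per parameter gives 3 * 3 = 9 training runs.
GRID = {"empty_word_prob": [0.1, 0.2, 0.4], "smoothing": [0.2, 0.4, 0.6]}
best_params, runs = grid_search(GRID)
```

The number of training runs is the product of the number of candidate values per parameter, which is why adding even one more parameter to the search multiplies the total cost.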
Another problem associated with these models is the difficulty of adding features to them, because they are standard generative models. Generative models require a generative “story” as to how the observed data is generated by an inter-related set of stochastic processes. For example, the generative story for models 1 and 2 mentioned above and the HMM alignment model is that a target language translation of a given source language sentence is generated by first choosing a length for the target language sentence, then for each target sentence position, choosing a source sentence word, and then choosing the corresponding target language word.
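The three-step generative story above can be sketched as a small sampling routine. This is an illustration under stated assumptions, not the implementation of any cited model: the translation table, the maximum length, the uniform choice of length, and the uniform choice of source position are all hypothetical simplifications.

```python
import random

# Hypothetical toy translation table: source word -> {target word: probability}.
TRANSLATION = {
    "the": {"das": 0.7, "der": 0.3},
    "house": {"Haus": 1.0},
    "small": {"klein": 0.8, "kleine": 0.2},
}


def generate(source, rng, max_len=6):
    """Sample a target sentence and its links by following the three steps."""
    # Step 1: choose a length for the target sentence (uniform, by assumption).
    length = rng.randint(1, max_len)
    target, links = [], []
    for _ in range(length):
        # Step 2: for this target position, choose a source sentence word
        # (uniform over source positions, by assumption).
        i = rng.randrange(len(source))
        # Step 3: choose the target word given the chosen source word.
        options = TRANSLATION[source[i]]
        word = rng.choices(list(options), weights=list(options.values()))[0]
        target.append(word)
        links.append(i)
    return target, links


source_sentence = ["the", "house", "small"]
target, links = generate(source_sentence, random.Random(0))
```

Each entry of links records which source position produced the target word at the same index, so every target word is tied to exactly one source word. Nothing in these three steps decides how many target words a given source word produces.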
One prior system attempted to add a fertility component to create models 3, 4 and 5 mentioned above. However, the generative story used for models 1 and 2 no longer fit, because it did not include, as a separate decision, the number of target language words needed to align to each source language word. Therefore, to model this explicitly, a different generative “story” was required. Thus, a relatively large amount of additional work is required in order to add features.
In addition, the higher accuracy models are mathematically complex, and also difficult to train, because they do not permit a dynamic programming solution. It can thus take many hours of processing time on current standard computers to train the models and produce an alignment of a large parallel corpus.
The present invention addresses one, some, or all of these problems. However, these problems are not to be used to limit the scope of the invention in any way, and the invention can be used to address different problems, other than those mentioned, in machine translation.
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.