This specification relates to machine translation.
Manual translation of text by a human operator can be time-consuming and costly. One goal of machine translation is to automatically translate text in a source language to corresponding text in a target language. A machine translation system can use a decoder to apply a language model (e.g., a lexical or syntactic language model) and a translation model (e.g., word alignment or phrase-based translation) to a sentence in the source language in order to determine a candidate translation in the target language.
There are several different approaches to machine translation, including example-based machine translation and statistical machine translation. Statistical machine translation attempts to identify the most probable translation in a target language given a particular input in a source language. For example, when translating a sentence from French to English, statistical machine translation identifies the most probable English sentence given the French sentence.
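The idea of picking the most probable target sentence can be illustrated with a minimal sketch. It assumes the common noisy-channel factorization, in which the probability of an English sentence e given a French sentence f is proportional to P(f | e) times P(e), with the first factor playing the role of a translation model and the second the role of a language model. The candidate sentences, the probability values, and the function name are hypothetical and chosen only for illustration; a real decoder searches a far larger hypothesis space.

```python
import math

# Hypothetical toy scores for one French input, "le chat noir".
# translation_model[e] stands in for P(f | e); language_model[e] for P(e).
# All numbers are made up for illustration.
translation_model = {
    "the black cat": 0.30,
    "the cat black": 0.35,
    "black the cat": 0.05,
}
language_model = {
    "the black cat": 0.020,
    "the cat black": 0.001,
    "black the cat": 0.0005,
}


def most_probable_translation(candidates, tm, lm):
    """Return the candidate e that maximizes log P(f | e) + log P(e)."""
    def log_score(e):
        # Log-probabilities are summed to avoid numerical underflow.
        return math.log(tm[e]) + math.log(lm[e])

    return max(candidates, key=log_score)


best = most_probable_translation(
    list(translation_model), translation_model, language_model
)
print(best)
```

In this toy setting the translation model alone slightly prefers "the cat black", but the language model strongly favors the fluent word order, so the combined score selects "the black cat".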
System combination in machine translation aims to build a composite or consensus translation from system outputs of multiple machine translation engines. Computing consensus translations is one way to improve translation quality in many machine translation tasks. A consensus translation can be computed by voting on the translation outputs of multiple machine translation systems. Depending on how the translation outputs are combined and how the voting scheme is implemented, the consensus translation may differ from one or more of the original translation outputs.
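One simple voting scheme can be sketched as follows. The sketch assumes that the outputs have already been aligned token by token, so that every output has the same number of positions (an empty string marks a gap), and it takes a majority vote at each position. The alignment assumption, the function name, and the example sentences are hypothetical; practical systems must first compute such an alignment and may weight the votes.

```python
from collections import Counter


def consensus_by_voting(aligned_outputs):
    """Majority vote per position over pre-aligned token lists.

    aligned_outputs: list of token lists of equal length; "" marks a gap.
    Ties are broken in favor of the token seen first at that position.
    """
    if not aligned_outputs:
        raise ValueError("at least one output is required")
    length = len(aligned_outputs[0])
    if any(len(tokens) != length for tokens in aligned_outputs):
        raise ValueError("outputs must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_outputs):
        token, _count = Counter(column).most_common(1)[0]
        if token:  # drop positions where the vote selects a gap
            consensus.append(token)
    return consensus


# Hypothetical outputs of three systems for the same input sentence.
outputs = [
    ["the", "black", "cat", "sleeping"],
    ["a", "black", "cat", "sleeps"],
    ["the", "dark", "cat", "sleeps"],
]
result = consensus_by_voting(outputs)
print(" ".join(result))
```

Here each system is outvoted at exactly one position, so the consensus "the black cat sleeps" differs from all three original outputs.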
Some combination systems use candidate selection, which selects for each input sentence one of the translation outputs generated by the multiple machine translation systems. Typically, this selection is made based on translation scores, confidence estimations, language models, or other models. For many machine translation systems, however, the scores are not normalized or may not be available, making it difficult to apply candidate selection. Other combination systems combine translation outputs on a word level or a phrase level.
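Candidate selection can be sketched as choosing, for each input sentence, the output that scores highest under a single shared model. Because the scores reported by the individual systems may not be comparable, this sketch ignores them and rescoring is done with one toy unigram language model applied to every output. The model, its probabilities, the unknown-word floor, and the system names are all hypothetical.

```python
import math

# Hypothetical unigram log-probabilities; unseen words get a low floor value.
TOY_UNIGRAM_LOGPROB = {
    "the": math.log(0.2),
    "black": math.log(0.05),
    "cat": math.log(0.05),
    "sleeps": math.log(0.02),
}
UNKNOWN_LOGPROB = math.log(1e-4)


def lm_score(sentence):
    """Average per-token log-probability, so length does not dominate."""
    tokens = sentence.split()
    if not tokens:
        return float("-inf")
    total = sum(TOY_UNIGRAM_LOGPROB.get(t, UNKNOWN_LOGPROB) for t in tokens)
    return total / len(tokens)


def select_candidate(outputs, score=lm_score):
    """Return (system_name, translation) with the highest shared score.

    outputs: dict mapping a system name to that system's translation.
    """
    system = max(outputs, key=lambda name: score(outputs[name]))
    return system, outputs[system]


outputs = {
    "system_a": "the black cat sleeps",
    "system_b": "the blak cat sleeps",
    "system_c": "black cat sleeps",
}
chosen_system, chosen_translation = select_candidate(outputs)
print(chosen_system, "->", chosen_translation)
```

Unlike voting at the word or phrase level, this approach always returns one of the original outputs unchanged.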
Although machine translation system combination can lead to substantial improvements in translation quality, not every possible ensemble of machine translation systems has the potential to outperform the primary machine translation system (i.e., the machine translation system in the ensemble with the best individual performance). Some combinations of machine translation systems can even produce combined outputs that degrade translation quality.