This specification relates to statistical machine translation.
Manual translation of text by a human operator can be time-consuming and costly. One goal of machine translation is to automatically translate text in a source language to corresponding text in a target language. There are several different approaches to machine translation, including example-based machine translation and statistical machine translation. Statistical machine translation attempts to identify a most probable translation in a target language given a particular input in a source language. For example, when translating a sentence from French to English, statistical machine translation identifies the most probable English sentence given the French sentence. This maximum likelihood translation can be expressed as:
$$\operatorname*{argmax}_{e} \; P(e \mid f),$$
which describes the English sentence, e, out of all possible sentences, that provides the highest value for P(e|f). Additionally, Bayes Rule provides that:
$$P(e \mid f) = \frac{P(e)\,P(f \mid e)}{P(f)}.$$
Using Bayes Rule, this most likely sentence can be re-written as:
$$\operatorname*{argmax}_{e} \; P(e \mid f) = \operatorname*{argmax}_{e} \; P(e)\,P(f \mid e).$$
Consequently, the most likely e (i.e., the most likely English translation) is one that maximizes the product of the probability that e occurs and the probability that e would be translated into f (i.e., the probability that a given English sentence would be translated into the French sentence).
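The selection rule above can be illustrated with a minimal Python sketch. The candidate sentences and all probability values are invented for illustration, and the names `candidates`, `noisy_channel_score`, and `posterior` are hypothetical. The sketch also assumes that P(f) can be approximated by summing P(e)P(f|e) over the toy candidate set only. It shows that dividing by P(f), which does not depend on e, leaves the winning candidate unchanged.

```python
# Toy candidates for one French input sentence f.
# "lm" stands in for P(e) and "tm" for P(f|e); all numbers are made up.
candidates = {
    "the house is small": {"lm": 0.020, "tm": 0.30},
    "the home is small": {"lm": 0.010, "tm": 0.35},
    "small the house is": {"lm": 0.001, "tm": 0.40},
}


def noisy_channel_score(entry):
    """Return P(e) * P(f|e) for one candidate."""
    return entry["lm"] * entry["tm"]


# argmax over e of P(e) * P(f|e)
best_by_product = max(
    candidates, key=lambda e: noisy_channel_score(candidates[e])
)

# Approximate P(f) by summing over the toy candidate set (an assumption),
# then form P(e|f) = P(e) * P(f|e) / P(f) for each candidate.
p_f = sum(noisy_channel_score(entry) for entry in candidates.values())
posterior = {
    e: noisy_channel_score(entry) / p_f for e, entry in candidates.items()
}

# argmax over e of P(e|f)
best_by_posterior = max(posterior, key=posterior.get)

print(best_by_product)
print(best_by_posterior)
```

Because P(f) is a positive constant with respect to e, both selections name the same sentence, which is why the denominator can be dropped from the maximization.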
Components that perform the translation portions of a language translation task are frequently referred to as decoders. In certain instances, a first decoder (a first-pass decoder) can generate a list of possible translations, e.g., an N-best list. A second decoder (a second-pass decoder), e.g., a Minimum Bayes-Risk (MBR) decoder, can then be applied to the list to identify, ideally, which of the possible translations is the most accurate, where accuracy is measured by a loss function that the decoder minimizes. Typically, an N-best list contains between 100 and 10,000 candidate translations, or hypotheses. Increasing the number of candidate translations improves the translation performance of an MBR decoder.
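As a rough illustration of second-pass decoding, the following Python sketch applies an MBR-style selection to a three-entry N-best list. It is a sketch under stated assumptions, not the decoder of this specification: the sentences and probabilities are invented, the loss is one minus a unigram F1 overlap (a simple stand-in for a translation-quality loss), and the names `unigram_f1`, `loss`, and `mbr_decode` are hypothetical.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Unigram F1 overlap between two whitespace-tokenized sentences."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def loss(hyp, ref):
    """Illustrative loss: 0 for identical word multisets, 1 for no overlap."""
    return 1.0 - unigram_f1(hyp, ref)


def mbr_decode(nbest):
    """Return the hypothesis with the lowest expected loss under the
    (renormalized) probabilities of the N-best list itself."""
    total = sum(p for _, p in nbest)

    def expected_loss(hyp):
        return sum(loss(hyp, ref) * p / total for ref, p in nbest)

    return min((hyp for hyp, _ in nbest), key=expected_loss)


# Toy N-best list of (hypothesis, probability); all values are made up.
nbest = [
    ("he is at home", 0.40),
    ("she stays in the house", 0.30),
    ("she stays in the home", 0.30),
]

map_choice = max(nbest, key=lambda item: item[1])[0]
mbr_choice = mbr_decode(nbest)

print(map_choice)
print(mbr_choice)
```

In this toy list the single most probable hypothesis differs from the MBR choice, because the two lower-ranked hypotheses largely agree with each other and together lower the expected loss of one of them.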