This specification relates to statistical machine translation.
Manual translation of text by a human operator can be time consuming and costly. One goal of machine translation is to automatically translate text in a source language to corresponding text in a target language. There are several different approaches to machine translation including example-based machine translation and statistical machine translation. Statistical machine translation attempts to identify a most probable translation in a target language given a particular input in a source language. For example, when translating a sentence from French to English, statistical machine translation identifies the most probable English sentence given the French sentence. This maximum likelihood translation can be expressed as:
$$\arg\max_{e} P(e \mid f),$$
which describes the English sentence, e, out of all possible sentences, that provides the highest value for P(e|f). Additionally, Bayes Rule provides that:
$$P(e \mid f) = \frac{P(e)\,P(f \mid e)}{P(f)}.$$
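Because P(f) is fixed for a given input sentence and does not depend on e, it can be dropped from the maximization. Using Bayes Rule, this most likely sentence can therefore be re-written as: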
$$\arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\,P(f \mid e).$$
Consequently, the most likely e (i.e., the most likely English translation) is one that maximizes the product of the probability that e occurs and the probability that e would be translated into f (i.e., the probability that a given English sentence would be translated into the French sentence).
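As an illustration only, the selection rule above can be sketched over a small, explicitly enumerated candidate set. The function name `noisy_channel_best`, the candidate sentences, and all probability values below are hypothetical and chosen for the example; a practical system would obtain P(e) from a language model and P(f|e) from a translation model, and would search a far larger space.

```python
import math

def noisy_channel_best(candidates, lm_logprob, tm_logprob):
    """Return the candidate e that maximizes log P(e) + log P(f|e).

    Working in log space turns the product P(e) * P(f|e) into a sum,
    which avoids numerical underflow for long sentences.
    """
    return max(candidates, key=lambda e: lm_logprob[e] + tm_logprob[e])

# Hypothetical English candidates for a single French input sentence f.
candidates = [
    "the house is small",
    "the home is small",
    "small the house is",
]

# Assumed toy values for P(e) (language model) and P(f|e) (translation model).
lm_logprob = {e: math.log(p) for e, p in zip(candidates, [0.02, 0.01, 0.0001])}
tm_logprob = {e: math.log(p) for e, p in zip(candidates, [0.3, 0.4, 0.3])}

best = noisy_channel_best(candidates, lm_logprob, tm_logprob)
print(best)
```

With these assumed values, the second candidate has the highest P(f|e) but a lower P(e), so the product favors the first candidate. This shows how the two factors trade off in the maximization.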
Components that perform translation portions of a language translation task are frequently referred to as decoders. In certain instances, a first decoder (a first-pass decoder) can generate a list of possible translations, e.g., an N-best list. A second decoder (a second-pass decoder), e.g., a Minimum Bayes-Risk (MBR) decoder, can then be applied to the list to identify, ideally, which of the possible translations is the most accurate, where accuracy is measured by minimizing a loss function as part of the identification. Typically, an N-best list contains between 100 and 10,000 candidate translations (or hypotheses). Increasing the number of candidate translations, and the efficiency with which the candidate translations are encoded, improves the translation performance of an MBR decoder.
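A minimal sketch of second-pass MBR selection over an N-best list follows, under stated assumptions. The names `unigram_f1` and `mbr_decode` are hypothetical, the three-entry list and its scores are invented for illustration, and the loss (one minus a unigram-overlap F1) is a simple stand-in for whatever loss function a given system actually uses. Posterior probabilities are assumed to come from normalizing the first-pass scores over the list.

```python
import math
from collections import Counter

def unigram_f1(hyp, ref):
    """Unigram-overlap F1 between two whitespace-tokenized sentences."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(nbest):
    """Pick the hypothesis with minimum expected loss over the N-best list.

    nbest is a list of (hypothesis, log_score) pairs. Scores are normalized
    over the list to approximate posterior probabilities (an assumption).
    """
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    def expected_loss(hyp):
        # Loss against every list entry, weighted by that entry's posterior.
        return sum(
            p * (1.0 - unigram_f1(hyp, other))
            for (other, _), p in zip(nbest, posteriors)
        )

    return min((hyp for hyp, _ in nbest), key=expected_loss)

# Hypothetical N-best list with assumed first-pass scores.
nbest = [
    ("the house is tiny", math.log(0.4)),
    ("the house is small", math.log(0.3)),
    ("the home is small", math.log(0.3)),
]

choice = mbr_decode(nbest)
print(choice)
```

In this toy list the highest-scoring hypothesis is "the house is tiny", but the sketch selects "the house is small" because it is closest, on average, to the other probable hypotheses. The pairwise loss computation grows quadratically with the list length, which is one reason the encoding of the candidate set matters.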