The present exemplary embodiment is directed to the field of machine translation. It finds particular application in connection with the use of overlapping biphrases in alignments for phrase-based statistical machine translation systems.
Phrase-based statistical machine translation (SMT) systems employ a biphrase table or “dictionary” as a central resource. This is a probabilistic dictionary associating short sequences of words in two languages that can be considered to be translation pairs. The biphrase table is automatically extracted, at training time, from a large bilingual corpus of aligned source and target sentences. When translating from a source to a target language (decoding), the biphrase table is accessed to retrieve a set of biphrases, each of which includes a source phrase that matches part of a source sentence or other text string to be decoded, paired with a candidate target phrase. Traditional approaches to phrase-based machine translation use dynamic programming to search for a derivation (or phrase alignment) that achieves a maximum probability (or score), given the source sentence, using a subset of the retrieved biphrases. Typically, the score is a log-linear combination of the features associated with the biphrases used, and the decoder seeks the derivation that maximizes it. Biphrases are not allowed to overlap each other, i.e., no word in the source and target sentences of an alignment can be covered by more than one biphrase.
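By way of illustration only, the conventional scoring and non-overlap constraint described above can be sketched as follows. This is a minimal sketch and not part of any particular system: the `Biphrase` class, the feature names `tm` and `lm`, and the weights are hypothetical, the check covers the source side only, and it additionally assumes that every source word must be covered exactly once.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Biphrase:
    """Hypothetical record for one retrieved biphrase placed on the source sentence."""
    src_span: Tuple[int, int]    # half-open [start, end) over source word positions
    target: Tuple[str, ...]      # target-side words
    features: Dict[str, float]   # feature values, e.g. log-probabilities


def is_non_overlapping_cover(biphrases: List[Biphrase], src_len: int) -> bool:
    """Return True if each source position is covered by exactly one biphrase."""
    covered = [0] * src_len
    for b in biphrases:
        start, end = b.src_span
        for i in range(start, end):
            covered[i] += 1
    return all(count == 1 for count in covered)


def log_linear_score(biphrases: List[Biphrase], weights: Dict[str, float]) -> float:
    """Weighted sum of the feature values of all biphrases in the derivation."""
    return sum(
        weights.get(name, 0.0) * value
        for b in biphrases
        for name, value in b.features.items()
    )


# Toy derivation for the three-word source "la maison bleue" (illustrative values).
b1 = Biphrase((0, 1), ("the",), {"tm": -1.0, "lm": -0.5})
b2 = Biphrase((1, 3), ("blue", "house"), {"tm": -2.0, "lm": -1.5})
b3 = Biphrase((0, 2), ("the", "house"), {"tm": -1.0, "lm": -1.0})
weights = {"tm": 1.0, "lm": 2.0}

valid = is_non_overlapping_cover([b1, b2], src_len=3)
overlapping = is_non_overlapping_cover([b3, b2], src_len=3)
score = log_linear_score([b1, b2], weights)
```

In this sketch, the pair `b1`, `b2` partitions the source sentence, whereas `b3` and `b2` both cover the second source word and would therefore be rejected under the conventional constraint.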
Typically, the source sentence is partitioned into spans of words, each span covered by a biphrase, which are then reordered. While this method performs fairly well in practice, there are several disadvantages to the conventional approach. First, finding the optimal partitioning of the source sentence and selection of biphrases can be done efficiently only if the global score is composed of local scores, each depending on only a small part of the derivation. However, the corresponding short-range Markov assumptions may be too limited to capture all dependencies of interest. Second, unrestricted reordering makes decoding NP-complete (i.e., no efficient exact algorithm is known), and the search for a solution has to be approximated. This is conventionally achieved by beam-search techniques, which work by generating the target sentence from left to right. Such techniques have difficulty recovering from incorrect decisions taken early on in the process. Third, maximizing the joint probability of the translation and an auxiliary (hidden) phrase alignment (“Viterbi decoding”) is not necessarily an ideal objective when only the translation itself is of interest.
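The first point above can be illustrated with a deliberately simplified sketch: when the global score is a sum of per-span local scores and no reordering is permitted, the best partitioning can be found by dynamic programming in time quadratic in the sentence length. The function name `best_monotone_segmentation` and the `span_scores` table are hypothetical, and the sketch omits reordering, language-model context, and beam pruning, all of which a conventional decoder must handle.

```python
from typing import Dict, List, Tuple


def best_monotone_segmentation(
    src_len: int, span_scores: Dict[Tuple[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the highest-scoring partition of source positions [0, src_len).

    span_scores maps a half-open span (start, end) to the local score of the
    best biphrase covering it; spans absent from the table cannot be used.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (src_len + 1)   # best[i]: best score covering the first i words
    back = [-1] * (src_len + 1)        # back[i]: start of the last span in that solution
    best[0] = 0.0

    for end in range(1, src_len + 1):
        for start in range(end):
            local = span_scores.get((start, end))
            if local is None or best[start] == neg_inf:
                continue
            candidate = best[start] + local
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    if best[src_len] == neg_inf:
        return neg_inf, []             # no full cover exists

    # Follow the back-pointers to recover the chosen spans in left-to-right order.
    spans: List[Tuple[int, int]] = []
    end = src_len
    while end > 0:
        start = back[end]
        spans.append((start, end))
        end = start
    spans.reverse()
    return best[src_len], spans


# Toy example with illustrative local scores for a three-word source sentence.
toy_scores = {(0, 1): -1.0, (1, 2): -1.0, (2, 3): -1.0, (0, 2): -1.5, (1, 3): -2.5}
total, chosen = best_monotone_segmentation(3, toy_scores)
```

The recurrence relies on each local score depending only on its own span; once a score depends on longer-range context, or once spans may be reordered freely, this simple decomposition no longer applies.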
There remains a need for training and decoding methods that allow a richer representation of translation pairs and a more flexible search.