Translation models are used to translate a sentence in a source language into a sentence in a target language. For instance, translation models can be used to translate an English sentence into its French equivalent.
Translation models have been developed that rely on both one-to-many translations, known as word translations, and many-to-many translations, known as phrase translations. In one-to-many translations, one word in a source language is translated into one or more words in a target language. In many-to-many translations, multiple contiguous words in a source language are translated into multiple contiguous words in a target language.
In order to construct a translation model, a bilingual corpus, consisting of source sentences of a first language aligned with target sentences of a second language, is used to identify possible word translations and phrase translations. Word translations are typically identified using a statistical word aligner that identifies alignments between words in the source sentence and words in the target sentence based on a number of factors including the rate of co-occurrence of the source words and target words in aligned sentences of the bilingual corpus.
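As an illustration of the co-occurrence factor mentioned above, the following minimal sketch scores candidate word translations with a Dice coefficient computed over aligned sentence pairs. The Dice measure, the function name `dice_scores`, and the toy corpus are assumptions made for illustration only; an actual statistical word aligner would typically combine co-occurrence with other factors.

```python
from collections import Counter
from itertools import product


def dice_scores(corpus):
    """Score word pairs by co-occurrence in aligned sentences.

    corpus: list of (source_tokens, target_tokens) pairs.
    Returns a dict mapping (source_word, target_word) to a Dice score in (0, 1].
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        # Count each word at most once per sentence pair.
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


# Hypothetical three-sentence English-French corpus.
corpus = [
    ("the house".split(), "la maison".split()),
    ("the car".split(), "la voiture".split()),
    ("a house".split(), "une maison".split()),
]
scores = dice_scores(corpus)
```

In this toy corpus, "house" and "maison" always appear together, so the pair scores higher than "house" and "la", which co-occur in only one of the sentence pairs containing either word.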
Phrase alignments have been extracted directly from sentence-aligned bilingual corpora using similar statistical techniques. In other prior systems, phrase alignments are extracted by first extracting word alignments and then using the word alignments to identify phrases. In such systems, a source phrase and a target phrase are said to be aligned when three conditions hold: no word of the source phrase is aligned with a word in the target sentence that is outside of the target phrase, no word of the target phrase is aligned with a word in the source sentence that is outside of the source phrase, and at least one word in the source phrase is aligned with a word in the target phrase.
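The alignment condition described above can be sketched as a simple check over a set of word alignment links. The function name `is_aligned_phrase_pair`, the representation of links as (source index, target index) tuples, and the use of inclusive index spans are assumptions made for illustration.

```python
def is_aligned_phrase_pair(alignment, src_span, tgt_span):
    """Check whether a source phrase and a target phrase are aligned.

    alignment: set of (i, j) links, where source word i is aligned to target word j.
    src_span, tgt_span: inclusive (start, end) word indices of each phrase.
    """
    s0, s1 = src_span
    t0, t1 = tgt_span
    has_link_inside = False
    for i, j in alignment:
        in_src = s0 <= i <= s1
        in_tgt = t0 <= j <= t1
        # A link with exactly one end inside the phrase pair violates the condition.
        if in_src != in_tgt:
            return False
        if in_src and in_tgt:
            has_link_inside = True
    # At least one link must connect the two phrases.
    return has_link_inside


# Hypothetical example: "the red house" / "la maison rouge"
# the-la, red-rouge, house-maison
links = {(0, 0), (1, 2), (2, 1)}
ok_pair = is_aligned_phrase_pair(links, (1, 2), (1, 2))   # "red house" / "maison rouge"
bad_pair = is_aligned_phrase_pair(links, (1, 1), (1, 1))  # "red" / "maison"
```

Here "red house" and "maison rouge" satisfy the condition because every link touching either phrase stays inside the pair, whereas "red" and "maison" do not, since "red" is linked to "rouge", which lies outside the target phrase.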
A naive algorithm that independently compared each possible source phrase with each possible target phrase would have a complexity of at least O(l²m²), where l and m are the lengths of the source and target sentences, respectively.
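To make the growth rate concrete, the following sketch counts how many phrase pairs such a naive algorithm would have to compare. The helper names `spans` and `count_naive_comparisons` are hypothetical, and the sketch counts only the pairs themselves; the cost of checking each pair against the word alignments would come on top of this, which is why the bound is stated as "at least".

```python
def spans(n):
    """All contiguous spans (start, end), inclusive, in a sentence of n words."""
    return [(a, b) for a in range(n) for b in range(a, n)]


def count_naive_comparisons(l, m):
    """Number of (source phrase, target phrase) pairs a naive algorithm compares."""
    return sum(1 for _ in spans(l) for _ in spans(m))


# A sentence of n words has n * (n + 1) / 2 contiguous spans, so the number of
# pairs is (l * (l + 1) / 2) * (m * (m + 1) / 2), which grows as l^2 * m^2.
small = count_naive_comparisons(3, 3)
larger = count_naive_comparisons(10, 12)
```

Under these assumptions, a pair of three-word sentences yields 36 phrase pairs, while a ten-word source sentence paired with a twelve-word target sentence already yields 4290.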
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.