The availability of bilingual, machine-readable texts has stimulated interest in processes for extracting linguistically valuable information from such parallel translated texts. In recent years there has been interest in training statistical machine learning systems to translate a source text into a different language. For example, a statistical translation device may obtain pairs of aligned sentences from parallel corpora. In some cases, aligned sentence pairs can be obtained without inspecting the words the sentences contain at all, simply by comparing the number of words or the number of characters in each sentence. Accordingly, statistical methods can be successful in achieving useful translation goals.
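The idea of aligning sentences purely by their lengths can be illustrated with a small dynamic program. The following is a minimal sketch under stated assumptions, not the method of this document: it compares character counts only, allows 1-1, 1-0, 0-1, 2-1 and 1-2 groupings ("beads"), and uses a hypothetical cost function and hypothetical bead penalties (`length_cost`, `BEAD_PENALTY`) that are far cruder than the probabilistic length models used in practice.

```python
from math import inf

# Hypothetical penalties per bead type (source count, target count);
# 1-1 pairings are preferred, insertions/deletions are discouraged.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative difference in character counts (hypothetical cost, range 0..2)."""
    mean = (src_len + tgt_len) / 2
    if mean == 0:
        return 0.0
    return abs(src_len - tgt_len) / mean


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Align sentences by length only; the words themselves are never inspected."""
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(ls, lt)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    en = ["Hello there.", "How are you today?"]
    fr_like = ["Hello there, how are you today?"]
    print(align(en, fr_like))  # two short sentences grouped against one long one
```

Because only lengths enter the cost, the same routine works for any language pair; the trade-off is that sentences of coincidentally similar length can be mis-paired, which is why practical aligners combine length with other evidence.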
The translation of text from one human language to another by a computer can be performed using a statistical machine translation tool that learns how to translate by statistically analyzing vast amounts of source documents along with human-created translations of those documents in a different language. Statistical machine translation systems typically train their statistical models on a large body, or corpus, of parallel documents that have been translated by humans. However, only a limited number of such preexisting corpora are available, and even these are expensive to acquire and often bound by restrictive licenses.