This invention relates to machine translation.
Machine translation, in general, makes use of computers to automate some or all of the process of translating text or speech from one natural language to another. One major field in machine translation, possibly the most widely studied to date, is statistical machine translation (SMT). Using statistical models developed from the analysis of bilingual text corpora (a process called training), SMT aims at generating an output in a target language (e.g., English) that maximizes some key value function (e.g., representing faithfulness or fluency) given an input in a source language (e.g., Chinese). SMT systems are generally not tailored to any specific pair of languages, and do not necessarily require manual development of extensive linguistic rules. Such manual development of rules can be both expensive and inefficient.
The field of SMT has evolved over the years. In its early stages, many translation systems were developed using a word-based approach. That is, treating words as the basic translation elements, each source language word was replaced by a target language word to form a translated sentence. The probability of such a sentence being a good translation was approximated using the product of the probabilities that each target language word is an appropriate translation of the corresponding source language word, together with a language model probability for the sentence in the target language. For example, a Markov chain (“N-gram”) language model was used to determine the language model probability. One limitation of an N-gram language model is that it has difficulty capturing long-range dependencies in a word sequence. In the past decade, significant advances were made with the introduction of an improved phrase-based approach. By expanding the basic unit of translation from words to phrases (i.e., substrings of a few consecutive words), a phrasal approach can effectively reduce the search space for SMT.
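As a rough illustration of the word-based scoring described above, the following Python sketch multiplies per-word translation probabilities by a bigram (N-gram with N=2) language model probability, working in log space. The probability tables, the floor value, the French-English example, and the function names are hypothetical and chosen only for illustration. The sketch also assumes a one-to-one, same-order word alignment, which is a simplification and not a description of any particular system.

```python
import math

# Hypothetical toy tables, invented for illustration only.
# TRANSLATION_PROB[(source_word, target_word)] = p(target_word | source_word)
TRANSLATION_PROB = {
    ("le", "the"): 0.8,
    ("chat", "cat"): 0.9,
    ("dort", "sleeps"): 0.7,
}
# BIGRAM_PROB[(previous_word, word)] = p(word | previous_word)
BIGRAM_PROB = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.4,
    ("cat", "sleeps"): 0.3,
    ("sleeps", "</s>"): 0.2,
}
FLOOR = 1e-6  # assumed small probability for unseen events


def translation_log_prob(source_words, target_words):
    """Sum log p(target | source) over position-aligned word pairs."""
    if len(source_words) != len(target_words):
        raise ValueError("this sketch assumes one-to-one word substitution")
    return sum(
        math.log(TRANSLATION_PROB.get((s, t), FLOOR))
        for s, t in zip(source_words, target_words)
    )


def bigram_log_prob(target_words):
    """Log probability of the target sentence under the bigram model."""
    padded = ["<s>"] + list(target_words) + ["</s>"]
    return sum(
        math.log(BIGRAM_PROB.get((prev, word), FLOOR))
        for prev, word in zip(padded, padded[1:])
    )


def score(source_words, target_words):
    """Log of (product of word translation probs) * (language model prob)."""
    return translation_log_prob(source_words, target_words) + bigram_log_prob(
        target_words
    )


source = ["le", "chat", "dort"]
good = ["the", "cat", "sleeps"]
bad = ["the", "cat", "the"]
good_score = score(source, good)
bad_score = score(source, bad)
```

Under these toy tables, the candidate "the cat sleeps" receives a higher score than "the cat the". Because each bigram term looks only one word back, the language model cannot reward or penalize relationships between words that are farther apart, which is the long-range dependency limitation noted above.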
Many existing phrase-based SMT systems still suffer from several disadvantages. For example, although they may robustly perform translations that are localized to a few consecutive words that have been recognized in training, most existing systems do not account for long-distance word dependencies. In particular, learning non-contiguous phrases, e.g., English-French pairs as simple as “not”→“ne . . . pas”, can still be difficult in current phrasal systems.
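The difficulty with non-contiguous phrases can be seen in a small Python sketch. It enumerates the contiguous phrases of a sentence up to a maximum length, which is the kind of unit a phrasal system works with. The helper name, the example sentence, and the length limit are assumptions made for illustration only.

```python
def contiguous_phrases(words, max_len):
    """Return every substring of consecutive words up to max_len words long."""
    return {
        tuple(words[start:start + length])
        for length in range(1, max_len + 1)
        for start in range(len(words) - length + 1)
    }


# Hypothetical example sentence ("I do not know").
sentence = "je ne sais pas".split()
phrases = contiguous_phrases(sentence, max_len=3)

# The discontiguous pair never appears as a unit of its own.
has_ne_pas = ("ne", "pas") in phrases
# It is only reachable together with the specific intervening verb.
has_ne_sais_pas = ("ne", "sais", "pas") in phrases
```

Here `has_ne_pas` is false while `has_ne_sais_pas` is true: a contiguous-phrase inventory can capture "ne sais pas" only as a whole, so the general pattern "ne . . . pas" would have to be relearned for each intervening word seen in training.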
In some approaches to machine translation, the translation process makes use of tree structures, for example using context-free grammars, in both the source language and the target language. Such approaches may have poor accuracy in cases in which an accurate source language tree structure is not available, for example, due to deficiencies in the representation of the source language.