The following relates to the machine translation arts, the statistical machine translation arts, and so forth.
Machine (or automated) translation from a source language to a target language is known. For example, such machine translation may automatically translate a source-language sentence in English, French, Chinese, or another natural language, to a corresponding target-language sentence in another natural language. Some machine translation systems further include a user interface via which the machine translation is presented to a user as a proposed translation, which may be accepted, rejected, or modified by the user via the user interface.
In translation memory systems, a translation memory stores previously translated text as source language content and corresponding translated target language content, with corresponding textual units (e.g., words, phrases, sentences, or so forth) in the source and target languages associated together. When source-language content is received for translation, it is compared with the source-language contents of the translation memory. If a match, or approximate match, is found, the corresponding aligned target language content is presented to the user. If the match is approximate, the user may also be informed of the differences. The translation memory approach depends upon the memory contents being accurate and sufficiently comprehensive to encompass a usefully large portion of the source-language content received for translation.
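The lookup described above (compare incoming source-language content against stored source segments, return the aligned target segment for an exact or approximate match, and report differences) can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary `TRANSLATION_MEMORY`, a hypothetical function `tm_lookup`, word-level similarity from Python's standard `difflib.SequenceMatcher`, and an arbitrary match threshold of 0.7. None of these names or values come from the text above.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source-language segment -> aligned target-language segment.
TRANSLATION_MEMORY = {
    "the printer is out of paper": "l'imprimante n'a plus de papier",
    "close the front cover": "fermez le capot avant",
}


def tm_lookup(source, memory, threshold=0.7):
    """Return (target, score, differences) for the best match scoring at least
    `threshold`, or None if no stored segment is close enough."""
    src_tokens = source.lower().split()
    best = None  # (score, memory source tokens, memory target)
    for mem_source, mem_target in memory.items():
        mem_tokens = mem_source.lower().split()
        # Word-level similarity in [0, 1]; 1.0 means an exact match.
        score = SequenceMatcher(None, src_tokens, mem_tokens).ratio()
        if best is None or score > best[0]:
            best = (score, mem_tokens, mem_target)
    if best is None or best[0] < threshold:
        return None
    score, mem_tokens, mem_target = best
    # For an approximate match, list the word-level edits so the user can see them.
    opcodes = SequenceMatcher(None, src_tokens, mem_tokens).get_opcodes()
    differences = [(tag, src_tokens[i1:i2], mem_tokens[j1:j2])
                   for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]
    return mem_target, score, differences
```

For example, `tm_lookup("Close the rear cover", TRANSLATION_MEMORY)` matches the stored segment "close the front cover" with a score of 0.75 (three of four words align) and reports the single difference `("replace", ["rear"], ["front"])`. A segment that shares almost no words with the memory returns `None`, reflecting the dependence on memory coverage noted above.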
Another known technique for machine translation is statistical machine translation (SMT). In this approach, a database of source-language/target-language phrase pairs is stored as a phrase table. (The term “phrase” as used herein and in the SMT literature generally is to be understood as a unit of text, e.g. a word or sequence of words, in some instances possibly including punctuation; the term “phrase” is not limited herein or in the SMT literature generally to grammatical phrases.) A translation model is provided or developed. This model comprises an aligned translation conditional probability, that is, the probability of an aligned translation given the source language content. The “aligned translation” comprises one or more target language phrases in a particular sequence (i.e., alignment), with each target language phrase corresponding to a phrase of the source language content. In operation, the SMT system generates candidate translations for received source language content by selecting target language phrases from the phrase table that match source language phrases of the source language content. The translation model is used to assess the candidate translations so as to select a translation having a high probability as assessed by the model. Since the number of candidate translations can be too large to search exhaustively, in some SMT configurations the translation model is used to guide the generation of candidate translations, for example by modifying previously generated candidate translations to generate new candidate translations having high probabilities as assessed by the model.
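The candidate-generation and scoring steps above can be sketched in a deliberately simplified form. This sketch assumes a hypothetical toy phrase table `PHRASE_TABLE` with made-up probabilities, keeps target phrases in source order (no reordering), and scores each candidate only by the product of its phrase translation probabilities (summed in log space), with no language model. The functions `candidates` and `translate` are illustrative names, and exhaustive enumeration is used here precisely because the toy input is tiny.

```python
import math

# Hypothetical phrase table: source phrase (tuple of words) -> [(target phrase, p(target | source))].
PHRASE_TABLE = {
    ("je",): [("I", 0.9)],
    ("vois",): [("see", 0.8), ("am seeing", 0.2)],
    ("la",): [("the", 0.7), ("it", 0.3)],
    ("maison",): [("house", 0.8), ("home", 0.2)],
    ("la", "maison"): [("the house", 0.6), ("home", 0.4)],
}


def candidates(words, start=0):
    """Yield (target phrases, log-probability) for every way of covering
    words[start:] left to right with source phrases found in the table."""
    if start == len(words):
        yield [], 0.0
        return
    for end in range(start + 1, len(words) + 1):
        source_phrase = tuple(words[start:end])
        for target_phrase, prob in PHRASE_TABLE.get(source_phrase, []):
            for rest, rest_logp in candidates(words, end):
                yield [target_phrase] + rest, math.log(prob) + rest_logp


def translate(sentence):
    """Score every candidate with the toy model and keep the most probable one."""
    best = max(candidates(sentence.split()), key=lambda c: c[1], default=None)
    if best is None:
        return None  # no segmentation of the sentence is covered by the table
    phrases, logp = best
    return " ".join(phrases), logp
```

For "je vois la maison", the sketch enumerates 12 candidates. The two-word entry "la maison" yields "I see the house" with probability 0.9 × 0.8 × 0.6 = 0.432, which beats the word-by-word route to the same string (0.9 × 0.8 × 0.7 × 0.8 = 0.4032). The candidate count grows exponentially with sentence length, which is why practical systems guide the search with the model rather than enumerating every candidate.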
As with the translation memory approach, SMT depends on the comprehensiveness and accuracy of the phrase table. However, since phrases are generally substantially shorter than the textual units of a translation memory, it is generally easier to generate an accurate and reasonably comprehensive phrase table. SMT also depends on the accuracy of the translation model. To this end, the translation model is generally constructed to be “tunable”, that is, the translation model includes model parameters that can be optimized based on a development dataset comprising source language sentences and corresponding aligned target language translations.
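Tuning model parameters against a development dataset, as described above, can be illustrated with a small sketch. It assumes a log-linear model whose parameters are one weight per feature, two hypothetical features per candidate (a log translation probability and a target word count), a hypothetical pre-computed candidate list `DEV_SET`, and a plain grid search that maximizes exact-match accuracy against the reference translations. Tuning in the SMT literature more commonly optimizes a metric such as BLEU with methods such as minimum error rate training; grid search and exact match are simple stand-ins here, and `model_score`, `dev_accuracy`, and `tune` are illustrative names.

```python
import itertools

# Hypothetical development dataset. Each candidate carries two illustrative
# features: (log translation probability, target word count).
DEV_SET = [
    {"candidates": {"house": (-0.4, 1), "the house": (-0.5, 2)},
     "reference": "the house"},
    {"candidates": {"blue car": (-1.0, 2), "a blue car": (-1.2, 3)},
     "reference": "a blue car"},
    {"candidates": {"the the house": (-2.0, 3), "the house": (-0.5, 2)},
     "reference": "the house"},
]


def model_score(features, weights):
    """Log-linear model: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def dev_accuracy(weights, dev_set):
    """Fraction of development sentences whose top-scoring candidate equals the reference."""
    hits = 0
    for item in dev_set:
        cands = item["candidates"]
        best = max(cands, key=lambda c: model_score(cands[c], weights))
        hits += best == item["reference"]
    return hits / len(dev_set)


def tune(dev_set, grid=(0.5, 1.0, 2.0)):
    """Try every weight pair on the grid; return the first pair with the highest accuracy."""
    return max(itertools.product(grid, repeat=2),
               key=lambda weights: dev_accuracy(weights, dev_set))
```

Here `tune(DEV_SET)` returns `(0.5, 0.5)`, which selects the reference for all three sentences. The word-count weight must be large enough to prefer "the house" over "house" yet not so large that "the the house" wins; for example, the weights `(0.5, 2.0)` get only two of three sentences right. Resolving such trade-offs from data is the purpose of making the model tunable.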
The following discloses various improvements in machine translation apparatuses and methods.