Machine translation refers to the computer-implemented translation of text from one language, referred to as the source language, into another language, referred to as the target language. For example, machine translation can be employed to translate English text into Arabic. There are several approaches to implementing machine translation, the most popular of which is Statistical Machine Translation (SMT). Typical SMT systems are driven by several models, the most important of which are the phrase table and the language model. The phrase table is a very large collection of aligned phrase pairs. Each phrase pair consists of a source language phrase and the corresponding target language phrase, where a phrase can be made up of one or more tokens. Associated with each phrase pair is a set of probabilities, such as the probability of the target phrase given the source phrase.
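The phrase table can be pictured as a mapping from source phrases to candidate target phrases, each carrying its scores. The following Python sketch is a minimal illustration under stated assumptions: the class names `PhrasePair` and `PhraseTable`, the two probability fields, and the toy English-to-Arabic entries and their values are all hypothetical and are not taken from any particular SMT toolkit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]  # a phrase is a sequence of one or more tokens


@dataclass(frozen=True)
class PhrasePair:
    source: Phrase
    target: Phrase
    # Hypothetical score fields: phrase translation probabilities
    # in both directions.
    p_target_given_source: float
    p_source_given_target: float


class PhraseTable:
    """Toy phrase table mapping a source phrase to its candidate pairs."""

    def __init__(self) -> None:
        self._entries: Dict[Phrase, List[PhrasePair]] = {}

    def add(self, pair: PhrasePair) -> None:
        self._entries.setdefault(pair.source, []).append(pair)

    def lookup(self, source: Phrase) -> List[PhrasePair]:
        # Candidates for a source phrase, best forward probability first;
        # an unknown phrase yields an empty list.
        return sorted(self._entries.get(source, []),
                      key=lambda p: p.p_target_given_source, reverse=True)

    def __contains__(self, source: Phrase) -> bool:
        return source in self._entries


# Toy entries with illustrative (made-up) probabilities.
table = PhraseTable()
table.add(PhrasePair(("the", "house"), ("البيت",), 0.8, 0.7))
table.add(PhrasePair(("house",), ("بيت",), 0.9, 0.6))
table.add(PhrasePair(("house",), ("منزل",), 0.1, 0.3))

best = table.lookup(("house",))[0]  # highest-probability candidate for "house"
print(" ".join(best.target))
```

Keying the table on tuples of tokens lets multi-token phrases be looked up directly. A production phrase table is usually far larger and is stored in a compact on-disk index rather than an in-memory dictionary.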
During actual translation, the source language text is segmented into segments found in the phrase table. Usually, more than one possible segmentation is generated. For each possible segmentation, each segment is translated into one or more candidate translations provided by the phrase table. A reordering model is used to generate possible reordering alternatives of the target translations. A lattice of possible sentence translations is then built from the different segmentations, phrase translations, and reordering hypotheses, and presented to a decoder which, driven by a target language model, identifies the most likely paths and hence generates a ranked list of possible translations.
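The translation steps above can be sketched end to end on a toy input. This Python sketch is illustrative only: the phrase table, the bigram language model, their probabilities, and the functions `segmentations`, `lm_logprob`, and `translate` are hypothetical. It also takes simplifying shortcuts. Instead of building a lattice and searching it, it enumerates every hypothesis explicitly, and instead of a trained reordering model it tries every permutation of the target phrases with no reordering cost.

```python
import itertools
import math
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical toy phrase table: source phrase -> [(target phrase, p(t|s))].
PHRASE_TABLE: Dict[Phrase, List[Tuple[Phrase, float]]] = {
    ("big",): [(("كبير",), 0.9)],
    ("house",): [(("بيت",), 0.7), (("منزل",), 0.3)],
    ("big", "house"): [(("بيت", "كبير"), 0.6)],
}

# Hypothetical toy bigram language model; "<s>" and "</s>" mark
# the sentence boundaries.
BIGRAM_LM: Dict[Tuple[str, str], float] = {
    ("<s>", "بيت"): 0.5, ("بيت", "كبير"): 0.6, ("كبير", "</s>"): 0.7,
    ("<s>", "منزل"): 0.3, ("منزل", "كبير"): 0.5,
    ("<s>", "كبير"): 0.1, ("كبير", "بيت"): 0.05, ("بيت", "</s>"): 0.4,
    ("كبير", "منزل"): 0.05, ("منزل", "</s>"): 0.4,
}
UNSEEN = 1e-4  # floor probability for bigrams missing from the toy LM


def segmentations(tokens: Phrase) -> List[List[Phrase]]:
    """All splits of tokens into contiguous segments found in the table."""
    if not tokens:
        return [[]]
    result = []
    for end in range(1, len(tokens) + 1):
        head = tokens[:end]
        if head in PHRASE_TABLE:
            for rest in segmentations(tokens[end:]):
                result.append([head] + rest)
    return result


def lm_logprob(words: List[str]) -> float:
    """Log probability of a target word sequence under the bigram LM."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(bigram, UNSEEN))
               for bigram in zip(padded, padded[1:]))


def translate(sentence: str, n_best: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate translations by phrase-table plus LM log score."""
    tokens = tuple(sentence.split())
    hypotheses: Dict[str, float] = {}
    for seg in segmentations(tokens):
        # Each segment may have several candidate translations.
        for choice in itertools.product(*(PHRASE_TABLE[s] for s in seg)):
            tm_score = sum(math.log(p) for _, p in choice)
            # Exhaustive reordering: every permutation of target phrases.
            for order in itertools.permutations(choice):
                words = [w for phrase, _ in order for w in phrase]
                score = tm_score + lm_logprob(words)
                text = " ".join(words)
                # Keep the best score when several paths yield one string.
                hypotheses[text] = max(score, hypotheses.get(text, -math.inf))
    ranked = sorted(hypotheses.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_best]


for text, score in translate("big house"):
    print(text, round(score, 3))
```

For the input `big house`, the monotone output `كبير بيت` follows English word order, while the language model favors `بيت كبير`, where the adjective follows the noun as it does in Arabic. This is the kind of case a reordering model is meant to handle. Exhaustive enumeration grows exponentially with sentence length, which is why practical decoders typically search a lattice with pruning instead.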
Typically, phrase pairs are learned through automatic alignment of source language sentences with their manual translations. Translation accuracy is highly dependent on the coverage of the phrase table, which in turn depends on the size of the training data, that is, the set of parallel sentences.
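One common way to learn phrase pairs from automatically word-aligned parallel sentences is to extract every phrase pair that is consistent with the word alignment and then estimate probabilities by relative frequency. The text above does not say which extraction method is used, so the Python sketch below is an assumption. The word alignments are hard-coded rather than produced by an aligner, the toy corpus and the `max_len` limit are hypothetical, and the extraction omits refinements such as handling unaligned words at phrase boundaries.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Phrase = Tuple[str, ...]
Alignment = Set[Tuple[int, int]]  # (source index, target index) links


def extract_phrases(src: List[str], tgt: List[str], alignment: Alignment,
                    max_len: int = 3) -> List[Tuple[Phrase, Phrase]]:
    """Extract phrase pairs consistent with a word alignment (simplified)."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no word inside the target span may be linked
            # to a source word outside the source span.
            if all(s_start <= i <= s_end
                   for (i, j) in alignment if t_start <= j <= t_end):
                pairs.append((tuple(src[s_start:s_end + 1]),
                              tuple(tgt[t_start:t_end + 1])))
    return pairs


def estimate_probabilities(
    corpus: Iterable[Tuple[List[str], List[str], Alignment]],
) -> Dict[Tuple[Phrase, Phrase], float]:
    """Relative-frequency estimate of p(target phrase | source phrase)."""
    pair_counts: Counter = Counter()
    for src, tgt, alignment in corpus:
        pair_counts.update(extract_phrases(src, tgt, alignment))
    source_counts: Counter = Counter()
    for (source, _), count in pair_counts.items():
        source_counts[source] += count
    return {pair: count / source_counts[pair[0]]
            for pair, count in pair_counts.items()}


# Hypothetical toy corpus: (source tokens, target tokens, word alignment).
corpus = [
    (["big", "house"], ["بيت", "كبير"], {(0, 1), (1, 0)}),
    (["the", "house"], ["البيت"], {(0, 0), (1, 0)}),
    (["house"], ["منزل"], {(0, 0)}),
]
probs = estimate_probabilities(corpus)
for (source, target), p in sorted(probs.items()):
    print(" ".join(source), "->", " ".join(target), round(p, 2))
```

Because `house` appears with two different translations in the toy corpus, each receives a probability of 0.5, and `the` alone yields no pair because its only target word is also linked to `house`. More parallel sentences contribute more extracted pairs, which is how the size of the training data drives phrase table coverage.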