Word alignment is widely used in natural language processing. Existing word alignment techniques usually use a statistical word alignment model to establish correspondences between pairs of words that are translations of each other in a bilingual sentence pair. The statistical word alignment model contains the statistical information used to determine such word pairs.
In the article by P. F. Brown, S. A. Della Pietra, V. J. Della Pietra and R. Mercer published in 1993, "The Mathematics of Statistical Machine Translation: Parameter Estimation" (Computational Linguistics, 19(2): 263-311), a statistical machine translation model and a statistical word alignment model, as well as the corresponding parameter estimation methods, are described.
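As an illustration of how such a model can be estimated from unlabeled data, the following is a minimal sketch of expectation-maximization training for IBM Model 1, the simplest of the models in the cited article. It is a simplified sketch under stated assumptions rather than the exact procedure of any particular system: the NULL source word is omitted, and the function names `train_ibm_model1` and `align`, the toy corpus, and the iteration count are hypothetical choices made for this example.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate translation probabilities t(f | e) by EM on unlabeled pairs.

    sentence_pairs: list of (source_words, target_words) tuples.
    Simplification (assumption): no NULL source word is used.
    """
    tgt_vocab = {f for _, tgt in sentence_pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: distribute each target word over the source words.
        for src, tgt in sentence_pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word.

    Returns a list of (source_index, target_index) pairs.
    """
    links = []
    for j, f in enumerate(tgt):
        best_i = max(range(len(src)), key=lambda i: t[(f, src[i])])
        links.append((best_i, j))
    return links


# Toy bilingual corpus (hypothetical data for illustration only).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm_model1(corpus)
print(align(["the", "house"], ["das", "Haus"], t))
```

Because the probabilities are learned only from co-occurrence in unlabeled sentence pairs, rare words and systematic co-occurrences can still produce wrong links.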
However, since current techniques train the statistical word alignment model without supervision on a large-scale unlabeled bilingual corpus, such a model can produce many erroneous word alignment results. If a bilingual corpus whose word alignment has been manually labeled is used for supervised training, an alignment model with higher accuracy can be obtained.
On the other hand, manually aligning the words in a large-scale bilingual corpus is laborious. If only a small-scale corpus needs to be manually labeled, the labor and time required remain modest.