Word alignment is widely used in natural language processing. Existing word alignment techniques typically use a statistical word alignment model to align the corresponding words in a bilingual sentence pair. Such a model contains the statistical information used to determine which words in the sentence pair correspond to each other.
A statistical machine translation model, a statistical word alignment model, and the corresponding parameter estimation methods are described in the 1993 article by P. F. Brown, S. A. Della Pietra, V. J. Della Pietra and R. Mercer, “The Mathematics of Statistical Machine Translation: Parameter Estimation” (Computational Linguistics, 19(2): 263-311).
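For illustration only, the following is a minimal sketch of how the parameters of such a model can be estimated, using expectation-maximization for the simplest model in the cited article (commonly called IBM Model 1). It is a simplified assumption-laden example, not the method of any particular system: the NULL source word is omitted, and the names `train_model1` and `align` are hypothetical.

```python
def train_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] by EM (IBM Model 1, no NULL word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    # Initialize every co-occurring word pair with the same probability.
    t = {(tw, sw): uniform for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = {}  # expected count of (target_word, source_word)
        total = {}  # expected count of source_word
        # E-step: distribute each target word over the source words
        # of its sentence in proportion to the current probabilities.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + frac
                    total[sw] = total.get(sw, 0.0) + frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def align(src, tgt, t):
    """Link each target position to its most probable source position."""
    return [
        (j, max(range(len(src)), key=lambda i: t.get((tw, src[i]), 0.0)))
        for j, tw in enumerate(tgt)
    ]
```

Each entry returned by `align` is a pair (target position, source position). Because every probability is estimated from co-occurrence counts, the estimates are only as reliable as the number of sentence pairs in which each word appears.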
A statistical word alignment model requires a sufficiently large bilingual corpus to train its parameters. Without a sufficiently large training corpus, the estimated parameters cannot produce high-quality alignment results. For some languages, however, only a small amount of bilingual corpus data is available, so the size of the available corpus limits the quality of the statistical word alignment model and becomes an obstacle to its further application.