1. Field of the Invention
The present invention relates to information processing technology, and more particularly to technology for improving word alignment quality in a multilingual corpus.
2. Description of the Related Art
When aligning words in a multilingual corpus, current statistical methods can only align words between two languages at a time. A detailed description of these statistical methods can be found in the article “The Mathematics of Statistical Machine Translation: Parameter Estimation” by Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra and Robert L. Mercer, Computational Linguistics, 1993, Vol. 19, No. 2, pp. 263-311, which is incorporated herein by reference (hereinafter referred to as reference 1).
Thus, for a multilingual corpus including M languages (M > 2), the current methods must align the words of each pair of languages separately. Because each pairwise alignment may contain errors, the resulting word alignments may conflict with one another, causing a problem of word alignment inconsistency. This problem is described in detail below, taking a multilingual corpus of English, Chinese and Japanese as an example.
For a multilingual corpus including English, Chinese and Japanese, the Japanese-Chinese, English-Chinese and Japanese-English word pairs can each be aligned separately by using the above-mentioned statistical methods. For example, consider the following three sentences:
  (a Japanese sentence meaning “I would like to change my flight.” in English)
I would like to change my flight.
 (a Chinese sentence meaning “I would like to change my flight.” in English)
These sentences may be aligned as follows:

In the above alignments, in the Japanese-Chinese alignment,  is aligned with ; in the Japanese-English alignment,  is aligned with “like to”, as shown by the dashed lines. Consequently, in the Chinese-English alignment,  should be aligned with “like to”, but it is actually aligned with “would”.
Thus, because of errors in the alignment of  with “like to” and in the alignment of  with “would”, the word alignment results conflict with one another.
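The conflict described above can be detected mechanically by composing two pairwise alignments through the language they share (here Japanese, used as a pivot) and comparing the implied links with the directly computed third alignment. The following is a minimal illustrative sketch under stated assumptions, not the method of the present invention: each pairwise alignment is assumed to be a set of (word, word) links, the functions `implied_links` and `find_conflicts` are hypothetical names, and the labels `ja_1` and `zh_1` are hypothetical placeholders standing in for the Japanese and Chinese words of the example, which are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (word in first language, word in second language)


def implied_links(pivot_to_b: Set[Link], pivot_to_c: Set[Link]) -> Set[Link]:
    """Compose pivot-B and pivot-C alignments into the B-C links they imply."""
    c_by_pivot: Dict[str, Set[str]] = defaultdict(set)
    for p, c in pivot_to_c:
        c_by_pivot[p].add(c)
    # A B word and a C word are implicitly linked if both align to the same pivot word.
    return {(b, c) for p, b in pivot_to_b for c in c_by_pivot.get(p, set())}


def find_conflicts(pivot_to_b: Set[Link], pivot_to_c: Set[Link],
                   b_to_c: Set[Link]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each B word to (implied C words, direct C words) where the two disagree."""
    implied_by_b: Dict[str, Set[str]] = defaultdict(set)
    for b, c in implied_links(pivot_to_b, pivot_to_c):
        implied_by_b[b].add(c)
    direct_by_b: Dict[str, Set[str]] = defaultdict(set)
    for b, c in b_to_c:
        direct_by_b[b].add(c)
    return {b: (sorted(implied_by_b[b]), sorted(direct_by_b[b]))
            for b in implied_by_b if implied_by_b[b] != direct_by_b[b]}


# Hypothetical encoding of the example: "like to" is split into two English tokens.
ja_zh = {("ja_1", "zh_1")}                    # Japanese-Chinese alignment
ja_en = {("ja_1", "like"), ("ja_1", "to")}    # Japanese-English alignment
zh_en = {("zh_1", "would")}                   # Chinese-English alignment

print(find_conflicts(ja_zh, ja_en, zh_en))
# {'zh_1': (['like', 'to'], ['would'])}
```

The sketch only reports that the three pairwise alignments are mutually inconsistent; it cannot tell which of the pairwise alignments contains the error, which is why separately computed bilingual alignments alone do not resolve the inconsistency.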
Accordingly, there is a need for a method of improving word alignment quality and consistency in a multilingual corpus.