“Transliteration” means rendering a word in one language as a word with a similar pronunciation in another language. For instance, transliteration is often used in translating proper names. Previously, people usually used a bilingual lexicon to translate proper names. Such a bilingual lexicon (e.g., a bilingual proper name lexicon) is compiled by linguists or specialists in related fields and therefore has very high accuracy.
However, even a very large bilingual lexicon cannot cover the whole vocabulary, and people often encounter cases in which a wanted word cannot be found in the lexicon. Furthermore, as society develops over time, new words emerge continuously, making the situation even worse. Therefore, there has long been a need for a method and apparatus that perform automatic transliteration between two languages. Such automatic transliteration technology is also important to machine translation, cross-language information retrieval and information extraction.
Existing automatic transliteration technology is described, for example, in the article entitled “Transliteration of Proper Names in Cross-Lingual Information Retrieval”, Paola Virga and Sanjeev Khudanpur, Proceedings of 41st ACL Workshop on Multilingual and Mixed-language Named Entity Recognition, pp. 57-64, 2003. The article describes an English-to-Chinese transliteration method based on statistical machine translation technology, the specific steps of which are shown in the following Table 1, comprising:
(1) transforming English words into a phone sequence that represents pronunciation by using the Festival voice synthesis system developed by CMU;
(2) transforming the English phone sequence into an initials and finals sequence that represents the pronunciation of Chinese characters by using the IBM translation model;
(3) combining the initials and finals sequence into Chinese Pinyin syllables;
(4) transforming the Chinese Pinyin into Chinese characters by using the IBM translation model again;
(5) combining the Chinese characters into Chinese transliterated words by using a language model developed by CMU.
TABLE 1 (steps of the method, ending in the Chinese transliterated word)
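The five steps above can be sketched as a chain of small functions. This is only an illustrative outline, not the implementation used in the cited article: every table below is a hand-made toy stand-in (an assumption) for a trained component, a dictionary lookup replaces the voice synthesis system of step (1), plain mappings replace the IBM translation model of steps (2) and (4), and a per-character score replaces the language model of step (5). All function and variable names are hypothetical.

```python
from typing import Dict, List

# Toy stand-ins, assumed for illustration only (not trained models).
TOY_PRONUNCIATIONS: Dict[str, List[str]] = {"lee": ["L", "IY"]}
TOY_PHONE_TO_UNIT: Dict[str, str] = {"L": "l", "IY": "i"}
TOY_INITIALS = {"l"}
TOY_PINYIN_TO_CHARS: Dict[str, Dict[str, float]] = {
    "li": {"李": 0.6, "里": 0.3, "利": 0.1},
}


def word_to_phones(word: str) -> List[str]:
    """Step 1: English word -> phone sequence (lookup instead of a synthesizer)."""
    return TOY_PRONUNCIATIONS[word.lower()]  # raises KeyError for unseen words


def phones_to_units(phones: List[str]) -> List[str]:
    """Step 2: English phones -> Chinese initials and finals."""
    return [TOY_PHONE_TO_UNIT[p] for p in phones]


def units_to_syllables(units: List[str]) -> List[str]:
    """Step 3: join each initial with the final that follows it."""
    syllables: List[str] = []
    pending_initial = ""
    for unit in units:
        if unit in TOY_INITIALS:
            pending_initial = unit
        else:
            syllables.append(pending_initial + unit)
            pending_initial = ""
    return syllables


def syllables_to_candidates(syllables: List[str]) -> List[Dict[str, float]]:
    """Step 4: Pinyin syllable -> scored candidate characters."""
    return [TOY_PINYIN_TO_CHARS[s] for s in syllables]


def pick_characters(candidates: List[Dict[str, float]]) -> str:
    """Step 5: keep the best-scoring character per syllable (toy language model)."""
    return "".join(max(scores, key=scores.get) for scores in candidates)


def transliterate(word: str) -> str:
    """Run the five stages in order."""
    phones = word_to_phones(word)
    units = phones_to_units(phones)
    syllables = units_to_syllables(units)
    candidates = syllables_to_candidates(syllables)
    return pick_characters(candidates)


print(transliterate("Lee"))
```

In this sketch a word missing from the toy pronunciation table fails at the first stage with a `KeyError`, and any mistake made in an early stage is passed unchanged to every later stage.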
There are two problems with the above-mentioned automatic transliteration method:
(1) A voice synthesis system is needed to transform English words into a pronunciation sequence. Because existing voice synthesis technology is immature, this step introduces additional errors into the transliteration. Moreover, since the size of any lexicon is limited, marking English word pronunciation with a pronunciation lexicon cannot handle a word that is outside the lexicon; this problem is especially prominent for proper names and newly emerged words that need to be transliterated.
(2) English is a multi-syllable language (that is, one English word usually contains multiple syllables), while Chinese is a single-syllable language (that is, one Chinese character is one syllable). No English unit, whether letter, phone, syllable or word, corresponds to the natural unit of Chinese, the Chinese character. Therefore, the method in the above article is suitable only for English-to-Chinese transliteration, not for Chinese-to-English transliteration.