The present invention relates to transliteration of characters, and more specifically, to transliteration of Chinese character names into Romanized names. As global travel becomes increasingly frequent, the need for name translation or transliteration from one language to another becomes more common, and standardizing the name transliteration process becomes increasingly important. Both the Chinese and Taiwanese governments, for example, have recently published official guidelines for Romanizing Chinese personal names. The two sets of guidelines are nearly identical and can be summarized as follows:
1. Use Mandarin Pinyin.
2. Observe the original surname (SN), given name (GN) order, with a space added between SN and GN.
3. Do not add a space between the characters of a two-character given name or a two-character surname, but insert an apostrophe to avoid ambiguity when the pronunciation of the second character begins with a vowel.
4. In the rare case where the surname field has two surnames (e.g., as seen in the names of some married women in Taiwan and Hong Kong), insert a hyphen between the two surnames.
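The four guidelines above can be sketched as a small formatting routine. This is only an illustrative sketch, not a method described in this document: the function names `join_syllables` and `romanize` are hypothetical, the input is assumed to be toneless Pinyin syllables that have already been split into surname and given-name parts, and the two-surname example is invented for illustration.

```python
VOWELS = "aeiou"  # assumption: toneless, lowercase ASCII Pinyin


def join_syllables(syllables):
    """Join the Pinyin syllables of one name part without spaces (guideline 3)."""
    out = syllables[0].lower()
    for syl in syllables[1:]:
        syl = syl.lower()
        # An apostrophe marks the boundary when the next syllable starts with a vowel.
        out += ("'" if syl[0] in VOWELS else "") + syl
    return out.capitalize()


def romanize(surnames, given):
    """Format a name as 'SN GN' (guideline 2).

    surnames: list of surnames, each a non-empty list of syllables
    given:    non-empty list of syllables for the given name
    """
    # Two surnames in the surname field are joined by a hyphen (guideline 4).
    sn = "-".join(join_syllables(s) for s in surnames)
    return f"{sn} {join_syllables(given)}"


print(romanize([["jiang"]], ["ze", "min"]))           # Jiang Zemin
print(romanize([["ou", "yang"]], ["jin", "xiu"]))     # Ouyang Jinxiu
print(romanize([["zhang"], ["li"]], ["xiao", "an"]))  # Zhang-Li Xiao'an
```

The sketch deliberately leaves out the hard parts, namely deciding which characters form the surname and choosing the right Pinyin reading for each character.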
In theory, these guidelines should be adhered to anywhere Romanized Chinese names are used, e.g., in international publications, information processing, international travel documents, etc. However, automatic systems that transliterate personal names based on the standard conventions do not exist. Most translation systems, such as Google Translate (available from Google Inc. of Mountain View, Calif.) and Systran (available from Systran S.A. of Paris, France), occasionally resort to translating the Chinese characters of a name rather than transliterating them. Such systems typically draw on hundreds of millions of text documents, or on databases storing patterns of text, that have already been translated by human translators, and look for patterns to help decide on the best translation. In other words, these systems try to provide a statistical machine translation, rather than a transliteration. Two serious issues are associated with these kinds of systems.
The first problem is that these systems do not always recognize names correctly. For example, in the Chinese name 欧阳进修, 欧阳 is a two-character surname and 进修 is a two-character given name. However, 进修 is also a meaningful phrase in Chinese, which means “to further one's education”. Google Translate correctly transliterates the surname 欧阳 to Ouyang, but translates 进修 to “education”, instead of Jinxiu, which would be the correct transliterated form. Systran, on the other hand, recognizes 欧阳进修 as a name and transliterates it correctly. However, when the two-character surname 欧阳 is replaced with the single-character surname 欧, Systran translates the name 欧进修 to “European further education” because 欧 also means “Europe” in Chinese.
The second problem is that using databases to store known names and phrases may sometimes fail to distinguish the individual to whom the original name refers. For example, 李连杰, a famous Chinese Kung Fu star, is known as Jet Li in the Western world. Google Translate always renders 李连杰 as Jet Li, regardless of whether the name refers to the Kung Fu star or not. Transliterating the name would yield “Li Lianjie,” which could indeed belong to quite a few people not as famous as Jet Li. Another interesting example, 张三, is often used to refer to the “average Joe” in Chinese but can also be a real name, “Zhang San.” Google Translate never provides a transliteration, but rather always translates it to Joe Smith.
While the International Components for Unicode (ICU) project has developed a Han-Latin module, which can be plugged in for Chinese transliteration, it is not geared specifically for personal name transliteration. Given a string of Chinese characters, the ICU's Han-Latin module simply inserts a space between each pair of adjacent characters and transliterates each character into a Pinyin representation. For example, 江泽民 is turned into “Jiang Ze Min” instead of “Jiang Zemin.” This can lead to problems in situations where names are required to be parsed into a surname (SN) field and a given name (GN) field. The name “Jiang Ze Min” can mistakenly be parsed into “GN=Jiang Ze” and “SN=Min” in the Romanized form.
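The mis-parse described above can be reproduced with a few lines of code. The sketch below does not call ICU; `per_character_style` merely imitates the space-separated output described in the text, and `naive_western_parse` is a hypothetical downstream parser that assumes the last token is the surname.

```python
def per_character_style(syllables):
    """Imitate per-character output: one capitalized token per character."""
    return " ".join(s.capitalize() for s in syllables)


def naive_western_parse(romanized):
    """Hypothetical parser that assumes 'given names first, surname last'."""
    *given, surname = romanized.split()
    return {"GN": " ".join(given), "SN": surname}


spaced = per_character_style(["jiang", "ze", "min"])
print(spaced)                       # Jiang Ze Min
print(naive_western_parse(spaced))  # {'GN': 'Jiang Ze', 'SN': 'Min'}
```

Once the given name is split into separate tokens, nothing in the Romanized string tells a downstream parser where the surname ends and the given name begins.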
Parsing Romanized Chinese names may be error-prone even if the SN GN order is not a problem. Most common Chinese surnames are single characters. However, there exist quite a few dual-character surnames. Some people also have two surnames (two single-character or even two dual-character surnames). For example, the single character 欧 is a surname, but it is also the first character of the dual-character surname 欧阳. ICU transliterates a three-character name beginning with 欧阳 as “Ou Yang Tian,” but it is not clear, based on the transliterated form, whether the SN is “OU” or “OU YANG.” Thus, there is a need for improved automated techniques for transliterating Chinese names into a Romanized form.
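The surname ambiguity can be made concrete by enumerating every split of the space-separated form whose leading syllables match a known surname. This is a minimal sketch under stated assumptions: `KNOWN_SURNAMES` is a two-entry toy list rather than a real surname dictionary, `candidate_splits` is a hypothetical helper, and apostrophe handling is omitted.

```python
KNOWN_SURNAMES = {"ou", "ouyang"}  # toy list, for illustration only


def candidate_splits(romanized):
    """Return every (SN, GN) split whose surname part is a known surname."""
    syllables = [s.lower() for s in romanized.split()]
    results = []
    # Keep at least one syllable for the given name.
    for k in range(1, len(syllables)):
        surname = "".join(syllables[:k])
        if surname in KNOWN_SURNAMES:
            given = "".join(syllables[k:])
            results.append((surname.capitalize(), given.capitalize()))
    return results


print(candidate_splits("Ou Yang Tian"))
# [('Ou', 'Yangtian'), ('Ouyang', 'Tian')]
```

Both candidates are well formed, so the space-separated string alone cannot settle which surname was intended.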