The rapid development of information and network technologies has made efficient everyday information exchange possible, especially through online communication. In recent years in particular, communication between Chinese communities in different parts of the world has been steadily increasing. Chinese users, however, may not have fully enjoyed this advantage, because two versions of the written script are currently in use in different Chinese communities: Traditional Chinese characters encoded in BIG5 (henceforth TC), prevalent in regions such as Hong Kong, Macao and Taiwan, and Simplified Chinese characters encoded in GB (henceforth SC), used in mainland China and Singapore, among others.

Admittedly, most SC characters are either identical to their TC counterparts, e.g. SC 人 and TC 人 (ren “human”), or formally “simplified” from their TC counterparts without any change in meaning or usage, such as SC 肤 (fu) and TC 膚 (fu) in 皮肤/皮膚 (pi-fu “skin”). Such cases require no more than a straightforward one-to-one conversion. Of the 41,321 SC characters we surveyed, however, 1,404 (3.398%) are one-to-many cases, in which an SC character has several TC counterparts with different meanings. For example, SC 发 (fa) should be converted to 發 in 发展 (fa-zhan “development”) but to 髮 in 毛发 (mao-fa “hair”) (see also [1], p. 150). Further complexity arises at the word level, for instance from regional variation in vocabulary, even where no SC-TC character conversion is involved.
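To make the one-to-one versus one-to-many distinction concrete, the following is a minimal Python sketch built on a tiny, hypothetical mapping table; `SC_TO_TC`, `one_to_many`, `naive_convert` and `percent` are illustrative names, not part of any tool discussed here. A naive converter that always takes the first TC candidate gets 发展 right but 毛发 wrong, and the last line simply recomputes the 3.398% share from the survey counts given above.

```python
# Hypothetical, toy SC->TC character table; a real table covers tens of
# thousands of characters. Each SC character maps to a list of TC candidates.
SC_TO_TC = {
    "人": ["人"],          # identical in both scripts
    "皮": ["皮"],
    "肤": ["膚"],          # simplified form, one-to-one
    "展": ["展"],
    "毛": ["毛"],
    "发": ["發", "髮"],    # one-to-many: 發 (发展 "development") vs 髮 (毛发 "hair")
}

def one_to_many(table):
    """Return the SC characters that map to more than one TC character."""
    return sorted(sc for sc, tcs in table.items() if len(tcs) > 1)

def naive_convert(text, table):
    """Character-by-character conversion that always takes the first candidate.
    Characters missing from the table are passed through unchanged."""
    return "".join(table.get(ch, [ch])[0] for ch in text)

def percent(part, whole):
    """Share of part in whole, as a percentage rounded to three decimals."""
    return round(100 * part / whole, 3)

print(one_to_many(SC_TO_TC))          # ['发']
print(naive_convert("发展", SC_TO_TC))  # 發展 (correct)
print(naive_convert("毛发", SC_TO_TC))  # 毛發 (wrong: should be 毛髮)
print(percent(1404, 41321))           # 3.398
```

The one-to-one cases are handled by a plain table lookup; the one-to-many entries are exactly where context is needed.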
The subtle yet extensive differences in form and usage between the traditional and simplified versions can therefore create unexpected obstacles to communication, largely because of the one-to-many cases in which a simplified character has more than one traditional equivalent, each with a different meaning.
A variety of automatic conversion tools have been developed and built into nearly every office suite on the market, such as Microsoft Office, Sun OpenOffice.org and Kingsoft WPS, and free conversion software is also readily available online. The output of these tools, however, often falls short of professional standards: conversion precision is not high enough for professional use, especially when one-to-many cases are involved.
In serious or high-end document processing, such as diplomatic documentation, public discourse and TV subtitling, a flawed conversion can cause unexpected or even serious problems, so errors in machine conversion have to be corrected manually, which is costly. Moreover, because the characters have been converted automatically and, so to speak, “quietly”, without leaving any traceable marking, human editors face the tedious and time-consuming task of checking every converted character to locate the changes that need verification and correction.
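One way to avoid “quiet” conversion is to have the converter record every one-to-many decision it makes, so that an editor only has to review those positions. The Python sketch below illustrates this idea under stated assumptions: the toy table, the `Flag` record, and the `{chosen|alternative}` markup are hypothetical choices made for illustration, not a format used by any existing tool.

```python
from dataclasses import dataclass

# Hypothetical toy SC->TC table; the first candidate is the default choice.
SC_TO_TC = {
    "皮": ["皮"],
    "肤": ["膚"],
    "毛": ["毛"],
    "发": ["發", "髮"],
}

@dataclass
class Flag:
    """One one-to-many decision, kept so an editor can verify it later."""
    position: int        # index of the character in the source text
    source: str          # the SC character
    chosen: str          # the TC character the converter picked
    alternatives: tuple  # the other TC candidates it could have picked

def convert_with_trace(text, table):
    """Convert character by character, logging every one-to-many choice."""
    out, flags = [], []
    for i, ch in enumerate(text):
        candidates = table.get(ch, [ch])  # unknown characters pass through
        out.append(candidates[0])
        if len(candidates) > 1:
            flags.append(Flag(i, ch, candidates[0], tuple(candidates[1:])))
    return "".join(out), flags

def mark_up(text, table):
    """Render the output with each ambiguous choice bracketed for review."""
    converted, flags = convert_with_trace(text, table)
    chars = list(converted)  # one TC character per SC character in this table
    for f in flags:
        chars[f.position] = "{" + "|".join((f.chosen,) + f.alternatives) + "}"
    return "".join(chars)

print(mark_up("皮肤毛发", SC_TO_TC))  # 皮膚毛{發|髮}
```

Unambiguous characters are converted silently, while each flagged position tells the editor exactly which choice was made and what the alternatives were.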
One of the most recent developments is the conversion system proposed by Min-Hsiang Li and Shih-Hung Wu (“Chinese Characters Conversion System based on Lookup Table and Language Model”, Proceedings of the Conference on Computational Linguistics and Speech Processing, 2010, pp. 113–127; henceforth Li's method or Li's system). It combines what the authors call a “lookup table” and a “language model” to disambiguate one-to-many cases and to handle regional variation in lexical terms, with all of its data drawn from Wikipedia. Their experimental results show that the system significantly outperforms other popular conversion systems in conversion precision. However, relying on a single data resource without recourse to more authoritative ones cannot guarantee conversion quality, especially since some mappings differ across data resources. The system is also slowed down by the large-scale N-gram computation it must perform in every conversion.
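The general two-stage idea of a lookup table followed by a language model can be sketched as below. This is not Li's method: the paper does not give its details here, so this Python sketch assumes a longest-match word table followed by a simple bigram score for the remaining one-to-many characters. `WORD_TABLE`, `CHAR_TABLE`, `BIGRAMS` and all counts are hypothetical toy data, not Wikipedia statistics.

```python
# Illustrative two-stage SC->TC converter: word lookup table first, then a
# bigram "language model" for one-to-many characters. All tables and counts
# are hypothetical toy data.

# Stage 1: word-level lookup table (also covers regional lexical variants).
WORD_TABLE = {
    "发展": "發展",
    "头发": "頭髮",
    "软件": "軟體",  # regional variant: mainland 软件 vs. Taiwan 軟體 ("software")
}
MAX_WORD_LEN = max(len(w) for w in WORD_TABLE)

# Character-level fallback table; more than one entry = one-to-many case.
CHAR_TABLE = {"发": ["發", "髮"], "头": ["頭"], "软": ["軟"], "毛": ["毛"]}

# Stage 2: toy TC bigram counts standing in for a trained language model.
BIGRAMS = {("發", "展"): 80, ("出", "發"): 40, ("毛", "髮"): 50,
           ("毛", "發"): 1, ("髮", "型"): 30}

def longest_match(text, i):
    """Longest WORD_TABLE entry (2+ characters) starting at position i, or None."""
    for n in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
        if text[i:i + n] in WORD_TABLE:
            return text[i:i + n]
    return None

def score(prev, cand, nxt):
    """Sum of bigram counts linking a candidate to its left and right neighbors."""
    return BIGRAMS.get((prev, cand), 0) + BIGRAMS.get((cand, nxt), 0)

def convert(text):
    out, i = [], 0
    while i < len(text):
        word = longest_match(text, i)
        if word is not None:              # stage 1: whole-word lookup
            out.append(WORD_TABLE[word])
            i += len(word)
            continue
        ch = text[i]
        cands = CHAR_TABLE.get(ch, [ch])  # unknown characters pass through
        if len(cands) > 1:                # stage 2: let the bigram model choose
            prev = out[-1][-1] if out else None
            # Approximate the right context by the next character's default form.
            nxt = CHAR_TABLE.get(text[i + 1], [text[i + 1]])[0] if i + 1 < len(text) else None
            best = max(cands, key=lambda c: score(prev, c, nxt))  # ties keep table order
        else:
            best = cands[0]
        out.append(best)
        i += 1
    return "".join(out)

print(convert("发展软件"))  # 發展軟體
print(convert("毛发"))      # 毛髮
print(convert("出发"))      # 出發
print(convert("发型"))      # 髮型
```

The word table resolves both fixed compounds and regional terms cheaply. The bigram scoring is only consulted for leftover ambiguous characters, which hints at why keeping the statistical stage small matters for speed.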