The invention relates to character code conversion methods, and in particular to conversion of character code between two character sets utilizing a lookup algorithm.
Multilingual applications are commonly used in many fields, including electronic communication, making character code conversion important. Conventionally, conversion of character code between two different character code sets relies on establishing a one-to-one mapping table.
For example, if the original character set is Big-5/GB2312 and the destination character set is Unicode, a one-to-one mapping table is established prior to character code conversion. In this example, because there are 13504 total characters in the Big-5 character set and 7446 in the GB2312 character set, and each entry holds a 2-byte source code and a 2-byte destination code, the one-way mapping table requires (13504+7446)×(2+2)=83800 bytes. A one-way mapping table can only accomplish one-way conversion, that is, either from Big-5/GB2312 to Unicode or from Unicode to Big-5/GB2312. If two-way conversion is required, that is, both from Big-5/GB2312 to Unicode and from Unicode to Big-5/GB2312, the mapping table size must be doubled to 83800×2=167600 bytes.
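The arithmetic above can be checked with a short sketch. The constant and function names are hypothetical, and the 2-byte width per code is taken from the (2+2) term in the calculation above.

```python
# Character counts as stated in the text above.
BIG5_CHARS = 13504
GB2312_CHARS = 7446

# Assumption from the (2+2) term: each table entry stores a 2-byte
# source code and a 2-byte destination code.
SOURCE_CODE_BYTES = 2
DEST_CODE_BYTES = 2


def one_way_table_bytes(char_count: int) -> int:
    """Size in bytes of a one-way mapping table with one entry per character."""
    return char_count * (SOURCE_CODE_BYTES + DEST_CODE_BYTES)


one_way = one_way_table_bytes(BIG5_CHARS + GB2312_CHARS)
two_way = 2 * one_way  # a second table is needed for the reverse direction

print(one_way, two_way)
```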
Furthermore, the efficiency of conversion corresponds to the efficiency of binary searching in the mapping table, with complexity of N×logN, where N=13504 or 7446. A large mapping table thus reduces the calculation speed of the central processing unit (CPU), affecting performance. In some systems, such as electronic communication systems, memory size and CPU calculation ability are limited, such that conventional methods are not suitable.
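To give a rough sense of the search cost, the following sketch counts how many table entries a single binary search may have to probe for the two table sizes mentioned above. It is an illustration only: the keys are synthetic integers standing in for sorted character codes, and the function names are hypothetical.

```python
def binary_search_probes(sorted_keys, target):
    """Return (index, probes) for target in sorted_keys; index is -1 if absent."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one table entry is read and compared per iteration
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def worst_case_probes(n):
    """Largest number of probes needed to find any key in a table of n entries."""
    keys = list(range(n))  # synthetic sorted keys, not real character codes
    return max(binary_search_probes(keys, k)[1] for k in keys)


for n in (13504, 7446):
    print(n, worst_case_probes(n))
```

Under these assumptions each lookup needs on the order of log2(N) probes, and this cost is paid again for every character converted.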
For example, the Chinese character 我 is encoded as “0xA7DA” in the Big-5 character set and as “0x6211” in the Unicode character set. To convert the character 我 from the Big-5 character set to the Unicode character set, a mapping table is first established, as shown in Table I.
TABLE I
Big-5      Unicode
. . .      . . .
0xA7DA     0x6211
. . .      . . .
Table I is a mapping table for character conversion from Big-5 to Unicode. Because the Big-5 character set has a total of 13504 characters, Table I has 13504 entries. If the character 我 is to be converted from Unicode to Big-5, another mapping table is required. The number of entries in that mapping table equals the number of characters included in the Unicode character set.
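A minimal sketch of the conventional approach described above follows. Only the 0xA7DA/0x6211 pair comes from Table I; the other two entries are added purely for illustration, and the table and function names are hypothetical. The point is that the forward table is sorted by Big-5 code, so reverse conversion needs a second table sorted by Unicode code.

```python
from bisect import bisect_left

# Forward table, sorted by Big-5 code. Only (0xA7DA, 0x6211) is taken from
# Table I; the other entries are illustrative additions, not from the text.
BIG5_TO_UNICODE = [
    (0xA440, 0x4E00),
    (0xA4A4, 0x4E2D),
    (0xA7DA, 0x6211),
]

# Reverse conversion needs a separate table, re-sorted by Unicode code.
UNICODE_TO_BIG5 = sorted((uni, big5) for big5, uni in BIG5_TO_UNICODE)


def lookup(table, code):
    """Binary-search a sorted (source, destination) table; None if not found."""
    keys = [src for src, _ in table]  # rebuilt per call to keep the sketch short
    i = bisect_left(keys, code)
    if i < len(keys) and keys[i] == code:
        return table[i][1]
    return None


print(hex(lookup(BIG5_TO_UNICODE, 0xA7DA)))
print(hex(lookup(UNICODE_TO_BIG5, 0x6211)))
```

Holding both tables in memory is what doubles the storage requirement in the conventional method.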