Traditionally, Chinese characters have been entered into a computer using one of two approaches: (1) a phonetic encoding method, or (2) a structure-based encoding method. Under phonetic encoding, such as PinYin, a person types the Roman letters that represent the pronunciation of a Chinese character, and the system presents a list of Chinese characters having that pronunciation. The person then selects the intended character from that list.
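The phonetic lookup-and-select step can be illustrated with a toy sketch. The dictionary `PINYIN_CANDIDATES` and the helpers `candidates_for` and `select` are hypothetical names introduced only for illustration; a real PinYin input method uses a far larger dictionary, tone handling, and frequency ranking. The sketch shows why one typed syllable yields several homophones, so the final choice rests on the person picking the right entry.

```python
# Hypothetical toy dictionary: toneless PinYin syllable -> candidate characters.
# All candidates under one key share that (toneless) pronunciation.
PINYIN_CANDIDATES: dict[str, list[str]] = {
    "ma": ["妈", "麻", "马", "骂"],
    "shi": ["是", "十", "时", "事"],
}


def candidates_for(pinyin: str) -> list[str]:
    """Return the characters presented for the typed Roman letters."""
    return PINYIN_CANDIDATES.get(pinyin.lower(), [])


def select(pinyin: str, index: int) -> str:
    """Simulate the person choosing the index-th candidate from the list."""
    options = candidates_for(pinyin)
    if not 0 <= index < len(options):
        raise IndexError(f"no candidate {index} for {pinyin!r}")
    return options[index]


if __name__ == "__main__":
    print(candidates_for("ma"))  # ['妈', '麻', '马', '骂']
    print(select("ma", 2))       # 马
```

Choosing index 3 instead of 2 would silently enter 骂 rather than 马: both are valid characters with the same syllable, so nothing in the input itself flags the error.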
Under structure-based encoding, such as Wubi, Cangjie, and Four-Corner, each Chinese character is encoded as a short string of keyboard characters (letters for Wubi and Cangjie, digits for Four-Corner) derived from the character's structure. Because the encoded string captures the structural information of the character, it can be used to determine how structurally similar that character is to other Chinese characters: two characters that look similar have similar encoded strings, and conversely, two characters whose encoded strings are similar look similar.
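One way to turn "similar encoded strings" into a number is an edit distance between the two codes. The sketch below uses Levenshtein distance, normalized to a score between 0 and 1; this is an assumed, swapped-in measure, not one specified by the text. The codes in the usage lines (`"abcd"`, `"abce"`, `"wxyz"`) are made-up placeholders, not real Wubi, Cangjie, or Four-Corner codes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def code_similarity(a: str, b: str) -> float:
    """Map edit distance onto [0, 1]; 1.0 means the two codes are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Placeholder codes only; not real structure-based input codes.
    print(code_similarity("abcd", "abce"))  # 0.75: differ in one position
    print(code_similarity("abcd", "wxyz"))  # 0.0: no shared structure
```

Normalizing by the longer code keeps scores comparable when the compared codes differ in length.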
However, because many Chinese characters are similar in pronunciation, appearance, or both, mistakes can be made during data entry. For example, a data entry clerk may misread a character, mistaking it for a different but similar-looking character. When a phonetic input method is used, a different character with the same or a similar pronunciation may be selected from the list of candidates. When an inappropriate structure-based input method is used, a different character with a similar encoding may be entered.
These are the areas that embodiments of the invention are intended to address.