Some languages are written with character sets that contain a very large number of characters. Chinese, in particular, comprises tens of thousands of characters. Each Chinese character is an ideograph that typically comprises many more strokes than a character in a Western character set. An ideograph generally represents a concept rather than a sound, although each ideograph is assigned a sound that corresponds to a word in a Chinese language. Some Chinese characters are associated with more than one sound or word in a Chinese language.
Because entering such characters is difficult, several input methods have evolved. One type of input method editor (IME) uses the Pinyin system, in which several Roman characters together represent a sound that can correspond to a single word or character. Pinyin is a system of romanization for Chinese in which Roman characters represent the phonetic sounds of Chinese characters, so a user can type several Roman characters to identify the Chinese character to be input. Other methods include stroke-based entry, in which characters are selected based on the number and types of strokes that are input.
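The Pinyin lookup described above can be sketched as a table from typed Roman characters to candidate Chinese characters. This is a minimal illustration only: the table `PINYIN_CANDIDATES` and the functions `candidates_for` and `select_candidate` are hypothetical names, and the three entries are a toy sample rather than a real IME dictionary.

```python
from typing import Dict, List

# Hypothetical toy table: Pinyin syllable (tone marks omitted) -> candidate characters.
# A real IME would hold a far larger dictionary and would usually rank candidates.
PINYIN_CANDIDATES: Dict[str, List[str]] = {
    "ma": ["妈", "马", "吗", "麻"],
    "zhong": ["中", "种", "重"],
    "guo": ["国", "过", "果"],
}


def candidates_for(pinyin: str) -> List[str]:
    """Return the candidate characters for a typed syllable (empty if unknown)."""
    key = pinyin.strip().lower()
    return list(PINYIN_CANDIDATES.get(key, []))


def select_candidate(pinyin: str, index: int) -> str:
    """Return the candidate the user picked by its position in the list."""
    options = candidates_for(pinyin)
    if not 0 <= index < len(options):
        raise IndexError(f"no candidate {index} for {pinyin!r}")
    return options[index]


if __name__ == "__main__":
    print(candidates_for("ma"))
    print(select_candidate("zhong", 0))
```

Because one syllable maps to several characters, the sketch returns a list and leaves the final choice to the user, which mirrors the selection step an IME presents.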
Users can enter Chinese into a computer system by typing Pinyin characters (or strokes) to select the desired Chinese characters. Chinese characters are typically encoded as two bytes each using a Unicode encoding. Information such as words, names, places, and the like is typically stored as encoded Chinese characters because of the smaller storage size this requires. (This background information is not intended to identify problems that must be addressed by the claimed subject matter.)
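The two-byte encoding mentioned above can be checked with a short sketch. It assumes that the Unicode encoding in question is UTF-16, which the text does not state; the function name `encoded_sizes` is hypothetical. The comparison with a romanized spelling is added here for illustration only.

```python
from typing import Dict


def encoded_sizes(text: str) -> Dict[str, int]:
    """Return the byte length of `text` in UTF-16 and UTF-8 (no byte order mark)."""
    return {
        "utf-16": len(text.encode("utf-16-be")),  # big-endian variant adds no BOM
        "utf-8": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    # A common character (U+4E2D) in the Basic Multilingual Plane.
    print("中", encoded_sizes("中"))
    # Its romanized spelling, for comparison.
    print("zhong", encoded_sizes("zhong"))
    # A character outside the Basic Multilingual Plane (U+20000).
    print("U+20000", encoded_sizes("\U00020000"))
```

Under this assumption, a common character such as 中 occupies two bytes in UTF-16 and three in UTF-8, while the five-letter spelling "zhong" occupies five bytes in UTF-8. Characters outside the Basic Multilingual Plane take four bytes in either form, so the two-byte figure holds for typical characters rather than for all of them.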