1. Field of the Invention
The present invention relates generally to language data storage and text input on electronic devices, and in particular to the creation and use of Chinese language data for text processing and text input.
2. Description of the State of the Art
In Mandarin Chinese, every character is spoken as a single syllable. Mandarin Chinese contains over 10,000 characters, whose pronunciations are variations of 405 base “Pinyin” syllables and 5 tones. As described by Hung and Tzeng, the syllabary principle is the basis for the Chinese writing system: each symbol represents a syllable, and the same sound is often represented by many different symbols. In addition, words are not separated by spaces in written Chinese, so Chinese linguistic data is required for proper segmentation of words during Chinese text input on electronic devices. Research reveals that static linguistic data is not sufficient to provide proper word segmentation in most cases. Only systems that learn user input patterns provide a level of segmentation accuracy sufficient for efficient Chinese text input. Thus, extensive linguistic data is required, which accounts for the very high memory usage of most Chinese text input systems.
Existing solutions for the storage and use of linguistic data for text input employ data structures such as hash tables, trees, databases or word lists. These solutions are not feasible in many modern systems because they require significant memory and code space to store and support the complex data structures they rely upon, and they consume a large amount of processing resources. Portable electronic devices, such as mobile communication devices, have limited processing and memory resources that preclude the use of these existing solutions.