1. Field of the Invention
This invention relates to the field of efficient character entry into electronic devices, and more specifically to an efficient keypad entry system and method for Asian languages.
2. Background Information
The use of reduced keypads, such as those found on mobile telephones, is manageable for entering text in the Roman alphabet because there are only 26 letters and various control characters to distribute over 8 keys, generally 3-4 letters per key. The Korean, Chinese, Japanese, and Vietnamese languages, by contrast, use many more characters, and it is thus difficult to present all, or even a meaningful subset, of the languages' elements on a reduced keypad.
The modern versions of these Asian languages are written using Jamo/Hangul (Korean), Hanzi (Chinese), Kanji, Hiragana, and Katakana (Japanese), and Latin-based characters with additions and tone markings (Vietnamese); each language also uses Latin characters. Text entry of these languages generally involves (1) keypad entry that is interpreted by an input method and a dictionary, and (2) interactive display and selection of candidates from a list presented as a result of the keypad entry.
Hangul, meaning Korean script, refers to the characters used to express contemporary written Korean. Hangul also refers to the scientifically designed Korean writing system, the Korean alphabet. Korean words are written in Hangul symbol blocks, rather than by arranging letters left to right in a row as in a Western alphabet, but Hangul characters can be easily decomposed into Hangul elements, unlike the syllabic writing systems of Japan and China. Hangul elements represent individual sounds, but do not commonly carry any meaning. FIGS. 1A and 1B list the Korean Hangul elements, known as “jamo” (meaning alphabet, and sometimes referred to as “jaso”), which include ten monophthong vowel signs 501, eleven diphthong vowel signs 502, and nineteen consonant signs, namely fourteen consonants 512 and five double consonants 514.
There are six ways to combine jamo to form Hangul characters, which are usually composed of two or three jamo (some jamo are considered compound). A complete Hangul character (a pre-combined Hangul) includes up to four jamo. Written jamo are combined into syllable blocks, each block being similar in appearance to a Chinese character. A written syllable is composed of three positions, i.e., initial, medial, and final, written in that order. The initial position, choseong, is usually a consonant, and has 19 possible jamo, including the zero consonant. The medial position, jungseong, is usually a vowel or diphthong letter, and has 21 possible jamo. The final position, jongseong, has 28 possible values (counting the empty placeholder) and is usually either one or two consonant letters, or left empty. Korean Hangul uses spaces to separate words, unlike Chinese and Japanese.
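The three-position structure can be illustrated with a short sketch. It assumes the layout of the Unicode Hangul Syllables block, in which pre-combined syllables start at U+AC00 and are ordered by initial, then medial, then final index; the function names `compose` and `decompose` are hypothetical and chosen only for this illustration.

```python
# Jamo counts per position, as given in the text
# (the final count includes the "no final consonant" placeholder).
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28
HANGUL_BASE = 0xAC00  # first pre-combined syllable in Unicode (assumed layout)
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL


def compose(initial, medial, final=0):
    """Build one syllable block from zero-based jamo indices."""
    if not (0 <= initial < N_INITIAL
            and 0 <= medial < N_MEDIAL
            and 0 <= final < N_FINAL):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def decompose(syllable):
    """Split a pre-combined syllable back into its three jamo indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_SYLLABLES:
        raise ValueError("not a pre-combined Hangul syllable")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


print(N_SYLLABLES)          # number of possible syllable blocks
print(compose(18, 0, 4))    # initial index 18, medial index 0, final index 4
print(decompose("한"))
```

The indices here are positions in Unicode order, not the jamo themselves; a practical input method would map jamo selected on the keypad to these indices first.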
Hanzi, Chinese characters also known as ideographs, pictographs, or logographs, represent meanings. Hanzi appear in other Asian languages (called hanja in Korean, kanji in Japanese, and chữ Hán in Vietnamese) and often have the same meaning in all of these languages. Hanzi are composed of radicals, of which there are 214, and other non-radical elements; radicals in turn are composed of strokes. Hanzi are combined to form compounds.
The Japanese hiragana and katakana, collectively known as kana, represent the same 108 sounds but are drawn differently from each other. Kana are used along with kanji and Latin characters in the same Japanese-language sentence. Kana cannot be decomposed in an alphabetic way, i.e., into vowels and consonants. Hiragana characters are typically used for writing grammatical words. Katakana characters are commonly used for writing words borrowed from other languages.
Each of these languages is represented by at least one character set standard, usually formulated and promulgated by a governmental organization. For example, KS X 1001:1992, formulated in South Korea, is a basic Korean character set standard that enumerates 8,224 characters, including 4,888 hanja (4,620 of them unique) and 2,350 pre-combined Hangul. The standard specifies 19 character classes, including jamo, Hangul, Roman, Greek, Latin, Cyrillic, and other symbols.
As another example, GB 13000.1, China's new national character standard, comprises 20,902 characters and represents an effort to create a common writing method for information and communication products. The GB 13000.1 code defines how the Chinese language is taught in schools and is commonly written. To date, GB 13000.1 is China's largest effort to define a stroke writing order for the Chinese language. The standard builds upon the previous GB 2312-80 code of simplified Chinese characters by adding traditional characters as well as Chinese characters used in Korean and Japanese.
Within these standards are character-encoding standards that enable electronic processing of Asian characters. For each character set there may be several encoding systems, each providing a mapping between every character in a particular character set, e.g., the set specified by KS X 1001:1992, and a numeric representation of that character. Encoding systems arose in response to particular problems, and were optimized accordingly. ISO 2022, Extended Unix Code for Korea (EUC-KR), Johab (meaning “combining”), and Unified Hangul Code (UHC) encode the KS X 1001:1992 character set. UHC and Johab are both forward compatible with Unicode, the international 16-bit character set developed by the Unicode Consortium. Thus there are mappings between Johab, for example, and Unicode, the encoding used in an illustrative embodiment of this invention.
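As a brief illustration of one character having several numeric representations, the following sketch encodes a single Hangul syllable with the Korean codecs shipped in the Python standard library. The codec names are Python's own (`cp949` is Python's name for UHC, and `iso2022_kr` is the Korean profile of ISO 2022); the helper `encodings_of` is a hypothetical name used only here, and the sketch is not part of the described embodiment.

```python
# Korean codecs available in the Python standard library.
CODECS = ["euc_kr", "cp949", "johab", "iso2022_kr"]


def encodings_of(text):
    """Return the byte representation of `text` under each codec."""
    return {name: text.encode(name) for name in CODECS}


table = encodings_of("한")
for name, raw in table.items():
    # Each codec yields a different byte pattern for the same character,
    # and each pattern maps back to the same Unicode character.
    print(f"{name:12s} {raw.hex()}  ->  {raw.decode(name)}")
```

Because every codec round-trips through Unicode, converting between any two of them can be done by decoding with one and encoding with the other.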
The Johab encoding system, as an example of a particular implementation of encoding a character set, contains 11,172 combinations, which represent all the possible pre-combined Hangul. This encoding is described in Annex 3 of the KS X 1001:1992 standard. Only a fraction of these combinations appear in real words, much as only a fraction of all possible three-letter combinations form real words in English. The 2,350 pre-combined forms, known as the “standard plane”, are a subset of all possible permutations. Johab encoding specifies a combination of up to 3 jamo, each using 5 bits, concatenated together to form a 15-bit combination. The 16th bit in the 2-byte word is reserved. The Unicode encoding standard maps these 16-bit combinations to other bit patterns.
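The bit layout described above, three 5-bit fields concatenated into a 2-byte word with one remaining bit, can be sketched as follows. This is only an illustration of the packing arithmetic: the field values used are placeholder 5-bit codes rather than the actual Johab jamo code tables, the choice of the most significant bit as the reserved bit and its default value of 0 are assumptions, and the names `pack` and `unpack` are hypothetical.

```python
FIELD_BITS = 5
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0b11111, the largest 5-bit code


def pack(initial, medial, final, reserved_bit=0):
    """Concatenate three 5-bit jamo codes into one 16-bit word.

    reserved_bit fills the 16th (most significant) bit; its value here
    is an assumption, since the text only states that it is reserved.
    """
    for code in (initial, medial, final):
        if not 0 <= code <= FIELD_MASK:
            raise ValueError("each jamo code must fit in 5 bits")
    if reserved_bit not in (0, 1):
        raise ValueError("reserved_bit must be 0 or 1")
    return (reserved_bit << 15) | (initial << 10) | (medial << 5) | final


def unpack(word):
    """Recover the three 5-bit fields, ignoring the reserved bit."""
    return (word >> 10) & FIELD_MASK, (word >> 5) & FIELD_MASK, word & FIELD_MASK


word = pack(3, 7, 9)            # placeholder codes, not real Johab values
print(f"{word:016b}", unpack(word))
```

Keeping each position in its own fixed-width field means a jamo in any position can be read or replaced with a shift and a mask, without a table lookup over all 11,172 combinations.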
Encoding systems enable electronic entry of language elements into a reduced keypad. As in English language data entry into such a reduced-size keypad, a disambiguation system is required. Further, efficient entry into a reduced-size keypad also requires intelligent placement of character elements on keys of the keypad in addition to user-friendly methods for candidate selection and word delimiting. For example, since Korean jamo are used commonly to make a limited number of syllables relative to the total possible combinations, as described above, and the frequency of the ordinary use of these syllables in particular positions in a word can be known, it is possible to employ for text entry a keypad that has fewer keys than the total possible number of jamo.
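A minimal sketch of dictionary-based disambiguation on a reduced keypad is given below. It is not the method of the invention; the key layout (consonants grouped onto keys 1 to 5, vowels onto keys 6 to 9), the word list, and the frequency values are all invented for illustration. Syllables are split into jamo with Unicode NFD normalization.

```python
import unicodedata
from collections import defaultdict


def key_for(jamo):
    """Hypothetical layout: several jamo share each key."""
    cp = ord(jamo)
    if 0x1100 <= cp <= 0x1112:      # 19 initial consonants -> keys 1-5
        return str(1 + (cp - 0x1100) // 4)
    if 0x1161 <= cp <= 0x1175:      # 21 vowels -> keys 6-9
        return str(6 + (cp - 0x1161) // 6)
    if 0x11A8 <= cp <= 0x11C2:      # 27 final consonants -> keys 1-5
        return str(1 + (cp - 0x11A8) // 6)
    raise ValueError(f"not a conjoining jamo: {jamo!r}")


def key_sequence(word):
    """Key presses needed to enter `word`, one press per jamo."""
    return "".join(key_for(j) for j in unicodedata.normalize("NFD", word))


# Made-up usage frequencies; a real system would use corpus statistics.
FREQ = {"가": 50, "나": 80, "다": 30, "마": 10}

INDEX = defaultdict(list)
for word in FREQ:
    INDEX[key_sequence(word)].append(word)


def candidates(keys):
    """Dictionary entries matching an ambiguous key sequence, most frequent first."""
    return sorted(INDEX.get(keys, []), key=lambda w: -FREQ[w])


print(candidates("16"))
print(candidates("26"))
```

In this toy layout three syllables share the key sequence "16", so the candidate list is ordered by frequency and the user would select among them, which is the interactive selection step described earlier.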
One possible electronic data disambiguation method for Asian languages uses varying keypress durations to differentiate between, for example in the Korean language, consonants and double consonants, or monophthongs and diphthongs. This method increases the number of jamo from which the user can choose to construct a symbol, but the necessity of a “soft key” (a control key used for delimiting and candidate selection) simultaneously limits the number of keys available for jamo placement.
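The timing-based approach can be sketched as follows for the consonant case. The 500 ms threshold and the function name `jamo_for_press` are hypothetical; the pairing of the five plain consonants with their double forms follows standard Hangul.

```python
# The five plain consonants that have a double form.
DOUBLE_OF = {"ㄱ": "ㄲ", "ㄷ": "ㄸ", "ㅂ": "ㅃ", "ㅅ": "ㅆ", "ㅈ": "ㅉ"}

LONG_PRESS_MS = 500  # hypothetical threshold separating short and long presses


def jamo_for_press(base, held_ms, threshold_ms=LONG_PRESS_MS):
    """Pick the plain or double consonant from how long the key was held."""
    if held_ms >= threshold_ms and base in DOUBLE_OF:
        return DOUBLE_OF[base]
    # Short press, or a consonant with no double form: keep the base jamo.
    return base


print(jamo_for_press("ㄱ", 120))
print(jamo_for_press("ㄱ", 800))
print(jamo_for_press("ㄴ", 800))
```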
It is an object of the invention to provide a system in which data entry keypresses combine jamo selection, word building, and symbol selection, thereby increasing efficiency in the use of keypads for Korean text entry.