1. Field of the Invention
The present invention generally relates to the field of encoding characters of foreign languages with a defined set of symbols according to a defined set of rules. The combination of such a defined set of symbols together with the defined set of rules is often referred to as a coding system or coding scheme. The present invention relates more particularly to the field of the coding systems of Chinese characters.
2. Description of the Prior Art
Unlike many other languages which use a limited number of letters or characters (such as the English language, which uses only 26 alphabetical letters), the Chinese language uses tens of thousands of different characters. According to an old Chinese dictionary published in 1915, there were approximately 48,000 Chinese characters, although more than two-thirds of them were rarely used or had fallen out of everyday use. A popular Chinese dictionary published in the People's Republic of China (mainland China) has collected approximately 11,000 Chinese characters which are still commonly used in the modern Chinese language. Other Chinese word-processing dictionaries or handbooks published in Taiwan or Hong Kong, which are also parts of China, have collected approximately 13,000 to 16,000 Chinese characters.
The Chinese characters are identified by their graphic configurations and phonic pronunciations. Although many Chinese characters are homophones, each Chinese character is distinctly identifiable by the combination of its configuration and pronunciation. In the following descriptions, the configuration of a Chinese character is referred to as its "picto" or "video" aspect, and the pronunciation of the same Chinese character is referred to as its "phono" or "audio" aspect. Described in such terms, it can be said that each Chinese character is distinctly identified by the combination of its audio and video aspects.
The vast number of the Chinese characters makes the process of typing or word-processing of Chinese literature very difficult. This is because, unlike the typing or word-processing equipment for other languages such as English which can have a keyboard with only a limited number of letters or characters, it is virtually impossible to have a keyboard for the Chinese language which has keys directly corresponding to all the tens of thousands of Chinese characters.
Therefore, in order to process Chinese literature with modern typewriting or computer word-processing equipment having a keyboard with only a limited number of keys, the Chinese characters have to be encoded by a certain coding system which employs only a limited number of symbols. Under such a coding system, the code of the Chinese characters can be directly typed in by using a keyboard with only a limited number of keys.
Many efforts have been made to encode Chinese characters so that they can be indirectly input into typing or word-processing equipment having a keyboard with a limited number of keys. Since each Chinese character requires a distinct code, any coding system for Chinese characters must have tens of thousands of different codes, one for each character. Accordingly, a coding system for Chinese characters must be designed and constructed very intelligently, so that an ordinary user who knows how to read and write a Chinese character can also construct the code of that character. Otherwise, it would be virtually impossible for an ordinary user to use the coding system without constantly consulting the code book.
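As a back-of-the-envelope illustration (an addition, not drawn from any of the coding systems discussed here), an alphabet of k symbols with codes of fixed length n can distinguish at most k^n characters. That bounds how short a unique code can be for a character set of size N, using the character counts cited above:

```latex
k^{n} \ge N \quad\Longrightarrow\quad n_{\min} = \left\lceil \log_{k} N \right\rceil

% Decimal digits (k = 10), N = 11{,}000 commonly used characters:
10^{4} = 10{,}000 < 11{,}000 \le 10^{5} \;\Rightarrow\; n_{\min} = 5

% Latin letters (k = 26), N = 16{,}000:
26^{3} = 17{,}576 \ge 16{,}000 \;\Rightarrow\; n_{\min} = 3

% Latin letters (k = 26), N = 48{,}000 (the 1915 dictionary):
26^{3} = 17{,}576 < 48{,}000 \le 26^{4} = 456{,}976 \;\Rightarrow\; n_{\min} = 4
```

These are lower bounds that assume every possible symbol string is used. A code that an ordinary user can derive from the character itself, as required above, will generally need to be longer.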
One of the major approaches of the existing coding systems for Chinese characters is to encode each Chinese character according to its ideographic configuration, or its picto-aspect. Such a coding system will be referred to as a "picto-coding system".
The configuration of each Chinese character is a combination of many pictographic strokes written in a particular sequence. One or more strokes may form a radical component of a Chinese character. Referring to FIG. 1, there are shown two Chinese characters denoted by C1 and C2 respectively. The first Chinese character C1 means "special" (as an adjective), "expert" (as a noun), or "monopolize" (as a verb). The second Chinese character C2 means "interest" (as a noun), "sharp" (as an adjective) or "benefit" (as a verb). Together they form a word that means "patent" (as a noun). The first Chinese character C1 has 11 strokes, and the second Chinese character C2 has 7 strokes. The numbers adjacent to the respective starting points of the strokes represent the correct sequence of writing these two Chinese characters.
The picto-coding systems use the Arabic numerals 0 through 9, the 26 letters of the Latin alphabet, or other special symbols to encode the strokes or radicals of Chinese characters. The main advantage of the picto-coding systems is that as long as a user remembers how to write a Chinese character and remembers the rules of construction, the user can construct the code for that Chinese character. Since the number of basic stroke types is limited, the number of symbols required to represent them is also limited. However, as will be seen later, the rules of construction are often very complicated because of the complexity involved in writing Chinese characters. Oftentimes only extensively trained and highly sophisticated professional typists can use the picto-coding systems with acceptable speed.
There have been several representative picto-coding systems for Chinese characters. One early picto-coding system is known as the "Four-Corner" coding system. It was developed in China in the 1930's. It is rarely used now. In the Four-Corner picto-coding system, the various basic strokes of Chinese characters have been classified into 10 categories, each represented by one of the 10 Arabic numerals (0 through 9). Each Chinese character is then represented by four Arabic numerals corresponding to the respective strokes at the four corners of the Chinese character. The most significant advantage of the Four-Corner picto-coding system is that each Chinese character is encoded with a relatively small number (4) of Arabic numerals.
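The mechanical part of the Four-Corner scheme, once the corner shapes are known, can be sketched in a few lines of Python. This is a minimal illustration, assuming the traditional Four-Corner mnemonic for the ten digit categories and the conventional reading order (upper-left, upper-right, lower-left, lower-right); the names `FOUR_CORNER_DIGITS` and `four_corner_code` and the example corner readings are hypothetical.

```python
# Sketch: assemble a Four-Corner code from already-identified corner shapes.
# Digit assignments follow the traditional Four-Corner mnemonic (assumed here);
# recognizing the shape at each corner, the hard part, is left to the user.
FOUR_CORNER_DIGITS = {
    "head": 0,        # a dot sitting on a horizontal stroke
    "horizontal": 1,
    "vertical": 2,
    "dot": 3,
    "cross": 4,       # two strokes crossing each other
    "pierce": 5,      # one stroke piercing two or more others
    "square": 6,      # an enclosed box
    "corner": 7,      # a single angled turn
    "eight": 8,       # a shape like the character 八
    "small": 9,       # a shape like the character 小
}

# Corners are read upper-left, upper-right, lower-left, lower-right.
CORNER_ORDER = ("upper_left", "upper_right", "lower_left", "lower_right")


def four_corner_code(corners: dict[str, str]) -> str:
    """Map the shape at each corner to its digit and join the digits in order."""
    missing = [c for c in CORNER_ORDER if c not in corners]
    if missing:
        raise ValueError(f"no shape given for corner(s): {missing}")
    return "".join(str(FOUR_CORNER_DIGITS[corners[c]]) for c in CORNER_ORDER)


# Hypothetical corner readings, chosen only to show the mechanics.
example = {"upper_left": "horizontal", "upper_right": "vertical",
           "lower_left": "dot", "lower_right": "cross"}
print(four_corner_code(example))  # 1234
```

The lookup itself is trivial; all of the difficulty lies in deciding which shape occupies each corner, which the sketch deliberately leaves as input.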
One of the main difficulties of using the Four-Corner picto-coding system is that it is often hard to determine which strokes are at the four corners of a Chinese character. Although Chinese characters are basically square shaped, many of them do not have a distinguishable or recognizable "corner". Take the first Chinese character C1 in FIG. 1 as an example. It is hard to tell which strokes are at the two lower corners of Chinese character C1, because the lower half of C1 includes a typical "centralized" radical component and simply does not have easily distinguishable strokes at its two lower corners. This is a common situation in Chinese characters. The Four-Corner picto-coding system has made many complicated rules to deal with this type of situation, which makes it hard for an ordinary user to use. In fact, users of the Four-Corner picto-coding system have to constantly consult the code book to successfully encode Chinese characters.
Another picto-coding system is known as the "Five-Stroke" coding system. The "Five-Stroke" picto-coding system was developed in mainland China in the late 1970's. It is still one of the most popular coding systems currently used in mainland China. The "Five-Stroke" picto-coding system categorizes the various strokes of Chinese characters into five basic groups each represented by a representative stroke. There are several different definitions of the five representative strokes. One of the most popular sets of the five representative strokes includes a "horizontal" stroke, a "vertical" stroke, a "left-falling" stroke, a "right-falling" stroke, and a "turning" stroke.
Most Chinese characters consist of more than 5 strokes. For example, in FIG. 1, the first Chinese character C1 has 11 strokes, and the second Chinese character C2 has 7 strokes. The 11 strokes of Chinese character C1 include 5 horizontal strokes (Nos. 1, 4, 5, 7 and 9), 3 vertical strokes (Nos. 2, 6 and 10), no left-falling stroke, 2 right-falling strokes (Nos. 8 and 11) and 1 turning stroke (No. 3). The 7 strokes of Chinese character C2 include 1 horizontal stroke (No. 2), 3 vertical strokes (Nos. 3, 6 and 7), 2 left-falling strokes (Nos. 1 and 4), 1 right-falling stroke (No. 5) and no turning stroke.
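The five-category breakdown above can be checked with a short Python tally. This is a minimal sketch: each character is written as a string with one category label per stroke in writing order. The labels for C2 follow the breakdown given in the text. For C1, the per-stroke labels are an assumption chosen to match the stated totals (stroke 3 as the turning stroke, stroke 9 as a horizontal). `STROKES` and `tally` are hypothetical names.

```python
from collections import Counter

# One label per stroke, in writing order:
# H = horizontal, V = vertical, L = left-falling, R = right-falling, T = turning.
# C2 follows the breakdown in the text. For C1 the per-stroke labels are an
# assumption chosen to match the stated totals (stroke 3 turning, stroke 9 horizontal).
STROKES = {
    "C1": "HVTHHVHRHVR",  # 11 strokes
    "C2": "LHVLRVV",      # 7 strokes
}

CATEGORY_NAMES = {"H": "horizontal", "V": "vertical", "L": "left-falling",
                  "R": "right-falling", "T": "turning"}


def tally(sequence: str) -> dict[str, int]:
    """Count how many strokes fall into each of the five categories."""
    counts = Counter(sequence)  # missing labels count as 0
    return {name: counts[label] for label, name in CATEGORY_NAMES.items()}


for char, seq in STROKES.items():
    print(char, len(seq), tally(seq))
```

The tally makes the problem concrete: five categories are enough to label every stroke, but C1 still has 11 labeled strokes, so some rule must decide which of them go into the code.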
Since many Chinese characters consist of more than 5 strokes, the Five-Stroke picto-coding system needs a series of rules governing which five strokes of a particular Chinese character are to be chosen and encoded. The selection of the five strokes and the sequence of encoding them are very complicated, and only extensively trained operators can use the Five-Stroke picto-coding system efficiently. Another disadvantage of the Five-Stroke picto-coding system is that it requires a specially designed keyboard for typing the strokes into Chinese typing or word-processing equipment. The keys on such a keyboard are marked not with Latin letters but with the strokes or radical components of Chinese characters. Therefore, typing or word-processing equipment with such a special keyboard can only be used for processing the Chinese language.
Still another picto-coding system is known as the "Cang-Jie" coding system. The Cang-Jie picto-coding system was developed in Taiwan in the 1970's and was named after a legendary figure from ancient China who is said to have first created the Chinese characters. The Cang-Jie picto-coding system is widely used in Taiwan and Hong Kong. It is also quite popular in Chinese communities in the United States because it can be used on a conventional computer or word-processor with a standard English letter keyboard. Referring to FIG. 2, the Cang-Jie coding system classifies the various strokes of Chinese characters into 24 groups, each represented by a so-called "Chinese Alphabet Component". Each Chinese Alphabet Component is assigned a corresponding English letter (all letters except X and Z; X is reserved for conflict characters or difficult characters, and Z is reserved for user-defined or user-created characters).
Because many Chinese characters have a large number of strokes, input would be very slow if every stroke of such a character had to be encoded. For example, the first Chinese character C1 shown in FIG. 1 has 11 strokes. If every stroke had to be encoded with the Cang-Jie code shown in FIG. 2, a user would have to type an 11-letter string such as "MLLMMLMIMNI" to input Chinese character C1. This is very inefficient in practice. Accordingly, the Cang-Jie picto-coding system has a special rule: no more than 5 strokes may be selected for encoding each Chinese character. However, to enforce this rule, the Cang-Jie picto-coding system needs a set of very detailed rules. Therefore, like the Five-Stroke picto-coding system, the Cang-Jie picto-coding system is very hard for ordinary users to learn and use, because it is often very hard to choose the five strokes correctly and then encode them according to the complicated rules. For example, the official Cang-Jie code for Chinese character C1 is "JIDI". An unsophisticated user will have a hard time reconciling this with the code table shown in FIG. 2 and understanding why "JIDI" is the correct code.
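The letter-to-component mapping and the five-letter limit can be made concrete with a minimal Python sketch. The component table uses the standard Cang-Jie letter assignments, which are assumed (not verified against FIG. 2) to match the figure; `check_cangjie_code` and `spell_out` are hypothetical helpers. The sketch checks only the form of a code. It cannot decide which components to pick, which is exactly the hard part described above.

```python
# The 24 Cang-Jie components by letter (X and Z excluded).
# Letter assignments are the standard Cang-Jie ones, assumed to match FIG. 2.
CANGJIE_COMPONENTS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火",
    "G": "土", "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中",
    "M": "一", "N": "弓", "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
}
MAX_CODE_LENGTH = 5  # the "no more than 5" rule


def check_cangjie_code(code: str) -> list[str]:
    """Return a list of rule violations (an empty list means the code is well formed)."""
    problems = []
    if not 1 <= len(code) <= MAX_CODE_LENGTH:
        problems.append(f"length {len(code)} is outside 1..{MAX_CODE_LENGTH}")
    unknown = sorted(set(code) - set(CANGJIE_COMPONENTS))
    if unknown:
        problems.append("not component letters: " + "".join(unknown))
    return problems


def spell_out(code: str) -> str:
    """Replace each letter with the component it stands for."""
    return "".join(CANGJIE_COMPONENTS[letter] for letter in code)


print(check_cangjie_code("JIDI"), spell_out("JIDI"))  # [] 十戈木戈
print(check_cangjie_code("MLLMMLMIMNI"))  # ['length 11 is outside 1..5']
```

Spelling out "JIDI" as 十戈木戈 shows why the official code is hard to derive: the four components do not correspond one-to-one with the 11 strokes a user actually writes.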
Because of the difficulties experienced in using the Cang-Jie picto-coding system, a simplified Cang-Jie picto-coding system has been developed. The simplified Cang-Jie picto-coding system uses the same 24 Chinese Alphabet Components used in the original Cang-Jie picto-coding system. However, to reduce the difficulty of selecting the strokes to encode, the simplified Cang-Jie picto-coding system makes a new rule. Under the new rule, only two strokes are selected and encoded, which are the first stroke and the last stroke of a Chinese character. For example, the simplified Cang-Jie code for the first Chinese character C1 shown in FIG. 1 is "JI" (in Cang-Jie code, the first horizontal stroke and the sixth vertical stroke are treated as one "cross" stroke which is represented by English letter "J").
Of course, the simplified Cang-Jie codes are much easier to choose and encode. However, they create a new problem: many different Chinese characters share the same first and last strokes, just as in English many words share the same first and last letters. For example, when the simplified Cang-Jie code "JI" is typed into a Chinese word-processor that uses the simplified Cang-Jie picto-coding system, at least 15 different Chinese characters, including the first Chinese character C1 shown in FIG. 1, appear on the display screen because they all share that simplified Cang-Jie code. In this situation the user has to look at the screen, find the right character and type in the numerical index next to it.
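The collision problem can be sketched in Python. The sketch assumes, as the C1 example suggests ("JIDI" becomes "JI"), that a simplified code keeps the first and last letters of the full Cang-Jie code. Only the "JIDI" entry comes from the text; the other characters and their codes are hypothetical placeholders, as are the function names.

```python
from collections import defaultdict


def simplified_code(full_code: str) -> str:
    """Keep only the first and last letters of a full Cang-Jie code."""
    return full_code if len(full_code) <= 1 else full_code[0] + full_code[-1]


def build_candidate_index(full_codes: dict[str, str]) -> dict[str, list[str]]:
    """Group characters by simplified code; every group with more than one
    member forces the user into the second, on-screen selection step."""
    index: dict[str, list[str]] = defaultdict(list)
    for char, code in full_codes.items():
        index[simplified_code(code)].append(char)
    return dict(index)


# "C1": "JIDI" is from the text; the other entries are hypothetical placeholders.
FULL_CODES = {"C1": "JIDI", "char_a": "JIBI", "char_b": "JMI", "char_c": "OJII"}
index = build_candidate_index(FULL_CODES)
for number, char in enumerate(index["JI"], start=1):
    print(number, char)  # the numbered list the user must read and pick from
```

Three distinct full codes collapse onto "JI" here, so the user must read the numbered list and type an index. With a full character set, such groups grow to the dozen or more candidates described above.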
Therefore, when using the simplified Cang-Jie picto-coding system, the user has to go through a second step to choose the right character. This "two-step" method keeps the user's eyes fixed on the display screen for almost every Chinese character, which greatly slows down Chinese word-processing.
Another major approach of the existing coding schemes for Chinese characters is to encode each Chinese character according to its pronunciation, or its "audio" aspect. Such a coding system will be referred to as a "phono-coding system".
Each Chinese character is pronounced as a single syllable. Each syllable typically has two components: an initial and a final. The phono-coding systems use certain special symbols or Latin letters to represent the initials and finals of Chinese syllables. Symbols or letters representing initials are known as consonants, and symbols or letters representing finals are known as vowels. The main advantage of a phono-coding system is that the user only needs to remember how to encode a limited number of consonants and vowels of Chinese syllables. However, as will be seen later, many Chinese characters have the same pronunciation but different configurations. Most of the time when a user inputs a phono-code, the word-processor displays multiple Chinese characters with that pronunciation, and the user has to choose the correct one by typing an index numeral. Therefore, the main difficulty of the phono-coding system is that it often takes two steps to input a single Chinese character correctly.
There have been two main phono-coding systems designed for Chinese characters. One early phono-coding system, invented in the early 1930's, is known as the "Standard Chinese Phonetic Symbols" phono-coding system, or "ZHUYIN" phono-coding system. It is rarely used in mainland China. The "ZHUYIN" phono-coding system uses 37 specially designed symbols as consonants and vowels. The principal disadvantage of the "ZHUYIN" phono-coding system is that its 37 special symbols are artificially constructed and hard to remember. In addition, to use the "ZHUYIN" phono-coding system on a word-processing device, the keyboard of the device has to be specially built and marked with the 37 special symbols.
A more popular phono-coding system is the "Chinese Phonetic Alphabet", or "PINYIN", phono-coding system. It was officially promulgated by the Chinese government in 1956 and is now widely used in mainland China. The principal advantage of the PINYIN over the ZHUYIN phono-coding system is that, whereas the ZHUYIN system uses artificially constructed symbols to represent the consonants and vowels of Chinese syllables, the PINYIN system uses Latin letters. Referring to FIG. 3, the PINYIN system has 23 consonants and 33 vowels. The PINYIN codes for the first and second Chinese characters C1 and C2 in FIG. 1 are "Zhuan" and "Li", respectively.
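Splitting a PINYIN syllable into its consonant (initial) and vowel (final) can be sketched with a longest-prefix match in Python. The sketch assumes that the 23 consonants of FIG. 3 are the 21 standard PINYIN initials plus "y" and "w". It ignores tones and does not check the final against the 33 vowels; `split_syllable` is a hypothetical name.

```python
# Assumes the 23 consonants of FIG. 3 are the standard PINYIN initials
# plus "y" and "w". Two-letter initials come first so "zh" wins over "z".
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
            "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a toneless PINYIN syllable into (initial, final)."""
    s = syllable.lower()
    for initial in INITIALS:
        if s.startswith(initial) and len(s) > len(initial):
            return initial, s[len(initial):]
    return "", s  # zero-initial syllables such as "an" or "er"


print(split_syllable("Zhuan"), split_syllable("Li"))  # ('zh', 'uan') ('l', 'i')
```

Listing the two-letter initials first is the only subtlety. Otherwise "Zhuan" would be split as "z" plus "huan".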
As mentioned earlier, many Chinese characters have identical pronunciation but different configurations and meanings. For example, there are at least 34 homophones having the same PINYIN code "Li" as the second Chinese character C2 in FIG. 1. When the PINYIN code "Li" is typed into a Chinese word-processor which uses the PINYIN phono-coding system, these 34 different Chinese characters will all come up on the display screen, and a user will have to watch the screen to choose the desired character, and type in its index number to finally input that character. This, again, makes the PINYIN phono-coding system a two-step method for encoding Chinese characters.
The following nine (9) patents are prior art references pertinent to the field of coding systems for encoding foreign characters.
1. U.S. Pat. No. 4,096,934 issued to Kirmser et al. on Jun. 27, 1978 for "Method And Apparatus For Reproducing Desired Ideographs" (hereafter the "Kirmser Patent").
2. U.S. Pat. No. 4,193,119 issued to Arase et al. on Mar. 11, 1980 for "Apparatus For Assisting In The Transposition Of Foreign Language Text" (hereafter the "Arase Patent").
3. U.S. Pat. No. 4,298,773 issued to Diab on Nov. 3, 1981 for "Method And System For 5-Bit Encoding Of Complete Arabic-Farsi Languages" (hereafter the "Diab Patent").
4. U.S. Pat. No. 4,379,288 issued to Leung et al. on Apr. 5, 1983 for "Means For Encoding Ideographic Characters" (hereafter the "Leung Patent").
5. U.S. Pat. No. 4,408,199 issued to White et al. on Oct. 4, 1983 for "Ideogram Generator" (hereafter the "White Patent").
6. U.S. Pat. No. 4,500,872 issued to Huang on Feb. 19, 1985 for "Method For Encoding Chinese Characters" (hereafter the "Huang Patent").
7. U.S. Pat. No. 4,544,276 issued to Horodeck on Oct. 1, 1985 for "Method And Apparatus For Typing Japanese Text Using Multiple Systems" (hereafter the "Horodeck Patent").
8. U.S. Pat. No. 4,559,615 issued to Goo et al. on Dec. 17, 1985 for "Method And Apparatus For Encoding, Storing And Accessing Characters Of Chinese Character-Based Language" (hereafter the "Goo Patent").
9. U.S. Pat. No. 4,684,926 issued to Wang Yong-Min on Aug. 4, 1987 for "Universal System Of Encoding Chinese Characters And Its Keyboard" (hereafter the "Wang Patent").
In the above cited nine (9) prior art patents, five (5) of them are related to encoding Chinese characters. They are the Kirmser Patent, the Leung Patent, the Huang Patent, the Goo Patent and the Wang Patent. The other four (4) prior art patents, namely the Arase Patent, the White Patent, the Diab Patent and the Horodeck Patent, are not related to particular encoding schemes of Chinese characters.
The Kirmser Patent discloses a method and apparatus for reproducing desired Chinese characters. The Kirmser Patent employs the Standard Chinese Phonetic Symbols, or ZHUYIN symbols. It utilizes a specially marked keyboard in which each key is designated with one of the 37 ZHUYIN symbols. In the Kirmser Patent, each ZHUYIN symbol is designed to be used both as a phono-code and as a picto-code; that is, each ZHUYIN symbol represents either the phono (audio) aspect or the picto (video) aspect of a Chinese character. Each Chinese character is encoded by two sequences of ZHUYIN symbols. The first sequence includes two ZHUYIN symbols representing the pronunciation of the Chinese character, wherein the first phonetic symbol is a consonant and the second is a vowel. The second sequence immediately follows the first and includes a series of ZHUYIN symbols representing the strokes of the Chinese character. The second sequence is primarily based on the "Four-Corner" picto-coding system. The Kirmser Patent uses ZHUYIN symbols, which are no longer familiar to many Chinese people, particularly the younger generations. In addition, the Kirmser code for a given Chinese character often includes as many as five (5) or six (6) ZHUYIN symbols.
The Leung Patent discloses a method and apparatus for encoding Chinese characters. The Leung Patent method is based on a modified Five-Stroke picto-coding system. It includes a coding scheme using five (5) Arabic numerals, namely 1 through 5, to represent five types of strokes of Chinese characters respectively. Each Chinese character is encoded by a series of these five (5) numerals in stroke order, one numeral per stroke. However, many Chinese characters have a large number of strokes, and under the coding scheme of the Leung Patent a character with a large number of strokes must be represented by the same large number of numerals. To reduce the number of keys one must press, the Leung Patent utilizes a special keyboard with five (5) keys designated with the Arabic numerals 1 through 5 respectively, and all the remaining keys designated with the most frequently occurring numeral combinations. The Leung Patent coding system is a picto-coding system. To construct a correct code, a user has to remember the exact sequence of the strokes and determine how a long code can be segmented into shorter combinations. For example, the first Chinese character C1 shown in FIG. 1 has 11 strokes; under the Leung Patent its code would presumably be "12511214124". If a user does not remember the stroke sequence exactly, the user cannot construct the correct code. In addition, the user has to determine whether the long code "12511214124" should be segmented as "1-2511-21-41-2-4" or as "1-251-121-4-12-4". Furthermore, even with the specially designed keyboard, the user has to press six keys to input a single Chinese character such as the first Chinese character C1 shown in FIG. 1.
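The segmentation problem can be sketched as a small dynamic program in Python. The Leung Patent's actual key assignments are not given here. The key set below is therefore hypothetical: the five single-stroke keys plus only those multi-stroke combinations that appear in the two segmentations quoted above. The function finds the fewest keys that spell out the code and counts how many different segmentations achieve that minimum.

```python
import math
from functools import lru_cache

# C1's stroke code as given in the text (one digit per stroke).
CODE = "12511214124"
# Hypothetical key set: the five single-stroke keys plus only those
# multi-stroke combinations that appear in the two segmentations quoted in the text.
KEYS = frozenset({"1", "2", "3", "4", "5",
                  "2511", "21", "41", "251", "121", "12"})


def min_keystrokes(code: str, keys: frozenset[str]) -> tuple[int, int]:
    """Dynamic programming over code positions: return the fewest keys that
    spell out `code` and how many different segmentations reach that minimum."""
    longest = max(len(k) for k in keys)

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, int]:
        if i == len(code):
            return 0, 1
        fewest, ways = math.inf, 0
        for j in range(i + 1, min(len(code), i + longest) + 1):
            if code[i:j] in keys:
                n, w = best(j)
                if n + 1 < fewest:
                    fewest, ways = n + 1, w
                elif n + 1 == fewest:
                    ways += w
        return fewest, ways

    fewest, ways = best(0)
    if ways == 0:
        raise ValueError("code cannot be spelled with the given keys")
    return int(fewest), ways


print(min_keystrokes(CODE, KEYS))  # (6, 4)
```

Under this assumed key set the minimum is six keys, matching the text. However, four distinct six-key segmentations exist: the two quoted ones plus the mixed forms 1-2511-21-4-12-4 and 1-251-121-41-2-4. A user must resolve this ambiguity in their head for every character.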
The Huang Patent discloses a method for encoding Chinese characters. The Huang Patent is also derived from coding systems that use the Standard Chinese Phonetic Symbols, or ZHUYIN symbols. In the Huang Patent method, the code of each Chinese character includes three (3) parts: a phonetic part, a tone part, and an ideographic part. The phonetic part uses up to three (3) ZHUYIN symbols to represent the pronunciation of the Chinese character. The tone part includes one of the five tone symbols to represent the tone of the Chinese character. The ideographic part includes two (2) digits, each of which is associated with two different corner strokes of the Chinese character.
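The three-part code structure described for the Huang Patent can be sketched as a validated Python data class. This is an illustration of the structure only, not the patent's method. The symbol inventories are assumptions: the 37 standard Zhuyin letters (Unicode U+3105 through U+3129) and five tone marks with the first tone written as "ˉ". The ideographic digits in the example are made up, and `HuangStyleCode` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed inventories, for illustration only: the 37 standard Zhuyin letters
# (Unicode U+3105 through U+3129) and five tone marks, first tone written "ˉ".
ZHUYIN = frozenset(chr(cp) for cp in range(0x3105, 0x312A))
TONE_MARKS = ("ˉ", "ˊ", "ˇ", "ˋ", "˙")


@dataclass(frozen=True)
class HuangStyleCode:
    phonetic: str     # 1 to 3 Zhuyin symbols for the pronunciation
    tone: str         # exactly one tone mark
    ideographic: str  # two digits derived from corner strokes

    def __post_init__(self) -> None:
        if not (1 <= len(self.phonetic) <= 3 and set(self.phonetic) <= ZHUYIN):
            raise ValueError("phonetic part must be 1 to 3 Zhuyin symbols")
        if self.tone not in TONE_MARKS:
            raise ValueError("tone part must be one of the five tone marks")
        if len(self.ideographic) != 2 or not all(c in "0123456789" for c in self.ideographic):
            raise ValueError("ideographic part must be exactly two digits")

    def __str__(self) -> str:
        return self.phonetic + self.tone + self.ideographic


# C2 ("Li", fourth tone) with a made-up ideographic part "27".
print(HuangStyleCode("ㄌㄧ", "ˋ", "27"))  # ㄌㄧˋ27
```

Even in this compact form, a complete code mixes three symbol sets (phonetic letters, tone marks and digits). This is one reason such schemes need specially marked keyboards.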
The Goo Patent discloses a method and apparatus for encoding, storing and accessing characters of a Chinese character-based language. The Goo Patent is based on the Four-Corner picto-coding system. Its code includes two parts: a first part representing one of the common radical components of Chinese characters that appears in the particular Chinese character, and a second part representing the balancing portion, i.e., the rest of the strokes, of that Chinese character. The balancing portion is coded using a modified "Four-Corner" coding method. The Goo Patent utilizes a keyboard which has a first section and a second section. The first section is provided with keys designating various common radical components of Chinese characters, so that the first part of the code for a given Chinese character can be typed in with a single keystroke. The second section is provided with numerical keys, so that the numerals of the "Four-Corner" code can be typed in.
The Wang Patent discloses a universal system of encoding Chinese characters and its keyboard. The Wang Patent is a Five-Stroke type of picto-coding system. Certain basic roots of Chinese characters are selected according to their frequency distributions. These selected roots are classified into 25 groups according to their internal links and compatible relationships. Each group is represented by a respective Chinese key name. The classified selected roots are then arranged on 25 keys of a standard keyboard. Each key designates a respective group of the classified selected roots. The Wang Patent can only be used on a word-processing device which has a specially designed Chinese character keyboard.
The remaining patents are less pertinent than the patents discussed above. The Arase Patent discloses an apparatus for assisting in the translation of foreign language text. Character font sets of different languages, including Chinese, are preloaded into the disk storage of a computer and can be selected and displayed on a monitor screen for transposition. The Arase Patent does not disclose any particular coding scheme for Chinese characters. The White Patent discloses an ideogram generating system which includes a computer system. The White Patent relates to improvements in displaying prestored ideographic characters on the screen of a monitor. It does not relate to any particular coding scheme for Chinese characters. The Diab Patent discloses a method and system for 5-bit encoding of complete Arabic-Farsi languages. The Horodeck Patent discloses a method and apparatus for typing Japanese text using multiple systems. Neither the Diab Patent nor the Horodeck Patent relates to the encoding of Chinese characters.
It is desirable to have a "phono-picto" (or "audio-video") coding system which combines the advantages of the most popular phono-coding and picto-coding systems, wherein each Chinese character is represented by an audio-video code constructed exclusively from a limited number of English letters, so that the audio-video coding system can be used with a standard English letter keyboard.