This invention relates to an apparatus and a method for encoding and decoding Chinese characters. More particularly, this invention relates to an apparatus and a method for encoding and decoding Chinese characters wherein a Chinese character is analyzed to determine the initial order of occurrence and the number of occurrences of each of three (3) predefined basic stroke elements into which the strokes of the character are converted in a sequence according to the Chinese handwriting rules, and a multiple element code indicating such initial order and number of occurrences of each of the basic stroke elements is derived for the character. In most instances, the code derived in the foregoing manner will correspond to more than one character. In such instances, an additional code element is added to the code to obtain an extended code which uniquely defines the character. When used in conjunction with an appropriately programmed computer system, the apparatus and the method of this invention enable rapid and efficient encoding and decoding of Chinese characters, thereby enabling the user to enter, store, display, process, retrieve, print or otherwise output Chinese characters in a variety of applications, such as word processing, electronic dictionary or character verification, printing, electronic publishing and the like.
The Modern Chinese Dictionary (26th Edition), as published by the Commercial Publishing Company, Hong Kong, contains more than 7000 words, each defined by one or more characters. Obviously, a language having such a large number of characters poses difficulties in terms of written communications, especially for those who have not acquired a high level of proficiency in the written Chinese language.
In addition, a single Chinese character may contain from one to over thirty strokes. The order or sequence in which the strokes of a character are written or drawn by hand is dictated by the Chinese handwriting rules which are well known to those skilled in the written Chinese language. Furthermore, to achieve a uniform appearance in printed or written characters, the vertical and lateral dimensions of each character should be approximately the same regardless of the number of strokes in the character. For example, the Chinese character for the word "sun" has four strokes, and the Chinese character for the word "chicken" has 20 strokes. However, these two characters when printed or written must ordinarily have the same vertical and lateral dimensions. Therefore, the strokes of the character for "chicken" must necessarily be smaller than those of the character for "sun" when those characters appear in the same body of text. In other words, different Chinese characters in the same body of text may require different stroke sizes. This requirement presents a further problem in the creation of a system for encoding and decoding Chinese characters.
Various systems for encoding and decoding Chinese characters have previously been suggested. For example, U.S. Pat. No. 4,559,615 to Goo et al. discloses a method and an apparatus for encoding, storing and accessing Chinese characters, in which the Chinese characters are analyzed in part according to the so-called "Four Corner Coding Method" to obtain a 7-digit code number corresponding to each character. However, the Four Corner Coding Method is complex, and therefore the method disclosed in Goo et al. is difficult to apply if the character being analyzed does not contain a clear-cut radical or if the corner strokes of the character are not well defined. It would therefore be desirable to have a technique for encoding and decoding Chinese characters which does not have the complexity and problems associated with the Four Corner Coding Method.
Accordingly, it is one object of this invention to provide for encoding and decoding of Chinese characters without using the Four Corner Coding Method. It is a feature of this invention that the strokes of a Chinese character are converted into basic stroke elements of three (3) predefined types, the conversion taking place in a sequence determined at least in part by the Chinese handwriting rules. The characters may be represented by a multiple element code indicative of the initial order of occurrence of each of the different types of basic stroke elements and the total number of occurrences of each type of basic stroke element in the character being encoded. In some instances, an additional code element is added to such code to obtain an extended code which uniquely corresponds to the character being encoded.
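By way of illustration only, the derivation of such a multiple element code may be sketched in Python as follows. The sketch rests on several assumptions not stated above: the three basic stroke element types are given the hypothetical labels "A", "B" and "C"; the code is represented as a sequence of (type, count) pairs arranged in the order in which each type first occurs; and the additional code element is taken to be a simple index, written here as ("#", i), which is appended only when two or more characters share the same code. The character names and element sequences in the table are placeholders, not actual Chinese characters.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical labels for the three predefined basic stroke element types.
STROKE_TYPES = ("A", "B", "C")

Code = Tuple[Tuple[str, int], ...]


def base_code(elements: Iterable[str]) -> Code:
    """Return (type, count) pairs ordered by the first occurrence of each type."""
    counts: Dict[str, int] = {}  # dicts keep insertion order (Python 3.7+)
    for element in elements:
        if element not in STROKE_TYPES:
            raise ValueError(f"unknown basic stroke element: {element!r}")
        counts[element] = counts.get(element, 0) + 1
    return tuple(counts.items())


def assign_codes(table: Dict[str, List[str]]) -> Dict[str, Code]:
    """Give every character a code; extend it only where codes collide."""
    groups: Dict[Code, List[str]] = {}
    for name, elements in table.items():
        groups.setdefault(base_code(elements), []).append(name)

    result: Dict[str, Code] = {}
    for code, names in groups.items():
        for index, name in enumerate(names):
            if len(names) == 1:
                result[name] = code  # code is already unique
            else:
                # Assumed form of the additional code element: an index.
                result[name] = code + (("#", index),)
    return result


# Placeholder characters with made-up basic stroke element sequences.
table = {
    "char1": ["A", "B", "A", "A"],
    "char2": ["A", "A", "B", "A"],
    "char3": ["B", "C", "C"],
}
codes = assign_codes(table)
```

In this sketch "char1" and "char2" yield the same code, since in both the type "A" occurs first and three times and the type "B" occurs second and once; the index element is what separates them. "char3" has a code of its own and therefore receives no additional element.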
U.S. Pat. No. 4,718,103 to Shojima et al., U.S. Pat. No. 4,718,102 to Crane et al. and U.S. Pat. No. 4,284,975 to Odaka are each directed to the use of pattern recognition techniques to encode and decode Chinese characters. However, to employ the techniques of Shojima et al., Crane et al., or Odaka, the user is required to enter graphical patterns of the characters or their stroke components, which are then matched against pre-stored templates or reference patterns for a set of characters or stroke components. Furthermore, entry of graphical patterns, typically through a stroke registration device, is often difficult to accomplish, since the level of skill in writing Chinese characters and the writing stroke style may vary from user to user.
Similarly, U.S. Pat. No. 4,829,583 to Monroe et al. and U.S. Pat. No. 4,755,955 to Kimura et al. disclose the encoding and decoding of ideographic characters using coordinate values related to the strokes of the characters being encoded or decoded. However, the techniques disclosed are difficult to use and highly dependent upon the user's skill in determining the coordinates of the strokes of Chinese characters as normally written, since the stroke coordinates which are entered for a character must closely match the stroke coordinates of stored reference characters. It would therefore be desirable to have a technique for encoding and decoding Chinese characters which does not require the user to have a high level of skill in the written Chinese language and which avoids the matching of the stroke patterns or stroke coordinates of an encoded character with stored stroke patterns or stroke coordinates of reference characters.
Accordingly, it is another object of this invention to provide an apparatus and a method for encoding and decoding Chinese characters which do not require the user to possess a high level of skill in the written Chinese language and which are not based on the matching of stroke patterns or stroke coordinates of a character being encoded with those of reference characters.
U.S. Pat. No. 4,462,703 to Lee and U.S. Pat. No. 4,379,288 to Leung et al. are both directed to techniques for using a conventional keyboard to represent the component strokes and roots of Chinese characters. These techniques require a user to strictly follow the stroke sequences of characters dictated by the Chinese handwriting rules in encoding the characters. Similarly, U.S. Pat. No. 4,689,743 to Chiu discloses a technique for encoding and validating an ideographic character, such as a Chinese character. To encode a character, Chiu requires that each component stroke of the character be entered into the Chiu apparatus in the correct sequence according to established handwriting rules for such characters. However, this is difficult to accomplish for a user who does not possess a high level of proficiency in the writing of ideographic characters. It would therefore be desirable to provide a technique for encoding and decoding ideographic characters, such as Chinese characters, which does not require the user to know the proper sequence of every stroke of a character being encoded.
It is another object of this invention to provide an apparatus and method for encoding and decoding Chinese characters in which the stroke sequence of the Chinese handwriting rules need not be rigorously followed except for the first few strokes of the character being encoded or entered. It is a feature of this invention that the conversion of the strokes of a character being encoded to predefined basic stroke elements of three types need follow the sequence dictated by the Chinese handwriting rules only until two different types of basic stroke elements have been encoded. Thereafter, any remaining strokes of the character may be converted to the basic stroke elements in any arbitrary sequence. In this manner, rapid entry and retrieval of Chinese characters to and from a database system may be achieved by a user having a relatively low level of skill in the written Chinese language.
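The reason the sequence matters only until two different types have been encoded can be shown with a short Python sketch. With only three types, once two distinct types have appeared the third can only come last, so the initial order of occurrence is fixed and the counts do not depend on order. The labels "A", "B" and "C", the (type, count) representation of the code, and the sample sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical labels for the three basic stroke element types.
TYPES = ("A", "B", "C")


def ordered_prefix_length(elements):
    """Number of leading elements that must follow the handwriting sequence,
    i.e. up to and including the first appearance of a second distinct type."""
    seen = set()
    for position, element in enumerate(elements):
        seen.add(element)
        if len(seen) == 2:
            return position + 1
    return len(elements)  # fewer than two types occur in the whole character


def code_of(elements):
    """(type, count) pairs in order of first occurrence (assumed code form)."""
    counts = Counter(elements)
    order = []
    for element in elements:
        if element not in order:
            order.append(element)
    return tuple((t, counts[t]) for t in order)


# Made-up basic stroke element sequence for a seven-stroke character.
strokes = ["B", "B", "A", "C", "A", "C", "B"]
cut = ordered_prefix_length(strokes)
prefix, tail = strokes[:cut], strokes[cut:]

# Encode the character once for every possible ordering of the remaining strokes.
codes = {code_of(prefix + list(p)) for p in permutations(tail)}
assert len(codes) == 1  # every ordering of the tail gives the same code
```

Here only the first three elements need to follow the handwriting sequence. The remaining four may be entered in any order without changing the resulting code.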
U.S. Pat. No. 4,669,901 to Feng and U.S. Pat. No. 4,684,926 to Yong-Min also disclose using keyboard means for encoding or entering Chinese characters. The Yong-Min technique uses five basic strokes and selects roots according to the distribution of their frequency of occurrence. The Feng system includes a keyboard having keys representing selected strokes, and combinations of strokes, radicals and other character components. However, neither Feng nor Yong-Min discloses a technique for encoding or decoding Chinese characters in which the characters are first wholly or partially converted into a sequence of predefined basic stroke element types, and in which the encoding or entering of characters is based on such conversion.
It is another object of this invention to provide an apparatus and method for encoding and decoding Chinese characters which do not require determining stroke frequency or stroke combinations of a character for purposes of encoding and decoding the characters. This invention advantageously enables encoding and decoding by means of determining the initial order of occurrence and number of occurrences of only three basic stroke element types for each character whose strokes are being converted into basic stroke elements of the three types.
It is yet another object of this invention to provide a method and apparatus for encoding and decoding Chinese characters which enable rapid and efficient entry, storage and retrieval of characters from a database system.
Other objects, features and advantages of this invention will be apparent from the following detailed description of exemplary embodiments, together with the accompanying Figures.