This invention relates to a symbol definition apparatus.
This invention has particular but not exclusive application to the redactive processing of symbolic language characters using electronic data processing apparatus, and for illustrative purposes reference will be made to such application. However, it is to be understood that this invention could be used in other applications, such as shorthand, mathematical, musical and other non-language symbols.
The written Chinese language is an example of a symbolic language, rather than an alphabetic language, and contains many thousands of symbols or ideographs, each of which may represent a word. All Chinese characters are formed from or include selected unique sequences of eight basic indicia or "strokes", derived from the actual brush strokes used when creating the characters by hand. Each stroke type is characterized by its shape and direction, and over fifty strokes may be used to form the most complex characters, although most characters can be defined with between six and twenty-one strokes. Each character has a particular stroke count number associated with it, although the official stroke count number can vary from the actual stroke count number in a small number of cases. In order to promote uniformity of character shape for different writers, as well as enhancing fluency, ease of redaction and teaching, strokes are added in sequence to form a character in a particular order which is defined for that character, although the actual stroke order is not apparent in the completed character.
Characters include one or more of a group of two hundred and fourteen basic character portions called "radicals" which are used with additional strokes to form characters with meanings or sounds related to that of the contained radical. Radicals may also be characters.
Because of the necessity to define such a large number of characters for writing in Chinese, the adoption of mechanised writing of Chinese has been slow. For instance, a typical Chinese typesetting machine has one thousand, two hundred keys, compared to approximately one hundred for an alphanumeric typesetter, and consequently is much more costly and difficult to use. It is also unable to type-set many of the rarer characters.
A further problem confronting users and students of the written Chinese language is the difficulty of finding the exact shape and meaning of a character using a dictionary. The primary access mode for a Chinese dictionary is by reference to the official number of strokes for that character. A user must then search the section of the dictionary devoted to characters with that official stroke count number, using the logical radical contained within the character to further sub-divide the search category. Hitherto, it has been impossible to utilise a progressive search strategy of the type which facilitates a search for a word in an alphanumeric dictionary.
Automatic or electronic word processing using Chinese characters presents a problem similar to that described above, because hitherto it has been very inconvenient to select and enter the desired characters into the word processor using a keyboard. Some Chinese word processors utilise digitising pads as an input device, and the operators must draw the characters one at a time on the pad. Digitising pads are expensive and require considerable digital processing for the scanned image on the digitising pad to be recognised by the word processor as the character which was intended. Furthermore, the user of the digitiser must be skilful and careful to be consistent in the way that characters are drawn on the digitiser.
One current Chinese word processing technique requires an operator to break a character up mentally into a phoneme or series of phonemes selected from thirty-seven phonemes which are displayed on a keyboard overlay. The phoneme series is entered through the keyboard, and software displays on the display screen all characters to which that series of phonemes may apply. Unfortunately, because a great many characters represent spoken words that sound the same, and Chinese is not a phonetic language, the operator must often select the desired character from a large number of characters displayed. There are approximately four hundred sounds in the spoken Chinese languages, each with up to four tones. This limits the number of characters which can be used to approximately 500 or 1,000. In addition, while the form of Chinese characters is uniform throughout the Chinese-speaking world, the pronunciation may vary because of regional dialects, rendering the system unreliable for users other than speakers of dialects such as Mandarin for which suitable phonetic software is available. Furthermore, there are some Chinese characters which do not contain any phonemes, and so such characters cannot be used in such a system.
United Kingdom Patents 2,066,534 and 2,118,749 disclose Chinese writing systems which employ fragments of strokes or "elements". Since Chinese characters are not formed by writing such elements in a particular order, a system of element entry cannot identify a Chinese character. It is the strokes, and more particularly the order of the strokes, that are written when forming a Chinese character, and the order is unvarying and known to all writers of Chinese. Even if a fixed element order is established, such a system leads to ambiguities because of the relatively large number of characters that might have the same element sequence, whereas the number of Chinese characters having the same stroke order is relatively small. Chinese characters with the same stroke order might be considered equivalent to homonyms in the English language, and of the 13,056 commonly used Chinese characters, only three hundred and twenty may be classed as homonyms, including twenty-seven examples where there are three characters with the same stroke sequence and five examples where there are four characters with the same stroke sequence.
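By way of illustration only, the notion of characters sharing a stroke sequence can be sketched as a lookup keyed on the ordered strokes. The sketch below is not taken from any of the cited patents: the stroke codes `H` (horizontal) and `V` (vertical), the small `STROKE_TABLE`, and the function names are all assumptions made for this example, and only two of the eight stroke types are used. It shows that a stroke sequence can serve as an index key, that a few characters may collide on the same key (the "homonyms" referred to above), and that a partial sequence narrows the set of candidates.

```python
from collections import defaultdict

# Hypothetical stroke codes (an assumption for this sketch):
#   H = horizontal stroke, V = vertical stroke.
# The entries are a tiny illustrative sample, not a complete table.
STROKE_TABLE = {
    "十": "HV",
    "土": "HVH",
    "士": "HVH",
    "工": "HVH",
    "干": "HHV",
}


def build_index(table):
    """Group characters by their full stroke sequence."""
    index = defaultdict(list)
    for char, sequence in table.items():
        index[sequence].append(char)
    return dict(index)


def homonym_groups(index):
    """Return only the sequences shared by more than one character."""
    return {seq: chars for seq, chars in index.items() if len(chars) > 1}


def candidates(table, prefix):
    """Return every character whose stroke sequence begins with `prefix`."""
    return {char for char, seq in table.items() if seq.startswith(prefix)}


if __name__ == "__main__":
    index = build_index(STROKE_TABLE)
    print(homonym_groups(index))
    print(candidates(STROKE_TABLE, "HV"))
    print(candidates(STROKE_TABLE, "HH"))
```

In this sample, three characters share the sequence `HVH` and would have to be distinguished by a further selection step, while every other sequence identifies a single character.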
United Kingdom Patent 2,116,341 uses the 214 radicals and the number of strokes in a character to identify such a character. This system is prone to the difficulties that not all Chinese characters can be represented, and the system requires skill above and beyond that of the normal writer of Chinese.
United Kingdom Patent 2,125,197 discloses a method of encoding Chinese characters which entails disassembling each character into four constituent components, each representing a sound. With the large number of Chinese dialects, and the differences in pronunciation from one dialect to another, this method cannot be universally applied by any writer of Chinese. Australian Patent 532,185 discloses a keyboard using the well known "Ping" method of Chinese character index notation, which groups components of Chinese characters substantially in accordance with the nature of the first stroke of each component. This method has the disadvantage of requiring a very large keyboard with a very large number of keys, making the method difficult to implement.
United Kingdom Patent 2,062,916 discloses an automated method of Chinese character production using shape identifiers for the four corners of the basically square Chinese character. Such a system has to overcome the great difficulty of the large number of characters with similar shape elements, and is not practical in use.
United Kingdom Patent 2,060,231 makes use of a system of radicals, or "roots", numbering 256, but cannot represent a large number of Chinese characters, and has to accommodate the problem of selecting the desired character from a large number of characters containing a limited number of roots.
U.S. Pat. No. 4,684,926 discloses a method of depicting Chinese characters using 5 elements, or "strokes". The "strokes" of that invention do not represent the strokes normally applied in making up a Chinese character, and so character definition is made difficult. The "strokes" are described as a "topological pattern", and represent an entirely contrived system of Chinese writing that takes no account of the traditional form of creating Chinese characters. Such a system requires the writer of Chinese to learn a new system of writing.
U.S. Pat. No. 4,500,872 discloses a system of Chinese writing using phonemes. Such a method cannot be universally applied by writers of Chinese because of the many differences in pronunciation and, furthermore, because only a very limited number of Chinese characters contain any phonetic content. Chinese is not a phonetic language.