The present invention relates to a method of constructing language symbols, and particularly to a method of identifying and reproducing language characters of the oriental languages.
Oriental languages are generally characterized by an exceedingly large number of written characters, and by the fact that the characters are generally derived from, or are themselves, pictorial representations of entire words or phrases. For example, the Chinese language contains more than forty thousand distinct characters, of which approximately ten thousand can be regarded as significant in common usage.
To the extent that the Chinese language has been mechanically reproduced in printed form, the machinery used has historically consisted of large machines that print individual characters in columns and rows using fonts accessed from a large "keyboard". To accommodate all of the characters of the language, a number of separate fonts must be used, and these fonts must be interchangeable on the machine that effects the mechanical reproduction of characters.
With the advent of computer technology, a variety of techniques have been developed for electronically processing the Chinese language. In general, these techniques use the more or less "standard" keyboards normally associated with computers. One technique assigns several identities to each of a number of keys, and specific characters are accessed from memory by depressing individual keys in a predetermined sequence; this sequence serves as a coded instruction to the programmed logic identifying the specific character to be reproduced. Another technique uses Roman characters as the input to programmed logic, and yet another employs coding by numbers. These techniques are typified by the so-called cang jie method, the first/last stroke plus phonetic discriminator method, the romaji conversion method, and the telegraphic code method.
Using the cang jie method, the operator must memorize the keyboard of the computer processor, including the cang jie representations of character fragments and the key on which each representation is located. Each key typically serves more than one cang jie symbol. When using the machine, the operator mentally separates each character to be processed into its component fragments, and then obtains the desired character by pressing the keys for the cang jie equivalents of those fragments in the proper sequence, as though he were constructing the character by hand. The machine uses programmed logic to interpret the input key strokes and convert them to the desired characters. The machine's selection of the proper character is facilitated by programmed grammar, and the operator reverts to a numerical code table if the cang jie form is unknown or nonexistent.
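The fragment-to-character lookup described above can be sketched as a table keyed by sequences of fragment keys. The following Python sketch is illustrative only: `CANGJIE_TABLE` and `cangjie_lookup` are hypothetical names, and the table holds just a few sample entries (A = 日, B = 月, D = 木) rather than a complete cang jie layout. An empty result stands for the case in which the operator would fall back to a numerical code table; grammar-based selection among several candidates is not modeled.

```python
# Hypothetical sample table, not a complete cang jie layout.
# Each key letter stands for one fragment: A = 日, B = 月, D = 木.
CANGJIE_TABLE: dict[str, list[str]] = {
    "A": ["日"],
    "B": ["月"],
    "D": ["木"],
    "AB": ["明"],  # 日 + 月
    "DD": ["林"],  # 木 + 木
}


def cangjie_lookup(keys: str) -> list[str]:
    """Return candidate characters for a sequence of fragment keys.

    An empty list means the fragment form is unknown, in which case the
    operator would revert to a numerical code table.
    """
    return CANGJIE_TABLE.get(keys.upper(), [])
```

For example, `cangjie_lookup("ab")` returns `["明"]`, built from the fragments 日 and 月 in writing order.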
The first/last stroke plus phonetic discriminator method requires that the operator memorize the computer processor keyboard, and in particular which strokes are available and which keys represent them. The operator must also learn which phonetic symbols are available and which keys they are associated with. When using the machine, the operator examines each character to be processed and identifies its first and last strokes. He must then know which characters can be distinguished on the basis of the first and last strokes alone, and which ones also need a phonetic discriminator. The operator enters a character by pressing the appropriate keys for the first and last strokes and, if required, the phonetic discriminator, in the proper sequence. The computer processor uses programmed logic to interpret the input key strokes and convert them into a complete character. Selection of the proper character is facilitated by programmed grammar rules, and the operator reverts to the numerical code table if the strokes and phonetic symbols are insufficient for extracting the proper character from memory.
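The two-stage selection described above, first and last strokes followed by an optional phonetic discriminator, can be sketched as follows. This is a hypothetical Python illustration: the names `STROKE_INDEX` and `select_character` are assumptions, strokes are named by their pinyin terms (heng = horizontal, shu = vertical), and the first letter of a pinyin reading stands in for whatever phonetic symbol set a real machine would use.

```python
from typing import Optional

# Hypothetical sample index: (first stroke, last stroke) -> [(character, reading)].
STROKE_INDEX: dict[tuple[str, str], list[tuple[str, str]]] = {
    ("heng", "shu"): [("十", "shi")],
    ("heng", "heng"): [("土", "tu"), ("王", "wang"), ("三", "san"), ("工", "gong")],
}


def select_character(first: str, last: str,
                     discriminator: Optional[str] = None) -> list[str]:
    """Return the characters matching the strokes and optional discriminator.

    A single-element result means the character has been identified; more
    than one means the operator must also supply a discriminator.
    """
    candidates = STROKE_INDEX.get((first, last), [])
    if discriminator is not None:
        # Keep only characters whose reading begins with the discriminator.
        candidates = [(ch, reading) for ch, reading in candidates
                      if reading.startswith(discriminator)]
    return [ch for ch, _ in candidates]
```

Here 十 is identified by its strokes alone, while 土, 王, 三 and 工 all begin and end with a horizontal stroke and need the discriminator to be told apart.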
The romaji conversion method requires that the operator memorize both the romaji keyboard and the Romanized spelling of all of the sounds of the oriental language. The Chinese language, for example, has 409 or 410 sounds: the People's Republic of China uses 409 sounds in conversation, whereas the traditional form of Chinese used in Taiwan and elsewhere employs 410. When using a machine based on romaji conversion, the operator must examine the Chinese language document to be processed, mentally convert the written characters into sounds, convert those sounds into romaji form, and then enter the romaji letters in the proper sequence on an alphabetic or similar keyboard. The machine uses programmed logic to interpret the romaji letter sequences and convert them to the desired characters. Selection of the proper character is facilitated by programmed grammar rules, which are an essential part of romaji conversion using keyboard input. The operator has no need for numerical code tables, since all characters can be expressed in romaji.
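The conversion step can be sketched as segmenting the romanized input into known words and returning the candidate characters for each word. In this hypothetical Python sketch, `ROMAN_LEXICON`, `segment` and `convert` are illustrative names, pinyin spellings without tone marks stand in for the romaji form, segmentation uses a simple longest-match rule, and the grammar rules that would choose among candidates are not modeled; the sketch only shows why such rules are needed.

```python
# Hypothetical sample lexicon: romanized word -> candidate character strings.
ROMAN_LEXICON: dict[str, list[str]] = {
    "hezi": ["盒子", "核子"],  # box; nucleon
    "gezi": ["鸽子", "格子"],  # pigeon/dove; grid
    "he": ["和", "河"],
    "zi": ["子", "字"],
}


def segment(romanized: str) -> list[str]:
    """Split an unspaced romanized string into lexicon words, longest match first."""
    words, pos = [], 0
    while pos < len(romanized):
        # Try the longest remaining substring first, shrinking until a match.
        for end in range(len(romanized), pos, -1):
            if romanized[pos:end] in ROMAN_LEXICON:
                words.append(romanized[pos:end])
                pos = end
                break
        else:
            raise ValueError(f"no lexicon entry at {romanized[pos:]!r}")
    return words


def convert(romanized: str) -> list[list[str]]:
    """Return the candidate characters for each segmented word."""
    return [ROMAN_LEXICON[word] for word in segment(romanized)]
```

`convert("hezigezi")` yields two candidate lists, one per word; a real system would still have to choose between 盒子 and 核子, and between 鸽子 and 格子.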
The numerical code method has been used for many years in telegraphy. To become proficient in the technique, the operator must memorize the code numbers for specific characters, and characters whose codes the operator has not memorized must be found in numerically coded tables. Input to the computer processor is made via numeric keys, and the processor uses programmed logic to interpret the input key strokes and convert them to characters.
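The table-lookup side of the numerical code method can be sketched as splitting the digit stream into four-digit groups, as in the Chinese telegraph code, and looking each group up in a table. In this Python sketch the names `CODE_TABLE` and `decode_numeric` are hypothetical, and the code assignments are placeholders, not actual telegraph code values.

```python
# Placeholder assignments only; these are NOT real telegraph code values.
CODE_TABLE: dict[str, str] = {
    "1001": "中",
    "1002": "文",
}


def decode_numeric(digits: str, table: dict[str, str] = CODE_TABLE) -> str:
    """Decode a run of four-digit codes into characters.

    Raises ValueError for malformed input and KeyError for a code that is
    not in the table (the case where the operator consults a printed table).
    """
    if not digits.isdigit() or len(digits) % 4 != 0:
        raise ValueError("input must be a run of four-digit codes")
    return "".join(table[digits[i:i + 4]] for i in range(0, len(digits), 4))
```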
In the transition from mechanical "typewriters" to electronic character processors, some use has been made of what is generally called the tablet/stylus technique. This consists of a tablet with a fixed display of commonly used characters on its surface, and a stylus for identifying the locations of desired characters. Its advantage over the old Chinese mechanical typewriter is its smaller size, but the number of characters displayed is limited by the physical size of the tablet, which in turn is limited by how many characters can be usefully displayed without completely confusing the operator. An electronic stylus is used in place of a mechanical positioning device, and it provides an electrical signal that identifies the X-Y coordinates on the tablet to the computer.
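Mapping a stylus position to a character on a fixed tablet layout can be sketched as integer division of the X-Y coordinates by the cell size. The Python below is a hypothetical illustration: `CELL_SIZE`, `LAYOUT` and `char_at` are assumed names, and the layout holds only six sample characters. Because the layout never changes, the number of selectable characters is bounded by the tablet's area, which is the limitation noted above.

```python
from typing import Optional

CELL_SIZE = 10  # hypothetical cell width/height in tablet units

# Hypothetical fixed layout; each inner list is one row of cells.
LAYOUT = [
    ["的", "一", "是"],
    ["不", "了", "人"],
]


def char_at(x: float, y: float) -> Optional[str]:
    """Return the character under stylus position (x, y), or None if off the grid."""
    col, row = int(x // CELL_SIZE), int(y // CELL_SIZE)
    if 0 <= row < len(LAYOUT) and 0 <= col < len(LAYOUT[row]):
        return LAYOUT[row][col]
    return None
```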
Several of the methods described above employ programmed grammar rules, either as an essential element or as an important aid to character selection. This is a major shortcoming of keyboard-operated processors for the Chinese, Japanese or Korean languages, because programmed grammar rules cannot accommodate jargon unless it has been specifically programmed. Jargon is defined as the body of colloquial expressions that have come into practice as languages have evolved and whose meaning may vary from one segment of society to another. Machines will obey rigid rules of grammar, but people generally do not. Jargon is widely used in every field from computers to medicine, and it varies from one region to another even within a single country. A machine that depends upon proper grammar for efficient operation can be nearly paralyzed by jargon. Stated another way, operator efficiency is significantly reduced when preparing documents in any specialized field where jargon is essential to communication.
The romaji conversion method described above depends upon programmed grammar rules, and so is intolerant of jargon. More important, however, typographical errors made while using the romaji conversion method will usually distort the message. An operator working in the English language can and will make typographical errors, but these do not usually destroy the sense of what is being communicated. With romaji conversion processing of the Chinese language, however, a typographical error can change the meaning of a word dramatically. For example, an operator intending to type the romaji form "hezi" could easily strike the adjacent "g" key instead of "h", producing the Chinese characters for "gezi" and changing the intended meaning while still preserving grammatical continuity. The word "hezi" refers to a nuclear device, whereas "gezi" refers to a dove, the symbol of peace.
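The adjacent-key hazard can be made concrete by generating every single-key slip to a horizontally adjacent QWERTY key and keeping only those slips that form another valid word. In this hypothetical Python sketch, `LEXICON`, `row_neighbors` and `adjacent_key_typos` are illustrative names, only same-row neighbors are considered, and the two-word lexicon is a sample: 核子 for "hezi" and 鸽子 for "gezi".

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Hypothetical two-word sample lexicon: romanized form -> characters.
LEXICON: dict[str, str] = {"hezi": "核子", "gezi": "鸽子"}


def row_neighbors(ch: str) -> set[str]:
    """Keys immediately left and right of ch on the same QWERTY row."""
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return {row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)}
    return set()


def adjacent_key_typos(word: str, lexicon: dict[str, str]) -> list[str]:
    """Valid lexicon words reachable from `word` by one adjacent-key slip."""
    hits = set()
    for pos, ch in enumerate(word):
        for neighbor in row_neighbors(ch):
            candidate = word[:pos] + neighbor + word[pos + 1:]
            if candidate in lexicon:
                hits.add(candidate)
    return sorted(hits)
```

Because the slip from "hezi" to "gezi" produces a real word, neither a spelling check nor a grammar check would flag it.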
From the above summary of existing language processing methods, it is evident that computer processing can be, and has been, used to enhance the practice of these methods. Because of their speed, computers are a valuable tool in selecting and reproducing oriental characters: they can very rapidly apply a large number of known, predetermined rules to this process, and they can store a large volume of character-related information. With modern matrix-type printers and CRT display screens, the output of such systems can be used to reproduce pictorial symbols in hard copy, or it can serve as a character generator for word processors or computers. The conventional input device to a computer processing system is a keyboard, which is a severe handicap in a language having thousands of characters, because the number of characters greatly exceeds the practical limit on keyboard size. To address this problem, the present invention contemplates a method that works most effectively with a dynamic "keyboard", in which the identity of each "key" is a function of the method step being performed at the moment.
Although the steps of the method can be performed using a wide variety of machine and/or non-machine constructions, they are most efficiently performed with a machine having a cathode ray tube (CRT) with a touch screen that interfaces with a computer processor. The advantage of a touch screen for performing the method is that the user/machine interface is achieved by touching X-Y coordinate positions that are physically fixed on the screen, while the visual display can be selectively manipulated to place different character options at the various X-Y intersections. Therefore, depending on which method step is being performed at any instant, the screen may display a plurality of character choices, and the operator may touch the X-Y coordinate corresponding to one of these choices to initiate the next step of the language processing method. In this manner, the finite area of a touch screen may take on many different meanings in the course of a multi-step method, while each display associated with a particular step contains only a finite and readily understandable body of character information for the operator to consider. When used with a phonetic approach to language construction, all of the character options for the sounds being processed may be displayed, thus accommodating even the most specialized topic, jargon or slang. Further, a phonetic approach to generating language characters also enables voice recognition techniques to be used for processing phonetic sounds, which can greatly speed up the computerized practice of the method.
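The dynamic "keyboard" can be sketched as a fixed grid of touch cells whose labels depend on the current step. In this hypothetical Python sketch, `DynamicKeyboard`, `SOUNDS`, `CELL_SIZE` and `COLUMNS` are assumed names; the three steps (initial sound, final sound, character) are one possible phonetic decomposition chosen for illustration, and the sound table holds only a few sample syllables. The same X-Y cell means an initial on the first screen, a final on the second, and a character on the third.

```python
from typing import Optional

CELL_SIZE = 10  # hypothetical cell width/height in screen units
COLUMNS = 4     # hypothetical number of cells per row

# Hypothetical sample: (initial, final) -> characters having that sound.
SOUNDS: dict[tuple[str, str], list[str]] = {
    ("h", "e"): ["和", "河", "核", "盒"],
    ("g", "e"): ["鸽", "格", "个"],
    ("z", "i"): ["子", "字"],
}


class DynamicKeyboard:
    """Fixed touch cells whose meaning changes with the current step."""

    def __init__(self) -> None:
        self.initial: Optional[str] = None
        self.final: Optional[str] = None

    def options(self) -> list[str]:
        """Labels shown in cell order (left to right, top to bottom) for this step."""
        if self.initial is None:
            return sorted({i for i, _ in SOUNDS})
        if self.final is None:
            return sorted({f for i, f in SOUNDS if i == self.initial})
        return SOUNDS[(self.initial, self.final)]

    def touch(self, x: float, y: float) -> Optional[str]:
        """Handle one touch; return a character once one is chosen, else None."""
        if not 0 <= x < COLUMNS * CELL_SIZE or y < 0:
            return None  # touch outside the grid
        index = int(y // CELL_SIZE) * COLUMNS + int(x // CELL_SIZE)
        opts = self.options()
        if index >= len(opts):
            return None  # touch on an unlabelled cell
        choice = opts[index]
        if self.initial is None:
            self.initial = choice
        elif self.final is None:
            self.final = choice
        else:
            self.initial = self.final = None  # reset for the next character
            return choice
        return None
```

For example, touching the second cell selects "h", touching the first cell then selects "e", and touching the third cell on the resulting character screen returns 核.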