The present invention relates, in general, to a system for producing in text form a manuscript which is to be written in a language utilizing symbolic characters. More particularly, the invention relates to the method of and to electronic equipment for carrying out such a procedure through the use of a unique identifier code which is generated to identify selected aspects of each character in the text. The identifier code so produced operates to select one or more previously stored characters for use in reproducing the manuscript characters in a text form for display or printing, the system thus effectively comprising an electronic typewriter for such characters.
The use of ideograms and logograms as the graphic symbols in written languages is found in many parts of the world. An ideogram is a graphic symbol used to represent an object or an idea without expressing, as in a phonetic system, the specific sounds which form the name of that object or idea. Thus, it is a symbol representative of an idea, rather than of a word. A logogram is a letter, character, or other graphic symbol used to represent an entire word. The use of logograms and ideograms is typified by Chinese, Japanese, Korean, and like languages, but for purposes of illustrating the concepts of the present invention specific reference will be made herein to a preferred embodiment of the system and method as it applies to the Chinese language.
Among the world's writing systems, Chinese orthography stands out because phonetic representation is a minor factor in its construction. There is no alphabet or syllabary from which Chinese characters are built, in contrast to other written languages, such as English, which employ alphabets having a relatively small number of digits or letters which are arranged in specific sequences and directions to permit classification of the words on the basis of the letters' conventional locations in the alphabet. As a result, alphabetically written languages, in contrast to symbolically written languages, are amenable to type-setting, typewriting, telegraphy, and sorting through assembly and disassembly of the letters. Further, the arrangement of the letters in alphabetically written languages is often phonetic so that the sound representation can be deduced from the particular arrangement, while only a hint of sound representation can be deduced from Chinese characters, and that only after one has learned a considerable number of them. As a functional writing system in modern Chinese, the characters can best be described as discrete units, or ideograms, which represent specific meanings. They can be learned only by rote and can be retained in the memory only by frequent use. A repertoire of between 2500 and 3000 ideographic characters is necessary to achieve normal business adequacy in reading and writing, while the language itself has approximately 50,000 characters that have been identified historically, with about 10,000 characters being in current use.
Traditionally, Chinese characters are classified by their shapes, not by their correspondence to linguistic forms. Accordingly, the problem of reproducing the characters mechanically has been extremely difficult, and it has been virtually impossible to derive adequate indexing methods. Each character contains one or more of some 214 meaning classifiers or radicals, with further classification being by the number of penstrokes in the remainder of the character. Further, the radicals themselves are classified by the number of strokes in them, but these are meaning classifiers, and do not ease the problems discussed above.
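The traditional classification described above can be sketched as a lookup keyed on the radical and the number of strokes in the remainder of the character. The following is only an illustrative sketch: the four sample entries, the names `SAMPLE`, `build_index`, and `lookup`, and the choice of simplified character forms are assumptions made for the example and are not part of the system described herein. It shows that such a key leads to a group of characters which must still be searched by eye, rather than to a single character.

```python
from collections import defaultdict

# Illustrative sample only: (character, radical, strokes outside the radical).
SAMPLE = [
    ("好", "女", 3),
    ("妈", "女", 3),
    ("明", "日", 4),
    ("林", "木", 4),
]


def build_index(entries):
    """Group characters under the key (radical, residual stroke count)."""
    index = defaultdict(list)
    for char, radical, residual in entries:
        index[(radical, residual)].append(char)
    return index


def lookup(index, radical, residual):
    """Return every character filed under the given radical and stroke count."""
    return index.get((radical, residual), [])


INDEX = build_index(SAMPLE)

# Two characters share the same key, so the key alone does not select one.
print(lookup(INDEX, "女", 3))
```

With a full character set, each such group would typically hold many more entries than in this small sample.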
Because there is no straightforward system for indexing characters by their relation to elements of the language, the technology for printing has stayed at a rudimentary stage in the Chinese language until very recently. Although movable type was invented by the Chinese, the very nature of their writing system hindered any technical advance beyond the use of hand-set type or hand-drawn reproduction of characters. The origins of the Chinese system of writing can be traced back six thousand years, but the efficient use of modern communications and data processing systems has effectively been blocked by the problem of rapidly locating the desired character or characters to be printed. An early example of this problem appeared with the development of telegraphy, for in order to transmit messages it became necessary to assemble a telegraphic code which consisted of the International Morse Code combinations for the numbers 0 through 9,999, which were used as labels for 10,000 of the 50,000 Chinese characters. The "Telegraphic Code" was published, and the telegraph book was used by both the sender and the receiver of a message. The sender looked up each Chinese character in turn and transmitted the Morse Code representation of the number assigned to that character, while the receiver used the same book to reconvert the number to the Chinese character. Such a slow and painstaking method of transmitting a Chinese text, and the equally slow method of printing by the use of hand-set type or hand-drawn pages of characters, have resulted in numerous attempts over the years to develop more satisfactory solutions.
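The code-book procedure described above may be illustrated by the following minimal sketch. The Morse patterns for the digits are those of the International Morse Code; the character-to-number assignments in `CODEBOOK` are illustrative values only and have not been checked against any published telegraph book, and the function names `encode` and `decode` are hypothetical.

```python
# International Morse Code patterns for the ten digits.
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
MORSE_TO_DIGIT = {pattern: digit for digit, pattern in MORSE_DIGITS.items()}

# Assumption: illustrative number assignments, not a verified code book.
CODEBOOK = {"中": 22, "文": 2429}
REVERSE_CODEBOOK = {number: char for char, number in CODEBOOK.items()}


def encode(text):
    """Sender side: look up each character and spell its 4-digit number in Morse."""
    groups = []
    for char in text:
        digits = f"{CODEBOOK[char]:04d}"
        groups.append(" ".join(MORSE_DIGITS[d] for d in digits))
    return groups


def decode(groups):
    """Receiver side: turn each Morse group back into a number, then a character."""
    chars = []
    for group in groups:
        digits = "".join(MORSE_TO_DIGIT[p] for p in group.split())
        chars.append(REVERSE_CODEBOOK[int(digits)])
    return "".join(chars)


message = encode("中文")
print(message)
print(decode(message))
```

Every character costs one manual look-up at each end plus four Morse digit groups on the wire, which is the source of the slowness noted above.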
Among early attempts at solving the foregoing problems were mechanical typewriters which attempted to provide a mechanical keyboard arrangement for reproducing selected ideographic characters. Such typewriters, however, typically are nothing more than small manipulators for lead type wherein an operator sits before a case of several thousand type slugs arranged by radical and stroke count. The operator searches through the display of characters, which may, for example, be identified on a large and complex keyboard, and uses a pointer/printer linkage to retrieve the desired slug, print the character, and return the slug to its tray. A great deal of practice is required to achieve some degree of facility with such a machine; a maximum speed of about eleven characters per minute can be attained, with normal typing speeds being in the range of five or six characters per minute. Although many attempts have been made to improve the mechanical typewriter, as by providing machines which will print certain strokes and radicals so that the characters can be mechanically constructed, the very nature of the Chinese ideogram prohibits effective mechanical reproduction by means of a typewriter. Similar problems exist with the written forms of other languages which similarly utilize graphic symbols rather than an alphabetical representation of words.
In an attempt to overcome some of the problems presented by the Chinese language ideographs, a phonetic system of spelling Chinese syllables through the use of a romanized alphabet was devised, and has been widely promoted in China. This phonetic spelling, known as the pinyin system, is based on the sound of the spoken Chinese syllables. However, because Chinese syllable structure allows a limited number of possible sound combinations, a single syllable sound is ambiguous in that it will usually identify a large number of characters. This presents little problem with the spoken word in conversation, since the intended meaning usually is apparent from the context or from particular word phrases and compounds. But because of the ambiguity as to which character is meant by a particular syllable sound, the introduction of the pinyin system and other like phonetic systems for languages other than Chinese did not solve the problem of reproducing specific ideographic characters in a manuscript by a typewriter.
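The ambiguity of a single syllable sound can be illustrated with a small sketch. The table `READINGS` is a toy sample chosen for the example, tone marks are ignored, and the names `SYLLABLE_INDEX` and `candidates` are hypothetical; an actual dictionary would list many more characters under each syllable, including those that have only one entry here.

```python
from collections import defaultdict

# Toy sample of (character, toneless pinyin syllable) pairs.
READINGS = [
    ("是", "shi"),
    ("十", "shi"),
    ("时", "shi"),
    ("事", "shi"),
    ("马", "ma"),
    ("妈", "ma"),
    ("中", "zhong"),
]

SYLLABLE_INDEX = defaultdict(list)
for char, syllable in READINGS:
    SYLLABLE_INDEX[syllable].append(char)


def candidates(syllable):
    """Return every character in the sample that is spelled with this syllable."""
    return SYLLABLE_INDEX.get(syllable, [])


# One spelling, several possible characters: the typist must still choose.
print(candidates("shi"))
```

A phonetic keyboard entry therefore narrows the choice but leaves a further selection step for the operator.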
With the advent of computer technology, it was recognized that a new tool had become available for use in the fast and accurate production of Chinese ideograms. Accordingly, various research and academic institutions, companies, and individuals have for many years worked on the development of electronic data processing machines and methods for producing Chinese characters. At the present time, this art has been developed to the point where computers can generate adequate ideographic shapes, and sophisticated character generators and hard-copy printing units have been developed that have the flexibility to produce acceptable Chinese characters with high resolution. Various optical readers, matrix systems, and expanded memory storage systems have made it easy to store in a data processing system the information necessary to reproduce a specified Chinese character. But even with such developments the essential problem of selecting which character should be printed or displayed remains a major stumbling block. In a typewriter system where it is desired to transfer a manuscript document to printed form, for example, the problem still remains that there are some 50,000 Chinese characters from which to select, and there has been until now no convenient, accurate and rapid method or apparatus for identifying a particular character, locating it in the processing system memory, and causing the correct character to be printed. A number of approaches have been suggested in the prior art and some have been marketed, but none has provided a satisfactory typewriter operation.
One approach has been to provide a device that stores standard character particles in a memory. An operator then uses coded sequences on an alpha-numeric keyboard to assemble the desired characters on a particle-by-particle basis on a cathode ray tube. After completion of the assembly procedure, the displayed character can be reproduced on a hardcopy device. Essentially, this approach is an electronic reproduction of the pen or brush technique wherein each part of a character is constructed by hand, one stroke or one radical at a time.
Another approach has been simply to copy electronically the type tray and movable arm technique of mechanical typewriters. In this arrangement, a character table is displayed on a tablet surface, the operator hunts for the character which is required, and then touches that character location on the tablet with an electronic pen to produce the character code. This code is then fed to a computer and results in the printing of the selected character. However, this is a "hunt-and-peck" process which does not facilitate speedy typing.
A recent approach to the problem of typing Chinese ideographs is discussed in U.S. Pat. No. 4,096,934 to Kirsmer et al., in which a computer is employed to store a catalog of Chinese characters. The characters are retrieved by means of a completely phonetic indexing system in which an ideograph is identified by spelling the pronunciation and/or by using the phonetic symbols themselves to describe the geometry of the character or parts of the character or to describe meanings of the character. All the standard Chinese characters are described phonetically, and this information is stored in the computer. However, a single phonetic word does not uniquely describe a single Chinese character, so a second sequence of phonetic symbols is provided to describe the shape or some descriptive characteristic of each character. To recover a specific character, then, two sequences of phonetic symbols are required. If that still does not identify the desired character, then additional sequences of phonetic symbols representing either the appearance of or the pronunciation of brush strokes or radicals must be encoded. This process, which requires plural encoding steps to recover a single character, is extremely complex and time consuming, and thus does not meet the need for a simple, accurate and rapid typing method.
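The plural encoding steps described above can be pictured, in a much simplified form, as successive narrowing of a candidate list. The sketch below is not taken from the cited patent: the catalog entries, the descriptor labels `d1` through `d3` (invented stand-ins for a phonetically spelled shape description), and the function name `retrieve` are all assumptions made for illustration.

```python
# Hypothetical catalog: first key is a toneless spelling of the sound,
# second key is an invented label standing in for a shape description.
CATALOG = [
    {"char": "妈", "keys": ["ma", "d1"]},
    {"char": "马", "keys": ["ma", "d2"]},
    {"char": "骂", "keys": ["ma", "d3"]},
    {"char": "中", "keys": ["zhong", "d3"]},
]


def retrieve(catalog, sequences):
    """Narrow the candidates one keyed sequence at a time.

    Returns the remaining characters and the number of sequences consumed.
    """
    remaining = list(catalog)
    used = 0
    for sequence in sequences:
        remaining = [
            entry for entry in remaining
            if len(entry["keys"]) > used and entry["keys"][used] == sequence
        ]
        used += 1
        if len(remaining) <= 1:
            break
    return [entry["char"] for entry in remaining], used


# The sound alone leaves three candidates; a second sequence is needed.
print(retrieve(CATALOG, ["ma"]))
print(retrieve(CATALOG, ["ma", "d1"]))
```

Each additional sequence is a further typing step for the operator, which is the burden the passage above identifies.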
Still another approach has been to utilize the existing mechanical typewriter, while adding the capability for producing a paper tape having optical markings that correspond to the mechanically selected type characters. The resulting tape can then be scanned electronically to produce a code which may then be fed to a computer for electronic generation of character displays or for operation of a high-speed printing device. Although this system allows faster reproduction of the typed material, the process of selecting the characters to be typed remains the same; namely, slow and tedious.
In an effort to reduce the time required to identify to a character generator the particular ideogram to be reproduced, so-called "four-corner" coding schemes have been developed which attempt to classify Chinese characters by the particular shapes which appear at each of the four corners of the character. These four shapes can then be used to identify and retrieve characters from a computer memory. This approach is similar to the above-described procedure of constructing desired characters through the selection of character particles, and to a more recent approach which uses a three-element character construction scheme using a one-hundred radical keyboard. Such systems of identifying Chinese characters by selecting only portions of the character have a serious and common fault: even with very sophisticated coding systems, the use of only selected portions of a character for identification purposes does not uniquely identify a single Chinese character every time. This is because there are many characters which have the same general stroke or radical configurations on their periphery, but have different shapes at the center position, so that the use of the so-called "four corner" or "three corner" codes has always resulted in ambiguities which have prevented effective use of such systems.
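The ambiguity inherent in corner-only coding can be shown with a short sketch. The character names, shape labels, and functions below are hypothetical placeholders and do not reproduce any actual four-corner code table; the sketch only demonstrates that two characters which differ solely at the center receive the same key.

```python
from collections import defaultdict

# Hypothetical characters: (top-left, top-right, bottom-left, bottom-right)
# corner shapes, followed by the shape at the center.
CHARACTERS = {
    "char_A": (("dot", "bar", "hook", "cross"), "box"),
    "char_B": (("dot", "bar", "hook", "cross"), "slash"),
    "char_C": (("bar", "bar", "dot", "hook"), "box"),
}


def build_corner_index(characters):
    """Index characters by their four corner shapes, ignoring the center."""
    index = defaultdict(list)
    for name, (corners, _center) in characters.items():
        index[corners].append(name)
    return index


def ambiguous_codes(index):
    """Return the corner codes that select more than one character."""
    return {code: names for code, names in index.items() if len(names) > 1}


CORNER_INDEX = build_corner_index(CHARACTERS)

# char_A and char_B collide because the code never looks at the center.
print(ambiguous_codes(CORNER_INDEX))
```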