With the great development of the information and communication industries in recent decades, there has arisen a growing need for fast, accurate and versatile mechanical processing of written material in all the major languages. The greatest obstacle to processing Chinese texts with the same speed and accuracy as has been achieved with many alphabetic texts is the well-known number and complexity of the Chinese characters. Even leaving aside characters of merely historical use, modern Chinese still uses about 8,000-10,000 characters, with 3,000-4,000 in everyday use.
The decoding of these characters, that is, the conversion (back) of already generated electronic signals, structured by a given system of character codes, into the original visual forms of the characters (on CRT screens or by various types of printers at computer output terminals), can be done completely mechanically and has therefore been the lesser problem. At present there are already several quite satisfactory methods and items of equipment available for this purpose.
The encoding of such characters, however, has to be done by human (intelligent) operators, and the finding or devising of fast and accurate ways of encoding Chinese characters has been a persisting problem, as the great number of various attempts in the field, proposed year by year even up to now, clearly shows. The present invention also addresses this problem, and its overall object is to devise an encoding method (and a keyboard or other equipment suitable for the method) that is more congenial to the peculiar features of the Chinese language and therefore easier and faster than those in the prior art.
In almost all the encoding systems used or proposed so far, the data that are entered on a keyboard are graphic or visual, usually graphic component parts of the characters, by which a classification or indexing of the characters is attempted. The great variety of these graphic components and of their locations within the whole characters, however, makes it difficult to achieve a complete and yet not too complex classification. A few of the existing or proposed systems also use the phonetic data (the pronunciation) of the characters. In most such cases the phonetic data are used together with the graphic data, in particular when the encoding takes place in two steps, with the phonetic data used in either the first or the second step.
No encoding system, it seems, has been proposed so far which would use merely the phonetic data. The reason for this is that, while the number of written characters is great, the phonetic "repertoire" (more exactly, the number of speech morphemes) of any given Chinese dialect is rather limited, and therefore homophone characters, different characters with exactly the same pronunciation, are very numerous in Chinese.
Equally important is, however, the fact that especially in the last 30-40 years much has been done by educational and cultural institutions to standardize the pronunciation of Chinese, in particular that of the official language, Mandarin Chinese. Almost all literate Chinese educated in the last 30-40 years know the "official pronunciation" of Mandarin Chinese and are moreover familiar with one or another (Chinese-style or alphabetized) system of phonetic notation or symbols by which the pronunciation is written down. The availability of standard phonetic notations, and the realization that there are speech patterns, explained below, by which Chinese characters can uniquely be defined, are the grounds for the further object of the invention to devise an encoding system based on phonetic data alone.
In the encoding systems proposed so far, whatever kind of data or keyboard they use, Chinese characters are identified and encoded singly, as discrete units, one by one. These methods try to find certain characteristics "inside" each character by which it can be distinguished from every other character. It is true that most Chinese characters were in their origin ideographs, self-contained graphic representations of things or ideas, and there are grounds for treating and identifying them singly, in themselves. But in modern Chinese, especially in a running text, written sentence by sentence, the characters appear in groups or blocks, largely following the grammatical patterns of the spoken language. What we see in a modern Chinese text is not simply a row (or column) of individual characters but more often groups of characters, easily identifiable two-, three- or four-character blocks, following one another. A further object of the invention is to utilize this feature of modern Chinese texts and make it a rule to encode Chinese characters, at least most of them, in principle as blocks of characters, not one by one. The problem, then, of resolving the ambiguity of homophone characters shifts to that of homophone character blocks, which are very few in Chinese. And the still remaining ambiguities among homophone character blocks can be resolved and the desired characters identified by a method imitating a certain speech pattern in Chinese.
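The effect of encoding by blocks rather than by single characters can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not part of the disclosure: the toy lexicon, the pinyin-with-tone-number spelling of the pronunciations, and the names CHARACTERS, BLOCKS, candidates_single and candidates_block. A real lexicon would be far larger, and some blocks in it would still have homophones.

```python
from collections import defaultdict

# Toy lexicon (hypothetical): each entry maps written text to its pronunciation.
CHARACTERS = {
    "图": "tu2", "途": "tu2", "徒": "tu2",
    "书": "shu1", "输": "shu1", "叔": "shu1",
    "馆": "guan3", "管": "guan3",
}
BLOCKS = {
    "图书馆": ("tu2", "shu1", "guan3"),
    "电脑": ("dian4", "nao3"),
}


def index_by_sound(entries):
    """Invert a text -> sound mapping into sound -> list of texts."""
    index = defaultdict(list)
    for text, sound in entries.items():
        index[sound].append(text)
    return index


CHAR_INDEX = index_by_sound(CHARACTERS)
BLOCK_INDEX = index_by_sound(BLOCKS)


def candidates_single(syllable):
    """All characters in the toy lexicon pronounced as the given syllable."""
    return list(CHAR_INDEX.get(syllable, []))


def candidates_block(syllables):
    """All character blocks in the toy lexicon with the given syllable sequence."""
    return list(BLOCK_INDEX.get(tuple(syllables), []))


if __name__ == "__main__":
    # One syllable alone is ambiguous among several homophone characters.
    print(candidates_single("tu2"))
    # The three-syllable block narrows the choice to a single entry here.
    print(candidates_block(["tu2", "shu1", "guan3"]))
```

In this toy lexicon each syllable taken alone matches two or three characters, while the three-syllable sequence matches exactly one block.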
Careful speech in Chinese is most of the time unambiguous because in speech there is even more frequent use of longer blocks or strings of speech morphemes, paralleling the blocks of characters mentioned above, the meanings of which in most cases are unambiguous. Moreover, there are several speech patterns widely used by literate speakers of Chinese whenever the necessity arises to identify to the listener a certain Chinese character or characters. In the most common pattern, the speaker pronounces a longer string of speech morphemes (an expression of several syllables) which uniquely defines a block of Chinese characters of the same length, and then he indicates that the character in question is the first, the second, the third, etc. among the characters in the longer expression just pronounced. A further object of the invention is to make use of this pattern as well, formalize it, extend its scope, and make it also one of the encoding rules whenever character blocks that have homophones are to be encoded.
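The speech pattern just described, naming an unambiguous longer expression and then pointing to a position within it, might be sketched as follows. This is an illustration under stated assumptions, not the encoding rule of the invention itself: the lexicon BLOCKS_BY_SOUND, the pinyin-with-tone-number spelling, and the function name pick_character are all hypothetical.

```python
# Toy lexicon (hypothetical): syllable sequence -> blocks pronounced that way.
BLOCKS_BY_SOUND = {
    ("tu2", "shu1", "guan3"): ["图书馆"],
    ("dian4", "nao3"): ["电脑"],
    # A homophone pair: this sequence alone does not define a unique block.
    ("gong1", "shi4"): ["公式", "公事"],
}


def pick_character(syllables, position):
    """Return the character at a 1-based position of a uniquely defined block.

    Raises LookupError if the syllable sequence matches no block or more
    than one block, since the pattern relies on an unambiguous expression.
    """
    matches = BLOCKS_BY_SOUND.get(tuple(syllables), [])
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} blocks match {tuple(syllables)}")
    block = matches[0]
    if not 1 <= position <= len(block):
        raise IndexError(f"position {position} is outside a {len(block)}-character block")
    return block[position - 1]


if __name__ == "__main__":
    # "The third character of tu2 shu1 guan3" identifies one character.
    print(pick_character(["tu2", "shu1", "guan3"], 3))
```

The sketch refuses a sequence that matches more than one block, because the pattern only works when the expression pronounced is itself unambiguous.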
A still further object of the invention is to provide, within its scope and the method described, an alternative embodiment in which an acoustic speech sound analyzer (a preliminary speech sound encoder) substitutes for the functions of the phonetic data keys, or even for those of other keys, of the keyboard. This speech sound analyzer is programmed to produce, after recognizing the individual speech sounds or whole speech morphemes, specific electronic signal strings that have the same distinctiveness and therefore the same identifying force as those which the actuation of the respective keys on the keyboard would produce. The limited speech morpheme range of the Chinese dialects actually makes their speech sound analysis much easier than that in the case of most other languages.
As conceived in the present invention, the encoding is done essentially at the keyboard by a variety of specific sequences of keystrokes, on speech sound keys and other special function keys, defined by the encoding rules (or by the articulate reading-in of the pronunciation of Chinese characters and actuation of some special function keys, all in specific sequences, if the acoustic speech sound analyzer embodiment is used). The sequentially coded signals generated by the keyboard (or those generated by the speech sound analyzer) are in every case specific enough to uniquely identify the character or characters to be encoded. The method and equipment disclosed in this invention have been designed to achieve this end; this end also defines the proper scope of the invention.
It is understood, however, that the keyboard (or the speech sound analyzer) is only a part of the whole data entry apparatus. That is, the invention relates only to a part or a stage of the whole data entry process.
In almost every practical application of the invention, the coded keyboard signals will have to be subsequently converted or translated, one by one, into another set of signals coded in the "character codes" (specific signals for each different Chinese character) used in that particular word processing, communication, display, printing, etc. equipment into which the data entry is made. This code conversion can be done, for example, by the conventional operations of an electronic data processing system (computer) appropriately programmed and provided with a memory section with a sufficiently large number of memory locations which are identifiable by the keyboard signals and in which the pertinent (finally required) "character code" data are stored. When the computer receives from the encoding keyboard one of the many previously determined possible coded signals, this signal serves as a memory address identifier, and then the computer's control section finds the address thus identified, retrieves the information previously stored there--which is one or several "character codes" (in the present method it is possible to encode more than one character by just one keystroke sequence)--and feeds this character code information into the equipment into which the data entry is made.
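The code conversion described above amounts to a table lookup, which might be sketched as follows. The sketch rests on assumptions that are not part of the disclosure: the keystroke sequences and the key name "END" are hypothetical, a Python dictionary stands in for the addressed memory section, and Unicode code points stand in for whatever "character codes" the receiving equipment actually uses.

```python
# Hypothetical conversion table: keystroke sequence -> list of character codes.
# The dictionary key plays the role of the memory address identifier, and the
# stored list plays the role of the information retrieved from that address.
CODE_TABLE = {
    ("dian4", "nao3", "END"): [ord(c) for c in "电脑"],
    ("tu2", "shu1", "guan3", "END"): [ord(c) for c in "图书馆"],
}


def convert(keyboard_signal):
    """Translate one completed keystroke sequence into character codes.

    Raises KeyError if the sequence is not one of the predetermined signals.
    """
    return list(CODE_TABLE[tuple(keyboard_signal)])


def render(character_codes):
    """Turn character codes back into visible text (the decoding side)."""
    return "".join(chr(code) for code in character_codes)


if __name__ == "__main__":
    codes = convert(["dian4", "nao3", "END"])
    # A single keystroke sequence yields two character codes.
    print(codes, render(codes))
```

Because the table stores a list at each address, one keystroke sequence can yield several character codes at once, as the text notes.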
It is essential that the keyboard signals be specific enough that they can be unambiguously converted into another set of signals structured by one or another kind of "character codes." Whether or not these keyboard signals are converted, and by what kind of computer operations such a code conversion is accomplished, lies outside the scope of the invention, as do the mechanical details of the keyboard and of the above-mentioned speech sound analyzer.
It is, finally, also an object of the present invention to accomplish the encoding in one step. According to the method and equipment described here, there remains no ambiguity (requiring further steps to resolve it) as to which character or characters are encoded once one sequence of keyboard actuations has been completed. Ideally, therefore, the encoding is very fast. At the same time, it must be noted that in using this method and equipment a rather high level of literacy in Chinese, familiarity with character combinations in the language, and knowledge of the exact pronunciation of the characters are required on the part of the operators. Even well qualified operators may occasionally have to consult a dictionary or some character index or manual, as a previous step, before entering the data on the keyboard, but such occasions will be rare. To avoid mistakes in the encoding, the invention allows (and would even recommend) the use of a control monitor CRT screen which immediately displays to the operator the just encoded characters, but this is not an essential part of the equipment disclosed here.
In an overall assessment, the present invention requires a rather high level of literacy in Chinese on the part of the operator, but on the other hand it incorporates several important features of the Chinese language not utilized in the prior art, and so it offers a new text-encoding tool that is faster, easier and more congenial to the language than other methods. Text data entry with this method and equipment is, in a way, like one literate speaker of Chinese "talking" to another. Also, this method and equipment can be a very convenient tool when one is encoding a text not yet written but to be freely composed at the keyboard. One can encode characters with this method even if he does not know (or is not sure of) their exact graphical composition, provided he knows their pronunciation.
Below is a detailed description of the encoding method and equipment. Some points of the description are clarified by drawings accompanying the text.