This invention relates to a method for encoding Chinese language characters.
One of the major problems encountered over the years respecting the handling (i.e., printing, data-transmitting, etc.) of the Chinese language has been the lack of a convenient, simple and easy-to-use input interface, such as a keyboard, for an operator. The problem results primarily from the fact that the Chinese language includes many thousands of different characters which cannot, in any reasonably practical way, be accommodated uniquely (i.e., on a one-to-one basis) on a keyboard, or other like unit. This language-handling difficulty, while present for scores of years, is felt far more acutely now in an age of high-speed data transmission utilizing digital computers. In modern-day practice, it is important that some simple and speedy method of inputting unique Chinese character information be found if users of Chinese are to have practical access to state-of-the-art data-handling and transmission systems.
In recent years, several encodation approaches have been proposed to resolve the dilemma outlined above, and the method embodied in the present invention is one which is designed to offer major improvements over such earlier-tried encoding formats.
Expressed in the simplest terms, a main thrust of the present invention is to provide a unique method for encoding the different characters in the Chinese language which can encompass, without ambiguity, the maximum number of such characters, with the minimum number of encoding steps.
An important object related to this thrust is to provide such a method which is usable in connection with a structurally conventional keyboard unit, such as a standard so-called ASCII keyboard, to generate, for each encoded character, an electronic data stream usable with a data-handling device such as a digital computer.
The particular method proposed by the invention is capable of using such a keyboard and, in fact, requires the use of significantly fewer than all of the standard keys made available therein.
According to the invention, three different code "tags" are assigned to each character: one being a phonetic tag formed with no less than one and no more than three of the recognized Chinese phonetic symbols, another being a tone tag which takes the form of but a single one of the five different recognized Chinese tone symbols, and the third being a two-digit number tag, the two digits of which are associated, respectively, with two different ones of the four recognized corner configurations in the selected character. These three tags are then assembled in a preselected order, and according to a preferred method of practicing the invention, are utilized to generate a combined electronic data stream of the type feedable to a device such as a digital computer.
While there are several different orders in which these three different kinds of tags can be assembled, the one which seems to be the most preferable, from the standpoint of user convenience, is as follows: phonetic tag, tone tag, then number tag. Another assembly order which appears to be nearly as convenient is one which begins with a number tag, followed by a phonetic tag, and finally a tone tag.
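By way of illustration only, the assembly of the three tags in either of the two orders just mentioned can be sketched as follows. The sketch is not part of the disclosed method; the names `Tags`, `TONES`, `validate` and `assemble`, and the placeholder symbols `p1`, `p2` and `T1` through `T5`, are hypothetical and merely stand in for the recognized phonetic and tone symbols.

```python
from typing import List, NamedTuple, Sequence

# Hypothetical placeholder names for the five recognized tone symbols.
TONES = ("T1", "T2", "T3", "T4", "T5")


class Tags(NamedTuple):
    phonetic: Sequence[str]  # one to three phonetic symbols
    tone: str                # exactly one tone symbol
    number: str              # two-digit number tag, e.g. "47"


def validate(tags: Tags) -> None:
    """Check the size constraints stated for each kind of tag."""
    if not 1 <= len(tags.phonetic) <= 3:
        raise ValueError("phonetic tag needs one to three symbols")
    if tags.tone not in TONES:
        raise ValueError("tone tag must be one of the five tone symbols")
    if len(tags.number) != 2 or any(c not in "0123456789" for c in tags.number):
        raise ValueError("number tag must be exactly two digits 0-9")


def assemble(tags: Tags,
             order: Sequence[str] = ("phonetic", "tone", "number")) -> List[str]:
    """Return the symbols of the three tags in the preselected order."""
    validate(tags)
    if sorted(order) != ["number", "phonetic", "tone"]:
        raise ValueError("order must name each tag exactly once")
    parts = {
        "phonetic": list(tags.phonetic),
        "tone": [tags.tone],
        "number": list(tags.number),
    }
    assembled: List[str] = []
    for name in order:
        assembled.extend(parts[name])
    return assembled


example = Tags(phonetic=["p1", "p2"], tone="T2", number="47")
preferred = assemble(example)
alternate = assemble(example, order=("number", "phonetic", "tone"))
```

Here `preferred` holds the symbols in the order phonetic tag, tone tag, number tag, and `alternate` holds them in the order number tag, phonetic tag, tone tag.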
Addressing somewhat more specifically the three types of tags referred to above, according to the instant invention, and regardless of the "assembly" format which is chosen, a phonetic tag includes anywhere from one to three conventional Chinese phonetic symbols. In the embodiment of the invention specifically described herein, each such symbol is assigned to a different key in a conventional keyboard, and entry of the symbol is effected by a single keystroke. Regarding a tone tag, of the five recognized Chinese tone symbols, only four are assigned, each independently, to a different one of four dedicated keys in the keyboard. The fifth tone symbol is considered to be a "default" symbol, and is entered automatically (as will be explained) in the absence of a specific keystroke entry for a tone symbol. Thus, a tone tag is entered through the performance of at most one keystroke. The two-digit number tags include different combinations of the digits 0-9, inclusive. Each digit is assigned to an independent key in the keyboard discussed herein, and each is entered through the use of a single keystroke. Thus, the totality of a number tag is entered, in all instances, through the performance of two keystrokes. How, in particular, a number tag is assigned to a character is explained below.
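The keystroke accounting just described can be illustrated with the following sketch, which assumes the phonetic-tone-number order. It is offered under stated assumptions only: the key names are hypothetical, the choice of `T1` as the default tone is an assumption (which tone is the default is not stated here), and the rule used in `parse` for recognizing an omitted tone keystroke (a digit immediately following the phonetic symbols) is merely one plausible approach, not necessarily the one contemplated by the invention.

```python
from typing import List, Sequence, Tuple

# Hypothetical key names: four tones have dedicated keys, one is the default.
TONE_KEYS = ("T2", "T3", "T4", "T5")
DEFAULT_TONE = "T1"  # assumption: the source does not say which tone is default
DIGITS = set("0123456789")


def keystrokes(phonetic: Sequence[str], tone: str, number: str) -> List[str]:
    """Keys pressed for one character; the default tone costs no keystroke."""
    keys = list(phonetic)
    if tone != DEFAULT_TONE:
        keys.append(tone)
    keys.extend(number)  # one keystroke per digit
    return keys


def parse(keys: Sequence[str]) -> Tuple[List[str], str, str]:
    """Recover (phonetic, tone, number), supplying the default tone if omitted."""
    phonetic: List[str] = []
    i = 0
    # Phonetic symbols run until a tone key or a digit key is seen.
    while i < len(keys) and keys[i] not in TONE_KEYS and keys[i] not in DIGITS:
        phonetic.append(keys[i])
        i += 1
    if not 1 <= len(phonetic) <= 3:
        raise ValueError("expected one to three phonetic symbols")
    if i < len(keys) and keys[i] in TONE_KEYS:
        tone = keys[i]
        i += 1
    else:
        tone = DEFAULT_TONE  # no tone keystroke was entered
    digits = list(keys[i:])
    if len(digits) != 2 or any(d not in DIGITS for d in digits):
        raise ValueError("expected exactly two digit keystrokes")
    return phonetic, tone, "".join(digits)


shortest = keystrokes(["p1"], DEFAULT_TONE, "05")
longest = keystrokes(["p1", "p2", "p3"], "T3", "47")
```

Under these assumptions `shortest` contains three keystrokes (one phonetic symbol, no tone key, two digits) and `longest` contains six (three phonetic symbols, one tone key, two digits).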
As a consequence of the code-tagging scheme outlined above, substantially all of the many thousands of characters in the Chinese language can be entered (via a keyboard) by an operator with no more than six, and often as few as three, keystrokes. An exception exists for certain characters which are referred to as "collision" characters. More specifically, a collision situation arises whenever a particular assembled code (including phonetic tag, tone tag, and number tag) is capable of identifying more than one language character. When this condition occurs, and with the use of conventional computer programming techniques, a system utilizing the method of the invention switches immediately into what might be referred to as a "menu" mode of operation. In this mode, the system is prepared to present an operator (as, for example, via the screen in a cathode ray tube) with a "character menu" showing, in descending order of most common use, the several "collision" characters which "respond" to the same input code. Operators especially skilled with the system of the invention will quickly memorize, in these collision cases, the "menu" order of possible characters, and, immediately after entering the initial code, will enter another single digit which indicates the position of the desired character in the menu. In a case where an operator does not have a menu memorized, the operator applies a keystroke to the usual space key in the keyboard, whereupon the available menu is presented visually, and thereafter the operator, using a single keystroke, enters an appropriate single designating digit.
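The handling of a collision can be sketched, purely by way of example, as a table lookup followed by a menu selection. Everything in the sketch is hypothetical: the table contents, the placeholder names `char_A` through `char_D`, the function names, and in particular the assumption that menu positions are counted from 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Code = Tuple[str, ...]

# Hypothetical table: each assembled code maps to the characters it identifies,
# listed in descending order of most common use.
TABLE: Dict[Code, List[str]] = {
    ("p1", "T2", "4", "7"): ["char_A"],
    ("p2", "p3", "1", "0"): ["char_B", "char_C", "char_D"],  # a collision
}


def menu(code: Sequence[str]) -> List[str]:
    """Numbered menu lines for display when the space key is pressed."""
    found = TABLE.get(tuple(code), [])
    return [f"{pos} {char}" for pos, char in enumerate(found, start=1)]


def resolve(code: Sequence[str], choice: Optional[int] = None) -> str:
    """Return the character for a code, using the menu digit on a collision."""
    found = TABLE.get(tuple(code), [])
    if not found:
        raise KeyError("no character responds to this code")
    if len(found) == 1:
        return found[0]  # unambiguous: no menu digit needed
    if choice is None:
        raise LookupError("collision: a menu position digit is required")
    if not 1 <= choice <= len(found):
        raise ValueError("menu position out of range")
    return found[choice - 1]  # assumption: positions are 1-based


unique = resolve(["p1", "T2", "4", "7"])
second = resolve(["p2", "p3", "1", "0"], choice=2)
shown = menu(["p2", "p3", "1", "0"])
```

A skilled operator who has memorized the menu corresponds to calling `resolve` with `choice` supplied at once; an operator who has not would first have `menu` displayed and then supply the digit.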
For the sake of completeness, another "special" case should be mentioned. This case is neither one which creates a possible situation of ambiguity, nor one which requires special programming or processing in a computer-based system utilizing the invention. The case referred to is one in which a given character can be referenced through a plurality of codes differing in phonetic and/or tone elements. Here, ambiguity is avoided directly by the operator, who will know, at the time of encoding the character, precisely which phonetic and tone symbols are required to give the selected character the desired meaning.
Various other features, objects and advantages which are attained by the invention will become more fully apparent as the description which now follows is read in conjunction with the accompanying drawings.