The present invention relates to Chinese and Kanji character recognition and, more particularly, to an on-line tablet recognition system with an extendable vocabulary of recognizable Chinese/Kanji words.
The preparation of text (i.e., correspondence, business documents, etc.) usually involves the composition of a sequence of thoughts by one person and the transcription of handwritten or spoken expressions of these thoughts into a printed version by another person. Several iterations are frequently required before the final copy is satisfactory. Each iteration requires substantial intervals of active time and waiting time for both individuals. During these waiting intervals, each person usually turns to other matters. Some time must then be spent in reorientation when the most recent draft is received from the other person. This procedure is both inefficient and subject to errors.
There is presently no commercially available convenient method for entering a large vocabulary of Chinese/Kanji words into text processing and data processing equipment.
In the process of recognizing handwritten Chinese characters, there is variability of writing style for a given individual as well as differences in writing style from individual to individual. In one known approach for on-line character recognition using a tablet or a light pen input device, character recognition is achieved using two types of information, these being the stroke distribution count and the identification information for the first and last strokes drawn. This approach was developed by J. Liu and is described in "Real Time Chinese Handwriting Recognition Machine", M.I.T. Cambridge, E.E. Thesis (1966).
In Liu's scheme, the principal means of discriminating among characters is the stroke distribution count, that is, the number of strokes of each type occurring in the character. Liu distinguishes 19 types of strokes. With the stroke distribution count as the only discriminating criterion for character recognition, there are cases where several characters, in groups of at most 5 or 6, have exactly the same distribution count. Liu discovered that complete discrimination could be achieved by adding, as additional information, the identities of the first and last strokes drawn for the character.
Each character, then, is assigned a unique identifier consisting of its stroke distribution count and the identity of its first and last strokes. For recognition, successive strokes of a given character are drawn and recognized one by one, and the result is compared to the identifiers of all known characters. An exact match is required for recognition.
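The identifier scheme described above can be illustrated with a minimal sketch. It is not taken from Liu's thesis: the stroke-type labels ("h", "v"), the character names, and the function names are all hypothetical, and only the general idea (distribution count plus first and last stroke, matched exactly) comes from the description.

```python
from collections import Counter

def make_identifier(strokes):
    """Build an identifier from a sequence of recognized stroke types.

    The identifier is (stroke distribution count, first stroke, last stroke).
    The count is stored as a frozenset of (type, count) pairs so that the
    identifier is hashable and independent of drawing order.
    """
    counts = Counter(strokes)
    return (frozenset(counts.items()), strokes[0], strokes[-1])

# Hypothetical vocabulary; labels and names are invented for illustration.
VOCABULARY = {
    make_identifier(["h", "v"]): "char_A",
    make_identifier(["v", "h"]): "char_B",   # same counts as char_A
    make_identifier(["h", "h", "v"]): "char_C",
}

def recognize_character(strokes):
    """Return the character whose identifier matches exactly, else None."""
    if not strokes:
        return None
    return VOCABULARY.get(make_identifier(strokes))
```

In this sketch, char_A and char_B share a distribution count and are separated only by their first and last strokes, while any deviation from a stored identifier yields no recognition at all.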
Liu defined each of his strokes as a sequence of local extrema (relative maxima or relative minima) along two orthogonal axes, that is, the points along the curve where the derivative (either dY/dX or dX/dY) goes to zero. Each extremum is further characterized as smooth or pointed. Finally, certain extrema are "don't care" events: they may or may not occur in the sequence.
To achieve stroke recognition, the extrema of a stroke are determined as it is drawn. The resulting sequence is compared with the defined sequences of all 19 strokes, ignoring the presence or absence of don't care extrema. An exact match is required for recognition.
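The matching step described above can be sketched as follows. This is an illustrative assumption about how such a comparison might be coded, not Liu's implementation: the extremum labels, the template format (pairs of label and don't-care flag), and the stroke names are hypothetical.

```python
def matches(observed, template):
    """Check an observed extrema sequence against one stroke template.

    template is a list of (extremum, dont_care) pairs. Extrema flagged as
    don't care may be present or absent; everything else must match exactly
    and in order.
    """
    def go(i, j):
        if j == len(template):
            return i == len(observed)          # nothing may be left over
        token, dont_care = template[j]
        if i < len(observed) and observed[i] == token and go(i + 1, j + 1):
            return True
        return dont_care and go(i, j + 1)      # skip an absent don't-care
    return go(0, 0)

def recognize_stroke(observed, templates):
    """Return the name of the first template that matches, else None."""
    for name, template in templates.items():
        if matches(observed, template):
            return name
    return None

# Hypothetical templates; a real system would define all 19 strokes.
TEMPLATES = {
    "stroke_1": [("ymax_pointed", False), ("xmin_smooth", True),
                 ("ymin_pointed", False)],
    "stroke_2": [("xmin_pointed", False), ("xmax_pointed", False)],
}
```

Here stroke_1 is recognized whether or not the smooth X minimum appears, but any unexpected extremum causes the match to fail, which reflects the exact-match requirement.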
Liu's technique appears to be disadvantageous since it requires an exact match both for stroke recognition and for character recognition. Also, adding symbols to extend the vocabulary of recognizable words is difficult.
Thus, the known Chinese and Kanji character recognition schemes generally treat the strokes of a character as distinct units that are counted and classified in accordance with their sequence of occurrence. In one or more systems, a tree logic is employed which classifies an incoming stroke into one of a predetermined number of basic strokes and then analyzes the combination of strokes to identify a predetermined symbol. The successful recognition of the symbol requires (1) proper construction of each stroke of the symbol to satisfy its stroke classification criterion, (2) proper ordering of the strokes to obtain the needed additional discrimination, and (3) the presence of that symbol in the set of symbols for which recognition criteria have been defined.
When methods like these are implemented for a specific set of common Chinese symbols, a unique recognition criterion is defined for each symbol. Practical Chinese and Kanji vocabularies require more than two thousand symbols. When the names of people are added, the symbol set exceeds several thousand. If a new symbol is to be added to the existing set, it is necessary to review the previously defined recognition criteria for all the symbols in the existing set to ascertain that the new symbol will not inadvertently satisfy a criterion that has been defined for a prior symbol, and that the proposed criterion for the new symbol cannot be satisfied by one of the original symbols. As the symbol set is enlarged to the size of a practical vocabulary, this analysis becomes very difficult. Many of the detailed criteria for stroke classification and allowable stroke order might require adjustment when a new symbol is added, and as the number of symbol types is increased the acceptable variations for each symbol must be narrowed.
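The review burden described above can be made concrete with a small sketch. It assumes, purely for illustration, that each recognition criterion is a predicate over a stroke sequence and that each symbol has one representative sample; the function name, symbol names, and criteria are hypothetical.

```python
def conflicts_on_add(existing, new_criterion, new_sample):
    """List existing symbols that conflict with a proposed new symbol.

    existing maps a symbol name to (criterion, sample). A conflict arises if
    the new symbol's sample satisfies an existing criterion, or if an
    existing symbol's sample satisfies the proposed new criterion.
    """
    conflicts = []
    for name, (criterion, sample) in existing.items():
        if criterion(new_sample) or new_criterion(sample):
            conflicts.append(name)
    return conflicts

# Hypothetical criteria over sequences of stroke-type labels.
EXISTING = {
    "sym_A": (lambda s: len(s) == 2, ["h", "v"]),
    "sym_B": (lambda s: len(s) >= 3 and s[0] == "h", ["h", "h", "v"]),
}

# Proposed new symbol: three strokes ending in "v".
NEW_CRITERION = lambda s: len(s) == 3 and s[-1] == "v"
NEW_SAMPLE = ["d", "h", "v"]

CONFLICTS = conflicts_on_add(EXISTING, NEW_CRITERION, NEW_SAMPLE)
```

Every addition requires a pass over the whole existing set, and each conflict found (here sym_B, whose sample also satisfies the new criterion) forces one or both criteria to be tightened, which is why the analysis grows harder as the vocabulary grows.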
It is an object of the present invention to provide a convenient method for entering a large vocabulary of Chinese/Kanji words into text processing and data processing equipment. It is another object to provide an on-line tablet recognition system with an extendable vocabulary of recognizable words.
It is another object to provide a method for the direct composition of printed text material, in a "natural manner", by the author without the intervention of another person. The "natural manner" in question is the on-line recognition of script. Script recognition is favored over other means, such as voice or block letter recognition, because script is by far the most common method used in business for the original composition of text material.