1. Reference to Related Applications
Filed concurrently with this application are applications entitled "Complex Pattern Recognition Method and System", Ser. No. 459,282, and "Method for Distinguishing Between Complex Character Sets", Ser. No. 459,283, now U.S. Pat. No. 4,531,231.
2. Field of Invention
This invention relates to pattern recognition, for example, to recognition of handwritten characters such as Chinese characters (i.e., Kanji). Specifically, this application relates to identification of complex characters composed of elements, namely strokes, wherein strokes of distinguishable significance are subject to confusion.
The recognition of complex characters has been pursued with limited success for many years. Kanji has been considered the greatest challenge because it is not easily adapted to keyboard input. There are, for example, approximately 10,000 distinguishable characters in use in the Kanji system, representing various words, phrases, concepts and, in some instances, syllables.
Various recognition schemes have been reported for hand-registered characters. The schemes are typically based on spatial and certain limited shape characteristics of elements such as strokes. A stroke is the locus and sequence of a continuous chain of related points created by substantially uninterrupted contact between a pattern-forming means and a pattern-accommodating means, such as a pen and a tablet or any other movement-registering instrument or system. Prior art schemes aim to collect and retain a substantial amount of information, which is processed in an attempt to distinguish the character from all other characters.
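The stroke definition above can be made concrete with a minimal sketch. It assumes, hypothetically, that a tablet reports a stream of samples of the form (x, y, pen_down). A stroke is then taken to be each maximal run of consecutive pen-down samples, kept in the order registered so that both the locus and the sequence of points are preserved. The names `Sample` and `split_strokes` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical tablet sample: (x, y, pen_down)
Sample = Tuple[float, float, bool]
Point = Tuple[float, float]


def split_strokes(samples: List[Sample]) -> List[List[Point]]:
    """Group consecutive pen-down samples into strokes.

    Each stroke keeps its points in registration order, so it records
    both the locus and the sequence of the trace.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for x, y, down in samples:
        if down:
            current.append((x, y))
        elif current:
            # Pen lifted: the uninterrupted contact has ended.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes
```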
Recognition of complex characters of the type of interest is made more difficult because there are no uniform definitions for the fundamental stroke types from which the characters are formed, and because there is substantial variation in character formation, even by the same writer. Consequently, there is potential for confusion both between differing strokes and between different characters. What is needed is a pattern recognition scheme that tolerates wide variations while accurately identifying patterns, and specifically characters, from groups of basic elements such as strokes.
3. Description of the Prior Art
Prior stroke recognition systems relevant to the present invention are represented by the following references:
"On-Line Recognition of Handwritten Characters", Hiroki Arakawa et al., Review of the Electrical Communication Laboratories, Vol. 26, Nos. 11-12, Nov.-Dec. 1978, describes a system in which the movement of a handwritten point is recorded in rectangular coordinates to derive a pair of linear waveforms, the waveforms are approximated by a rectangular function expansion, and a character is then recognized using the resulting set of expansion coefficients.
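The following sketch illustrates the general idea of reducing the coordinate waveforms to a small set of expansion coefficients. It does not reproduce the basis used by Arakawa et al.; as a stand-in it assumes the simplest rectangular (piecewise-constant) expansion, in which each coefficient is the mean of the waveform over one of n equal time intervals, and it assumes nearest-template classification by squared distance. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def box_coefficients(wave: Sequence[float], n: int) -> List[float]:
    """Approximate a sampled waveform by n rectangular (constant) pieces.

    The waveform is split into n contiguous, nearly equal blocks, and each
    coefficient is the mean value of its block.
    """
    m = len(wave)
    if n <= 0 or m < n:
        raise ValueError("need 0 < n <= len(wave)")
    coeffs = []
    for k in range(n):
        lo, hi = k * m // n, (k + 1) * m // n
        block = wave[lo:hi]
        coeffs.append(sum(block) / len(block))
    return coeffs


def feature_vector(xs: Sequence[float], ys: Sequence[float], n: int = 4) -> List[float]:
    # Concatenate the coefficients of the x(t) and y(t) waveforms.
    return box_coefficients(xs, n) + box_coefficients(ys, n)


def classify(xs: Sequence[float], ys: Sequence[float],
             templates: Dict[str, List[float]], n: int = 4) -> str:
    # Assumed decision rule: nearest template by squared Euclidean distance.
    v = feature_vector(xs, ys, n)
    return min(templates,
               key=lambda name: sum((a - b) ** 2 for a, b in zip(v, templates[name])))
```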
IEEE Transactions on Electronic Computers, Dec. 1967, pp. 856-860; Japanese Patent Application No. 1977-083733 entitled "On Line Recognition Method of Handwritten Characters" filed July 12, 1977; and U.S. Pat. No. 4,173,753 to Chou entitled "Input System for Sino-Computer" represent another general type of stroke recognition technique, namely, pattern matching. In Chou, strokes are recognized as elementary patterns in strings of elementary strokes. In the '733 reference, a spatial matching technique is described. Strokes of a character to be recognized are approximated by coordinate position, and deviations from standard coordinate patterns are computed point by point and summed over the whole character to obtain decision criteria. Analysis of these types of schemes supports a conclusion that increasing the amount of information about a stroke does not necessarily lead to improved recognition accuracy. In fact, increasing the precision of stroke registration increases the difficulty of pattern matching. On the other hand, decreasing the precision of stroke registration causes confusion among strokes of similar shape but differing significance. In either extreme, stroke recognition accuracy degrades.
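The point-by-point spatial matching attributed above to the '733 reference can be sketched as follows. This is an illustrative reconstruction, not that reference's procedure: it assumes each stroke of the input and of every stored template has already been resampled to the same number of points, treats a difference in stroke or point count as no match, and uses Euclidean distance as the per-point deviation (`math.dist` requires Python 3.8 or later). The template names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Character = List[List[Point]]  # a list of strokes, each a list of points


def total_deviation(candidate: Character, template: Character) -> float:
    """Sum point-by-point Euclidean deviations over all strokes."""
    if len(candidate) != len(template):
        return math.inf  # stroke counts differ: treat as no match
    total = 0.0
    for s, t in zip(candidate, template):
        if len(s) != len(t):
            return math.inf  # strokes not resampled to equal length
        total += sum(math.dist(p, q) for p, q in zip(s, t))
    return total


def best_match(candidate: Character, templates: Dict[str, Character]) -> str:
    # The decision criterion is the smallest summed deviation.
    return min(templates, key=lambda name: total_deviation(candidate, templates[name]))
```

In this unnormalized form, sampling each stroke more finely adds more terms to the sum, so writer-to-writer noise contributes more to the total; this is one simple illustration of why finer registration does not by itself make matching easier.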
"On-Line Recognition of Hand-Written Characters Utilizing Positional and Stroke Vector Sequences", Pattern Recognition, Vol. 13, No. 3, p. 191 (Pergamon Press, 1981), reports on an extended joint effort by six companies and universities to develop a stroke vector sequence character recognition system based on elemental stroke shapes, each derived from five percent to ten percent of the stroke length. A great deal of data is thus developed about a relatively small portion of a stroke. The proposed system is believed to be too expensive and insufficiently accurate to be a practical commercial success.
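A stroke vector sequence of the general kind described in that reference can be illustrated as follows. The sketch assumes, as a simplification, that each stroke is resampled by arc length into segments spanning a fixed fraction (here 10 percent) of its length and that each segment is reduced to one of eight 45-degree direction codes, with y increasing upward; the reference's actual shape descriptors are not reproduced. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample(stroke: List[Point], k: int) -> List[Point]:
    """Return k + 1 points spaced equally by arc length along the stroke."""
    if len(stroke) < 2:
        return [stroke[0]] * (k + 1)
    seglens = [math.dist(a, b) for a, b in zip(stroke, stroke[1:])]
    total = sum(seglens)
    if total == 0:
        return [stroke[0]] * (k + 1)
    out = [stroke[0]]
    i, acc = 0, 0.0  # acc is the arc length from the start to stroke[i]
    for j in range(1, k + 1):
        target = total * j / k
        # Advance to the segment that contains the target arc length.
        while i < len(seglens) - 1 and acc + seglens[i] < target:
            acc += seglens[i]
            i += 1
        t = (target - acc) / seglens[i] if seglens[i] > 0 else 0.0
        t = min(max(t, 0.0), 1.0)
        (x0, y0), (x1, y1) = stroke[i], stroke[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def direction_codes(stroke: List[Point], fraction: float = 0.1) -> List[int]:
    """Quantize each fraction-of-length segment to a direction code 0..7.

    Code 0 points along +x, and codes increase counterclockwise in
    45-degree steps (y is assumed to increase upward).
    """
    pts = resample(stroke, round(1 / fraction))
    codes = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes
```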
E. F. Yhap et al., "An On Line Chinese Character Recognition System", IBM Journal of Research and Development, Vol. 25, No. 3, p. 187 (May 1981) describes a handwritten Chinese character recognition scheme in which a large number of parameters about a Chinese character are categorized, generally in relation to positions within a field of registration.
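One simple way to categorize stroke parameters by position within the registration field, offered only as an illustration and not as the IBM system's actual feature set, is to divide the field into an n-by-n grid and record the cells containing each stroke's starting and ending points. The grid size and function names are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def grid_cell(p: Point, width: float, height: float, n: int = 3) -> int:
    """Row-major index of the n-by-n cell of the registration field containing p."""
    # Clamp so points on or slightly outside the border fall in an edge cell.
    col = min(max(int(p[0] / width * n), 0), n - 1)
    row = min(max(int(p[1] / height * n), 0), n - 1)
    return row * n + col


def positional_features(strokes: List[List[Point]], width: float, height: float,
                        n: int = 3) -> List[Tuple[int, int]]:
    # (start cell, end cell) for each stroke, in writing order.
    return [(grid_cell(s[0], width, height, n), grid_cell(s[-1], width, height, n))
            for s in strokes]
```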
Crane et al., "A Technique for the Input of Handprinted Chinese Characters Based on Sequential Stroke Recognition", Proceedings of International Computer Symposium 1977, Vol. One, p. 246 (27-29 Dec. 1977, Taipei, Republic of China) is a survey article. It further describes a proposed character recognition technique suitable for essentially real-time processing, based on recognizing sequences of stroke labels in which each individual stroke is recognized only to limited precision. The paper is an early publication related to the present work and describes preliminary conclusions of the present inventors. The present invention may be used in connection with techniques described in this and other prior works.
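The stroke label sequence idea can be sketched as follows. The coarse label set (H, V, L, R, D), the direction thresholds, and the small dictionary are all hypothetical choices for illustration, and tablet coordinates are assumed to have y increasing downward. Because each stroke is labeled only coarsely, distinct characters can share a label sequence; in the example, 土 and 士 both reduce to H, V, H and are returned together as candidates.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical dictionary mapping a stroke label sequence to candidate characters.
DICTIONARY: Dict[Tuple[str, ...], List[str]] = {
    ("H",): ["一"],
    ("H", "H"): ["二"],
    ("H", "V"): ["十"],
    ("H", "V", "H"): ["土", "士"],  # same coarse labels, different characters
}


def coarse_label(stroke: List[Point], dot_size: float = 2.0) -> str:
    """Assign a limited-precision label from the stroke's net displacement."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    dx, dy = x1 - x0, y1 - y0
    if math.hypot(dx, dy) < dot_size:
        return "D"  # dot
    if abs(dx) >= 2 * abs(dy):
        return "H"  # roughly horizontal
    if abs(dy) >= 2 * abs(dx):
        return "V"  # roughly vertical
    return "L" if dx < 0 else "R"  # diagonal, moving leftward or rightward


def recognize(strokes: List[List[Point]]) -> List[str]:
    # Label every stroke, then look up the whole label sequence.
    labels = tuple(coarse_label(s) for s in strokes)
    return DICTIONARY.get(labels, [])
```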
Crane et al., U.S. Pat. No. 4,040,010 issued Aug. 2, 1977, describes a handwriting verification system in which a special pen produces signals representative of various parameters based on angularly resolved writing pressure for identifying a signature or other appropriate group of relatively simple characters or symbols. Under that patent, a signature is deemed to be a forgery if the sum of the component variations of the detected signature exceeds a preselected threshold value established by a signature template.
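The decision rule summarized above reduces to a threshold test on a sum of deviations. The following minimal sketch assumes each signature has already been reduced to a fixed-length vector of measured parameters and that each component variation is the absolute difference from the corresponding template value; the parameter values and thresholds in the example are made up for illustration.

```python
from typing import Sequence


def is_forgery(measured: Sequence[float], template: Sequence[float], threshold: float) -> bool:
    """Flag a signature whose summed component variation exceeds the threshold."""
    if len(measured) != len(template):
        raise ValueError("measured and template must have the same length")
    total_variation = sum(abs(m - t) for m, t in zip(measured, template))
    return total_variation > threshold


# Example with illustrative (not measured) parameter values.
template = [1.0, 2.0, 3.0]
print(is_forgery([1.0, 2.5, 3.0], template, threshold=1.0))  # False: variation 0.5
print(is_forgery([2.0, 3.0, 4.0], template, threshold=2.0))  # True: variation 3.0
```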
Various other methods, not to be confused with the present invention, relate to the pictorial aspects (that is, the spatial appearance) of a character. These and other schemes have been explored for many years, but substantial research efforts have been largely unsuccessful in providing a practical character recognition system that can be used in an interactive, essentially real-time environment.