Symbols, such as handwriting, when traced on an electronic tablet are represented by sequences of x-y coordinate pairs. A fundamental unit of handwriting is the stroke. A stroke is considered as a sequence of points, represented by their respective x-y coordinates. As employed herein a stroke is considered to be the writing that occurs from a pen-down to a pen-up condition of a handwriting input device. Characters and symbols are assemblages of such strokes.
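By way of illustration only, the stroke definition above may be sketched as follows. The sketch assumes that the tablet reports each sample as an (x, y, pen_down) triple; that sample format and the function name split_into_strokes are hypothetical and are not taken from any particular tablet interface.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def split_into_strokes(samples: Iterable[Tuple[float, float, bool]]) -> List[Stroke]:
    """Group tablet samples into strokes (pen-down to pen-up runs).

    Assumption: each sample is (x, y, pen_down). Samples reported while
    the pen is up only mark the end of the current stroke.
    """
    strokes: List[Stroke] = []
    current: Stroke = []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:
            # Pen lifted: close the stroke that was in progress.
            strokes.append(current)
            current = []
    if current:
        # Input ended while the pen was still down.
        strokes.append(current)
    return strokes
```

A character or symbol is then simply a list of such strokes, in the order in which they were written.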
Many real-time handwriting recognition systems employ curve matching techniques in order to match an unknown input symbol against the members of a set of symbol prototypes or templates. As such, the overall accuracy of the handwriting recognizer is a function of the quality of the prototype set, while the speed of recognition is a function of the number of members of the prototype set that must be examined. It is therefore desirable to provide a prototype establishment procedure for use in a real-time recognition system that optimizes both handwriting recognition accuracy and speed. To achieve this goal the set of prototypes should exhibit the following characteristics.
Firstly, the set of prototypes should exhibit sufficient coverage. That is, the set of prototypes should contain a member that corresponds to each distinct manner of writing a given character or symbol. In this regard it is also desirable that the recognition process be capable of operating with variations among symbol expressions. For on-line, or real-time, handwriting recognition the prototype set should ideally encompass variations in the number, order, direction and shape of the constituent stroke or strokes that make up a given symbol.
Secondly, each member of the set of prototypes should embody a "good" representation of an acceptable manner of writing a corresponding symbol or character. That is, the set of prototypes should ideally be free of prototypes that result from aberrant, or "maverick", symbol expressions. As used herein a maverick is considered to be a piece of writing that is different from that intended by the writer.
Thirdly, the individual members of the set of prototypes should exhibit a sufficient degree of separation or distance from one another in a prototype "space" so as to reduce the processing burden of the recognizer in selecting the prototype from the set that most nearly matches the constituent stroke or strokes of the input symbol.
One prior art handwriting recognition system that uses the curve matching method mentioned previously is described in an article entitled "Word Processing with On-line Script Recognition" by W. Doster et al., IEEE Micro., vol. 4, pp. 36-43, 10/84. This article describes a segmentation process for handwritten characters. An input character is said to be compared only to reference characters having an identical number of connected-line segments (CLS). A CLS is said to be a string of coordinates generated while a stylus is in contact with a tablet. The authors state that an experimental program for on-line script recognition includes components for interpretation and presentation of intermediate results of various processing steps, and components for editing a reference symbol set.
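The primary selection described above, in which an input character is compared only to reference characters having the same number of connected-line segments, may be sketched as follows. This is a minimal illustration and not the implementation of the cited article; the representation of a character as a list of strokes and the function name candidates_with_same_stroke_count are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Character = List[Stroke]


def candidates_with_same_stroke_count(
    input_char: Character,
    references: Dict[str, List[Character]],
) -> List[Tuple[str, Character]]:
    """Return (label, reference) pairs whose stroke count matches the input.

    Assumption: `references` maps a label such as "A" to one or more
    reference characters, each stored as a list of strokes.
    """
    n = len(input_char)
    return [
        (label, ref)
        for label, refs in references.items()
        for ref in refs
        if len(ref) == n
    ]
```

Only the references returned by such a filter need be passed on to the more costly curve matching step, which reduces the number of prototypes that must be examined.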
Another prior art handwriting recognition system that uses the curve matching method is described in an article entitled "On-line recognition of hand-written characters utilizing positional and stroke vector sequences" by K. Ikeda et al., Proc. 4th Int. Jt. Conf. Pattern Recognition, pp. 813-815, 11/78. The authors describe the use of a spatial filter that yields sampling data independent of the speed of pen movement and that rejects noisy data. Recognition of a stroke shape is accomplished by matching a stroke vector sequence against a shape dictionary. The authors employ a concept of similarity of stroke shape when matching input strokes to shapes in the dictionary and state that the number of strokes is a parameter for primary selection.
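One common way to obtain sampling data that is independent of pen speed is to resample each stroke at a fixed distance along its path. The following sketch illustrates that general idea only; it is not asserted to be the spatial filter of the cited article, and the function name resample_by_distance and the spacing parameter are assumptions made for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_by_distance(points: List[Point], spacing: float) -> List[Point]:
    """Resample a stroke so that successive points lie `spacing` apart
    along the pen path, regardless of how densely the pen was sampled."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not points:
        return []
    out = [points[0]]
    prev = points[0]
    needed = spacing  # path length still to travel before the next output point
    for cur in points[1:]:
        seg = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        while seg >= needed:
            # Step `needed` units along the segment from prev toward cur.
            t = needed / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg -= needed
            needed = spacing
        needed -= seg
        prev = cur
    return out
```

Under this scheme a slowly drawn line, which produces many closely spaced samples, and a quickly drawn line, which produces few, yield the same resampled points. Small jitter that is shorter than the spacing is largely suppressed as a side effect.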
It is an object of the invention to provide a method to establish a set of character prototypes that provides sufficient coverage, adequate representation and sufficient separation one from the other to support the on-line, real-time operation of a character recognizer.
It is a further object of the invention to provide a method to interactively establish a set of character prototypes each of which is comprised of an average of similarly formed characters obtained from a training session, and to also establish a set of stroke prototypes each of which is comprised of an average of similarly formed strokes obtained from the set of averaged character prototypes.
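The notion of a prototype formed as an average of similarly formed strokes may be illustrated by a simple point-by-point mean. The sketch below assumes that the strokes to be averaged have already been judged similar and have been reduced to the same number of points; how similarity is determined is not addressed here, and the function name average_strokes is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def average_strokes(strokes: List[Stroke]) -> Stroke:
    """Return the point-by-point mean of strokes of equal length.

    Assumption: every stroke has the same number of points, for example
    after resampling to a fixed point count.
    """
    if not strokes:
        raise ValueError("at least one stroke is required")
    n = len(strokes[0])
    if any(len(s) != n for s in strokes):
        raise ValueError("all strokes must have the same number of points")
    count = len(strokes)
    return [
        (sum(s[i][0] for s in strokes) / count,
         sum(s[i][1] for s in strokes) / count)
        for i in range(n)
    ]
```

For example, averaging the strokes [(0, 0), (2, 2)] and [(2, 0), (4, 2)] gives [(1.0, 0.0), (3.0, 2.0)]. A character prototype could be formed in the same manner by averaging corresponding strokes of similarly formed characters.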