The invention relates generally to computer systems, and more particularly to the recognition of handwriting and speech.
Users who attempt to input information into desktop and hand-held computers via writing or speech can experience many recognition errors. This significantly slows the rate at which information is input, and significantly frustrates users. Improved recognition accuracy is continually sought.
The accurate recognition of cursive handwriting, for example, is a formidable task. A first difficulty arises in that cursive handwriting is initially represented as large quantities of coordinate pairs coming in via a digitizer over time, which must be processed in some manner. The higher the resolution of the digitizer, the more coordinate pairs are provided. To directly recognize handwriting from the coordinate data is beyond the capabilities of ordinary computers, and thus some pre-processing needs to be done on the data to make it more manageable.
One type of recognizer is based on a time-delayed neural network. In one such recognizer, described in the publication "Recognizing Cursive Handwriting," David E. Rumelhart, Computational Learning and Cognition, Proceedings of the Third NEC Research Symposium, a neural network is trained to recognize a number of feature values representing known words. For example, two such values represent the net motions in the x and y directions, respectively, for that word. After training, when later attempting to recognize a word, an unknown input word is featurized according to the criteria on which the neural network was trained, and the features therefor are fed into the neural network. The neural net outputs a probability for possible letters (a-z) in the word, and a dynamic programming procedure finds the best fitting words from a dictionary to produce a ranked ordering of words.
While the above recognition technique clearly works to an extent, tests on large numbers of samples have shown an approximately seventeen percent average error rate in recognition. This is inadequate for most user applications. Thus, while neural network-based recognition is a promising recognition technique, improved recognition accuracy is needed in order for practical applications to benefit therefrom.
Briefly, the present invention provides a system and method that improve the recognition accuracy of a time-delayed neural network-based handwriting or speech recognizer via an improved training method, improvements in pre-processing and an improved neural network model architecture. To recognize handwriting, in a first preprocessing step, a partitioning mechanism partitions a user's handwritten electronic ink into lines of ink, or alternatively, into proposed words. A second step (via a mechanism for implementing same) smoothes and resamples the ink to reduce any variability resulting from different writing speeds and sizes, and also eliminates jagged edges. The resampling is based on the second derivative of the ink over a particular area, which accentuates the number of points at the curves and cusps of a character as opposed to the straight portions of a character. A third step examines the smoothed ink in time order to identify delayed strokes, i.e., strokes made with dotted "i" or crossed "t" or "x" characters, which otherwise might potentially confuse the neural net. Delayed strokes are removed from the ink and recorded as feature information.
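By way of illustration only, the curvature-weighted resampling of the second step might be sketched as follows. The summary does not give the resampling formula, so the weighting used here (segment length plus a multiple of the discrete second difference), the function name resample_by_curvature and the parameter alpha are all assumptions made for this example.

```python
import math

def resample_by_curvature(points, n_out, alpha=5.0):
    """Resample a polyline so output points are denser where it bends.

    points: time-ordered list of (x, y) tuples, at least 2 points.
    n_out:  number of output points, at least 2.
    alpha:  hypothetical weight of the bend term relative to arc length.
    """
    n = len(points)
    # Magnitude of the discrete second difference at each interior point,
    # used here as a stand-in for the second derivative of the ink.
    bend = [0.0] * n
    for i in range(1, n - 1):
        ddx = points[i - 1][0] - 2 * points[i][0] + points[i + 1][0]
        ddy = points[i - 1][1] - 2 * points[i][1] + points[i + 1][1]
        bend[i] = math.hypot(ddx, ddy)

    # Cumulative cost along the ink: segment length plus a bend bonus,
    # so that segments next to curves and cusps count as "longer".
    cum = [0.0]
    for i in range(1, n):
        seg = math.hypot(points[i][0] - points[i - 1][0],
                         points[i][1] - points[i - 1][1])
        cum.append(cum[-1] + seg + alpha * 0.5 * (bend[i - 1] + bend[i]))

    total = cum[-1]
    if total == 0.0:
        return [points[0]] * n_out

    # Place output points at equal increments of the cumulative cost.
    out = []
    j = 0
    for k in range(n_out):
        target = total * k / (n_out - 1)
        while j < n - 2 and cum[j + 1] < target:
            j += 1
        span = cum[j + 1] - cum[j]
        t = 0.0 if span == 0.0 else (target - cum[j]) / span
        t = min(max(t, 0.0), 1.0)
        x = points[j][0] + t * (points[j + 1][0] - points[j][0])
        y = points[j][1] + t * (points[j + 1][1] - points[j][1])
        out.append((x, y))
    return out
```

Applied to an L-shaped stroke, such a scheme places more of the output points around the corner than along either straight run, which is the effect the second step describes.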
A segmenter provides a fourth step in which the recognizer process separates the ink into distinct segments based on the y-minima thereof. A featurizer implements a fifth step to featurize the segmented ink into a number of features, including Chebyshev coefficients, size and other stroke-related information. A sixth step then runs the features for each segment, including the delayed stroke feature information, through a time-delayed neural network.
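As a minimal sketch of the fourth step, the following splits time-ordered ink at local minima of the y coordinate. The function name split_at_y_minima, the use of strict minima only, and the choice to share the minimum point between adjacent segments are assumptions for illustration; the summary states only that segmentation is based on the y-minima.

```python
def split_at_y_minima(points):
    """Split a time-ordered list of (x, y) points at strict local minima of y.

    The point at each minimum is shared: it ends one segment and starts
    the next. Flat-bottomed runs (equal neighbouring y values) are not
    treated as minima in this sketch.
    """
    cuts = [
        i for i in range(1, len(points) - 1)
        if points[i][1] < points[i - 1][1] and points[i][1] < points[i + 1][1]
    ]
    segments = []
    start = 0
    for c in cuts:
        segments.append(points[start:c + 1])
        start = c
    segments.append(points[start:])
    return segments
```

Whether a y-minimum corresponds to the top or the bottom of the ink depends on the digitizer's coordinate convention, which this sketch leaves open.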
The time-delayed neural network records the output in an x-y matrix, where the x-axis represents the strokes over time and the y-axis represents letter output scores assigned by the neural network for each letter. The improved architecture of the time-delayed neural network of the present invention outputs a separate score for whether a character is starting or continuing. In a seventh step, for every word or phrase generated from a trie structured dictionary and language model, a dynamic time warp (DTW) is run to find the most probable path through the output matrix for that word or phrase. Words or phrases are assigned a score based on the least costly path that can be traversed through the output matrix, and based on the assigned scores, the best words or phrases are returned from the recognizer. Note that as used herein the term phrase is intended to mean any plurality of words, whether they constitute a grammatically proper phrase, a complete sentence, or just any set of words not necessarily associated with one another.
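The scoring of a single candidate word against the output matrix can be illustrated with the following sketch. It assumes that the network's start and continue scores have already been converted to costs (for example, negative log probabilities), one dictionary per segment; the function name dtw_word_cost, the data layout and the example values are hypothetical, and the trie-structured dictionary and language model are not modeled.

```python
import math

def dtw_word_cost(word, start_cost, cont_cost):
    """Cost of the cheapest path through the output matrix for one word.

    start_cost[t][ch]: cost of letter ch starting at segment t.
    cont_cost[t][ch]:  cost of letter ch continuing at segment t.
    Every letter of the word must start exactly once, in order.
    """
    T, L = len(start_cost), len(word)
    if L == 0 or T < L:
        return math.inf
    prev = [math.inf] * L
    prev[0] = start_cost[0][word[0]]       # the first letter starts at segment 0
    for t in range(1, T):
        cur = [math.inf] * L
        for i, ch in enumerate(word):
            stay = prev[i] + cont_cost[t][ch]                 # same letter continues
            advance = (prev[i - 1] + start_cost[t][ch]        # next letter starts
                       if i > 0 else math.inf)
            cur[i] = min(stay, advance)
        prev = cur
    return prev[L - 1]                      # path must end on the last letter

# Hypothetical costs for three segments over a two-letter alphabet.
example_start = [{"a": 0.1, "b": 2.0}, {"a": 2.0, "b": 0.2}, {"a": 2.0, "b": 1.5}]
example_cont = [{"a": 3.0, "b": 3.0}, {"a": 1.0, "b": 3.0}, {"a": 3.0, "b": 0.3}]

# Rank candidate words by their least costly path.
ranked = sorted(["ba", "a", "ab"],
                key=lambda w: dtw_word_cost(w, example_start, example_cont))
```

Because a letter can be entered only through its start score, a separate start output lets such a procedure distinguish a doubled letter from a single letter that spans several segments.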
A recognizer training method is also provided, the method using data labeled only at the word or phrase level. In general, the method enforces the correct number of letters and the correct order of the letters to be learned at network training time. To this end, the neural network is started with initially random weights, and for each word or phrase input during training, the ink is featurized as described above and run through the neural network as it stands at that point. The label for the word is known, whereby a DTW matrix for that word is computed as at recognition time, recording at each matrix cell the backward step taken to reach it. Starting from the cell in the upper-right corner of the matrix, the recorded steps are then followed backwards to find the optimal path, setting a target of one for every network output that corresponds to the path, and a zero everywhere else.
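The derivation of training targets from the backward path might look like the following sketch. It assumes the same cost layout as a recognition-time DTW (per-segment dictionaries of start and continue costs computed from the current network outputs); the function name training_targets, the "start"/"cont" back-pointer labels and the example values are assumptions for illustration, and the weight update itself is not shown.

```python
import math

def training_targets(word, start_cost, cont_cost, alphabet):
    """Return (start_target, cont_target): 1 on the optimal path, 0 elsewhere."""
    T, L = len(start_cost), len(word)
    cost = [[math.inf] * L for _ in range(T)]
    back = [[None] * L for _ in range(T)]   # how each cell was reached
    cost[0][0] = start_cost[0][word[0]]
    back[0][0] = "start"
    for t in range(1, T):
        for i, ch in enumerate(word):
            stay = cost[t - 1][i] + cont_cost[t][ch]
            adv = cost[t - 1][i - 1] + start_cost[t][ch] if i > 0 else math.inf
            if adv < stay:
                cost[t][i], back[t][i] = adv, "start"
            elif stay < math.inf:
                cost[t][i], back[t][i] = stay, "cont"
    if cost[T - 1][L - 1] == math.inf:
        raise ValueError("word cannot be aligned to this many segments")

    start_target = [{c: 0 for c in alphabet} for _ in range(T)]
    cont_target = [{c: 0 for c in alphabet} for _ in range(T)]
    # Follow the recorded steps backwards from the final cell
    # (last segment, last letter) to the first segment.
    i = L - 1
    for t in range(T - 1, -1, -1):
        ch = word[i]
        if back[t][i] == "start":
            start_target[t][ch] = 1
            i -= 1                           # earlier segments belong to the previous letter
        else:
            cont_target[t][ch] = 1
    return start_target, cont_target

# Hypothetical costs for three segments over a two-letter alphabet.
tt_start = [{"a": 0.1, "b": 2.0}, {"a": 2.0, "b": 0.2}, {"a": 2.0, "b": 1.5}]
tt_cont = [{"a": 3.0, "b": 3.0}, {"a": 1.0, "b": 3.0}, {"a": 3.0, "b": 0.3}]
s_tgt, c_tgt = training_targets("ab", tt_start, tt_cont, "ab")
```

In this example exactly one output per segment receives a target of one, so the label's letter count and letter order are imposed on the network even though the ink was labeled only at the word level.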
Speech recognition based on phoneme information instead of stroke information may also employ the recognition steps, improved neural network architecture and the training method of the present invention to increase recognition accuracy.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: