From the user's point of view, pen based entry systems that use handwriting recognition technology provide a natural computer interface for form filling applications. Unfortunately, no handwriting recognition system is perfect, and it is common experience that correcting errors is the most time consuming and difficult part of using such systems. Conventional handwriting recognition systems provide facilities for error correction such as rewriting a character, using an alternate word list or using a soft keypad. Each of these error correction techniques is time consuming and can be frustrating to the user.
Another important problem in designing pen based applications is identifying correct interfaces for pen and handwriting recognition based systems. Entry of numerals and ordinary words from a small finite dictionary may be better performed with a virtual keyboard or menu selection process. Handwriting recognition provides an ideal interface in situations where a word must be identified from a set too numerous to list in menu format.
Given the difficulties associated with pen based recognition systems, a system which predicts a whole word from a written substring and presents the user with a candidate list for menu selection, a technique known as autocompletion, is a desirable interface. A system that could predict a whole string from an ambiguous substring which may contain misrecognition errors would be particularly useful. This would give the application a look and feel of near 100% accuracy, even though the accuracy of the underlying handwriting recognition technology is less than 100%.
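The idea of autocompletion, and the reason misrecognition errors undermine a naive version of it, can be illustrated with a minimal Python sketch. The function name `autocomplete`, the `limit` parameter and the small `CITIES` dictionary are hypothetical and chosen only for illustration; they do not come from the described system.

```python
def autocomplete(prefix, dictionary, limit=5):
    """Return up to `limit` dictionary words that start with `prefix`.

    Plain exact-prefix matching: it assumes every character in `prefix`
    was recognized correctly.
    """
    p = prefix.lower()
    matches = [word for word in dictionary if word.lower().startswith(p)]
    return sorted(matches)[:limit]


# Hypothetical field dictionary, for illustration only.
CITIES = ["Boston", "Boulder", "Buffalo", "Austin", "Atlanta"]

# Correctly recognized substring: the intended word is in the short list.
print(autocomplete("Bo", CITIES))   # ['Boston', 'Boulder']

# Same strokes, but "B" was misrecognized as "8": nothing matches.
print(autocomplete("8o", CITIES))   # []
```

With exact matching, a single misrecognized character empties the candidate list, which is why prediction from an ambiguous substring is the more useful capability.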
The present invention relates to the field of applications that use handwriting recognition technology as a means of data entry for form filling. It specifically applies to applications which may contain one or more handwriting input fields in which discrete character recognition is used. Each input field typically has a set of well defined choices in a dictionary from which the user may make a selection.
It is known to use dictionary matching at various levels in the application, or to display alternates for the purpose of error correction. Other techniques for improving the usability of handwriting recognition based applications include incorporating some form of constraint in the input fields, rule based discrimination and the use of spell checkers to improve the recognition accuracy.
Known systems can be divided into two broad categories. The first includes the use of dictionaries to improve recognition accuracy and the use of dictionaries to perform postprocessing after recognition occurs.
The second category of known systems uses alternates for a given recognized word. The concept of selecting from a list of alternates for error correction fundamentally applies to recognition technology that works at the word level, such as cursive handwriting recognition or discrete speech recognition. Discrete handwriting recognition technology works at the character level, and applying the concept of alternate lists for error correction would imply having to correct each erroneous character, one at a time, in order to complete data entry in a given field in the form filling application. This technique is inefficient in terms of both speed of use and usability of the application.
Another approach using the alternate list returned from the recognizer is to conduct an incremental search in the dictionary associated with a specific field for each recognized character and its alternates. This is not a practical real time solution for two reasons: 1) there is no guarantee that the recognizer returns all possible confusions for a specific character when the alternates are queried; and 2) the search space in the dictionary is of the order of m to the power of n where m is the number of alternates used in the search and n is the position number of the character returned by the recognizer. This search cannot be performed in real time with present technologies.
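The growth of this search space can be made concrete with a short Python sketch. It assumes, purely for illustration, that the recognizer returns the same number m of alternates at each of n character positions; the function names and the sample alternates and dictionary are hypothetical.

```python
from itertools import product


def candidate_prefixes(alternates_per_position):
    """Enumerate every string formed by picking one alternate per position.

    With m alternates at each of n positions this yields m ** n strings.
    """
    return ["".join(chars) for chars in product(*alternates_per_position)]


def incremental_search(alternates_per_position, dictionary):
    """Return the dictionary words that start with any candidate prefix."""
    prefixes = candidate_prefixes(alternates_per_position)
    return sorted({w for w in dictionary if any(w.startswith(p) for p in prefixes)})


# Hypothetical recognizer output: 3 alternates for each of 2 characters.
alternates = [["B", "8", "R"], ["o", "0", "a"]]
print(len(candidate_prefixes(alternates)))   # 9, i.e. 3 ** 2
print(incremental_search(alternates, ["Boston", "Raleigh", "Reno"]))
# ['Boston', 'Raleigh']

# The number of prefixes to test grows exponentially with position n.
for n in (2, 4, 8):
    print(n, 5 ** n)   # with m = 5: 25, 625, 390625
```

Even with only five alternates per character, the eighth character already implies several hundred thousand candidate prefixes, each of which must be checked against the dictionary.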
Other systems perform an incremental search by adding the successively recognized character or word to the search process. Still other known systems refer to certain external techniques to improve the appearance of recognition in an application.
Thus, it can be seen that it would be desirable to design a method by which the intended word for a specific field appears in a list as short as possible, with high accuracy and real time performance, despite the accuracy of the recognition technology being less than 100%. This gives the user a speedy data entry interface with near 100% accuracy.
The foregoing objectives can be met in accordance with the invention through the use of a confusion matrix mediated prediction system. For a specific character, the confusion matrix gives a list of probabilities of that character being confused with every other character in the character set. A confusion matrix for the entire English character set of 83 characters, including letters, numerals and punctuation marks, comprises such a list of probabilities for each character in the character set.
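One way to picture such a confusion matrix is as a table indexed by intended character and recognized character. The Python sketch below is an assumption made for illustration and is not the prediction method of the invention: the probabilities are invented, only four characters are covered rather than 83, and the scoring rule (multiplying per-character confusion probabilities over the written substring) is simply one plausible way to rank dictionary words.

```python
# CONFUSION[intended][recognized] = probability that `intended` is read as
# `recognized`. All values are made up for illustration; each row sums to 1.
CONFUSION = {
    "B": {"B": 0.80, "8": 0.15, "R": 0.05},
    "o": {"o": 0.85, "0": 0.15},
    "R": {"R": 0.90, "B": 0.10},
    "e": {"e": 0.95, "c": 0.05},
}


def confusion_prob(intended, recognized):
    """Look up P(recognized | intended); unknown characters match only themselves."""
    row = CONFUSION.get(intended)
    if row is None:
        return 1.0 if intended == recognized else 0.0
    return row.get(recognized, 0.0)


def prefix_likelihood(word, recognized):
    """Likelihood that writing the start of `word` produced `recognized`."""
    if len(word) < len(recognized):
        return 0.0
    score = 1.0
    for intended, seen in zip(word, recognized):
        score *= confusion_prob(intended, seen)
    return score


def rank_candidates(recognized, dictionary):
    """Return words with non-zero likelihood, most likely first."""
    scored = [(prefix_likelihood(w, recognized), w) for w in dictionary]
    scored = [(s, w) for s, w in scored if s > 0.0]
    return [w for s, w in sorted(scored, key=lambda sw: (-sw[0], sw[1]))]


# "B" misrecognized as "8": the intended words still appear in the list.
print(rank_candidates("8o", ["Boston", "Reno", "Boulder", "Austin"]))
# ['Boston', 'Boulder']
```

Unlike enumerating alternates returned by the recognizer, this lookup scores each dictionary word directly against the recognized substring, so the work grows with the dictionary size rather than exponentially with the character position.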