Typical computer systems, especially computer systems using graphical user interface (GUI) systems such as Microsoft WINDOWS, are optimized for accepting user input from one or more discrete input devices such as a keyboard for entering text, and a pointing device such as a mouse with one or more buttons for driving the user interface. The ubiquitous keyboard and mouse interface provides for fast creation and modification of documents, spreadsheets, database fields, drawings, photos and the like. However, there is a significant gap in the flexibility provided by the keyboard and mouse interface as compared with the non-computer (i.e., standard) pen and paper. With the standard pen and paper, a user edits a document, writes notes in a margin, and draws pictures and other shapes and the like. In some instances, a user may prefer to use a pen to mark up a document rather than review the document on-screen because of the ability to freely make notes outside of the confines of the keyboard and mouse interface.
Some computer systems permit a user to write on a screen using, for example, a stylus. For example, the Microsoft READER application permits one to add electronic ink (also referred to herein as “ink”) to a document much the same way that a user would write with a standard pen and paper. Hand-held computing devices, commonly known as Personal Digital Assistants (PDAs), as well as the future release of the Tablet PC also permit the user to write on the screen.
A handwriting recognition system may then be used to analyze the electronic ink to recognize characters, for example, Unicode characters. As the user moves the stylus across the screen, the computing device senses the position of the stylus as the user writes and stores the position data. The computing device analyzes the position data and converts it to recognized characters, such as letters or numbers, in a convenient format, such as Unicode format. There are many handwriting recognition systems in use including, for example, prototype-based handwriting recognition systems.
Handwriting recognition systems use algorithms to map handwritten data to characters. For example, handwriting recognition systems may utilize neural networks, Hidden Markov Models, and/or prototypes. In the example of prototypes, the system internally stores prototypes for each character that can be recognized. A prototype is a “picture” of a handwritten character that is used to map handwriting to a character. Recognition systems use recognition algorithms to measure the distance from handwritten data to one or more prototypes. As long as the user writes like the prototypes, the handwritten data is successfully recognized. Conversely, the more dissimilar the handwritten data and the prototype are, the more likely it is that the handwritten data will be misrecognized. Misrecognition is typically due to the differences in user handwriting styles and legibility of the handwriting. For example, the handwritten word “dear” may be misrecognized as the word “clear” depending on the way the user writes a “d” and the prototypes for the characters “d,” “c,” and “l.”
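The distance-based matching described above may be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that each prototype is reduced to a short numeric feature vector and that Euclidean distance is the distance measure; the names PROTOTYPES and recognize and all feature values are hypothetical and do not describe any particular recognizer.

```python
import math

# Hypothetical prototype database: each entry pairs a character label with a
# feature vector. The three-number vectors are invented for illustration only;
# a real recognizer would derive its features from the ink strokes.
PROTOTYPES = [
    ("d", (0.9, 0.8, 0.1)),
    ("d", (0.8, 0.9, 0.2)),
    ("c", (0.9, 0.1, 0.1)),
    ("l", (0.1, 0.9, 0.0)),
]


def recognize(sample, prototypes=PROTOTYPES):
    """Return (label, distance) for the prototype nearest to the sample.

    Euclidean distance is an assumed stand-in for whatever distance
    measure an actual recognition algorithm would use.
    """
    label, vector = min(prototypes, key=lambda p: math.dist(sample, p[1]))
    return label, math.dist(sample, vector)


# A sample written much like a stored "d" prototype is matched to "d".
print(recognize((0.85, 0.85, 0.15))[0])

# A sample that drifts away from both "d" prototypes falls nearer to the
# "c" prototype and is misrecognized.
print(recognize((0.9, 0.4, 0.1))[0])
```

In this sketch the second sample lies closer to the stored “c” prototype than to either “d” prototype, which mirrors the “dear” versus “clear” confusion described above.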
One way to minimize the risk of misrecognition is to have a good prototype database, which provides the various possible shapes the recognizer should understand for any given character. A good prototype database, however, may require multiple prototypes for each character to be recognized. Generally, the greater the number of prototypes in the prototype database, the more accurate the recognizer. This can be quite problematic, for example, in the case where the recognizer is for an East Asian language. East Asian languages typically have thousands of characters. To compound the problem, East Asian language characters are also inherently complex and typically require multiple strokes of ink to form each character. The prototype database for an East Asian language may therefore have hundreds of prototypes for each character. The original prototype database for an East Asian language may have millions of prototypes and may require tens of millions of bytes of memory.
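The scale described above may be checked with simple arithmetic. The following sketch is a back-of-the-envelope estimate only; the character count, the number of prototypes per character, and the storage size per prototype are assumed round figures chosen to fall within the ranges stated above, not measurements of any actual database.

```python
def estimate_database_size(num_characters, prototypes_per_character,
                           bytes_per_prototype):
    """Return (total prototypes, total bytes) for a prototype database."""
    total_prototypes = num_characters * prototypes_per_character
    total_bytes = total_prototypes * bytes_per_prototype
    return total_prototypes, total_bytes


# Assumed figures: 5,000 characters, 200 prototypes per character,
# 20 bytes of storage per prototype.
total, size = estimate_database_size(5_000, 200, 20)
print(f"{total:,} prototypes, {size:,} bytes")
# prints "1,000,000 prototypes, 20,000,000 bytes"
```

Under these assumptions the database holds one million prototypes and occupies twenty million bytes, which is consistent with the orders of magnitude given above.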
A large prototype database can be undesirable for many reasons. For example, a large prototype database requires long search times. The required time for the recognizer to recognize East Asian language characters may therefore be unacceptably long. As another example, in the context where the recognizer is part of a smaller computing device, such as a handheld Personal Digital Assistant (PDA), the required memory for the prototype database may be unacceptably large.
One option to overcome the above problems is to limit the size of the prototype database. The difficulty arises, however, in being able to limit the prototype database without significantly compromising the ability of the handwriting recognizer to accurately recognize characters. Training algorithms may be used to select a good subset of the possible prototypes. However, these training algorithms are unable to limit the prototype database without adversely affecting the recognizer's accuracy. Conversely, when the desired character recognition accuracy is maintained, the training algorithms are unable to sufficiently limit the size of the prototype database.
It is therefore desirable to adequately reduce the size of a prototype database with minimal effects on character recognition accuracy.