Typical computer systems, especially computer systems using graphical user interface (GUI) systems such as Microsoft WINDOWS, are optimized for accepting user input from one or more discrete input devices, such as a keyboard for entering text and a pointing device, such as a mouse with one or more buttons, for driving the user interface. The ubiquitous keyboard and mouse interface provides for fast creation and modification of documents, spreadsheets, database fields, drawings, photos and the like. However, there is a significant gap in the flexibility provided by the keyboard and mouse interface as compared with the non-computer (i.e., standard) pen and paper. With the standard pen and paper, a user edits a document, writes notes in a margin, and draws pictures and other shapes and the like. In some instances, a user may prefer to use a pen to mark up a document rather than review the document on-screen because of the ability to freely make notes outside of the confines of the keyboard and mouse interface.
Some computer systems permit a user to write on a screen using, for example, a stylus. For example, the Microsoft READER application permits one to add electronic ink (also referred to herein as “ink”) to a document much the same way that a user would write with a standard pen and paper. Hand-held computing devices, commonly known as Personal Digital Assistants (PDAs), as well as the future release of the Tablet PC also permit the user to write on the screen.
A handwriting recognition system may then be used to analyze the electronic ink to recognize characters, for example, Unicode characters. As the user moves the stylus across the screen, the computing device senses the position of the stylus as the user writes and stores the position data. The computing device analyzes the position data and converts it to recognized characters, such as letters or numbers, in a convenient format, such as Unicode format. There are many handwriting recognition systems in use including, for example, memory-based handwriting recognition systems.
Handwriting recognition systems use algorithms to map handwritten data to characters. For example, handwriting recognition systems may utilize neural networks, Hidden Markov Models, and/or prototypes. In the example of prototypes, the system internally stores prototypes for each character that can be recognized. A prototype is a “picture” of a handwritten character that is used to map handwriting to a character. Alternatively, a recognition system may use statistical models for each character. Recognition systems use recognition algorithms to measure the distance from handwritten data to one or more prototypes or statistical models. As long as the user writes like a prototype or a statistical model, the handwritten data is successfully recognized. Conversely, the more dissimilar the handwritten data and the prototype or the statistical models are, the more likely it is that the handwritten data will be misrecognized. Misrecognition is typically due to the differences in user handwriting styles and legibility of the handwriting. For example, the handwritten word “dear” may be misrecognized as the word “clear” depending on the way the user writes a “d” and the prototypes or statistical models for the characters “d,” “c,” and “l.”
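The prototype-matching idea described above can be illustrated with a minimal sketch. It is not the method of any particular recognizer: it assumes each prototype and each ink sample has already been resampled to the same number of (x, y) points, and it uses a mean point-to-point Euclidean distance as the dissimilarity measure. The names `PROTOTYPES`, `distance`, and `classify`, and the toy shapes themselves, are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Shape = List[Point]

# Hypothetical toy model database: each character maps to one or more
# prototype shapes, each a fixed-length list of (x, y) points.
PROTOTYPES: Dict[str, List[Shape]] = {
    "l": [[(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]],   # roughly a vertical stroke
    "c": [[(1.0, 1.0), (0.0, 0.5), (1.0, 0.0)]],   # roughly an open curve
}


def distance(ink: Shape, prototype: Shape) -> float:
    """Mean Euclidean distance between corresponding points.

    Assumes both shapes were resampled to the same number of points.
    """
    if len(ink) != len(prototype):
        raise ValueError("ink and prototype must have the same number of points")
    return sum(math.dist(p, q) for p, q in zip(ink, prototype)) / len(ink)


def classify(ink: Shape, prototypes: Dict[str, List[Shape]]) -> Tuple[str, float]:
    """Return the label of the closest prototype and its distance."""
    best_label, best_dist = "", math.inf
    for label, shapes in prototypes.items():
        for shape in shapes:
            d = distance(ink, shape)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# A slightly offset vertical stroke lies closest to the "l" prototype.
sample_ink: Shape = [(0.1, 0.0), (0.1, 0.5), (0.1, 1.0)]
label, dist = classify(sample_ink, PROTOTYPES)
```

In this sketch, ink that drifts away from every stored shape yields a larger best distance, which corresponds to the higher risk of misrecognition described above. Storing more prototypes per character would cover more writing styles at the cost of a larger database.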
One way to minimize the risk of misrecognition is to have a good model database, which provides the various possible shapes the recognizer should understand for any given character. A good model database, however, may have many statistical components and therefore requires more memory. This can be quite problematic, for example, in the case where the recognizer is for an East Asian language, which typically has many complex, multi-stroke characters. A large model database can be particularly undesirable where the recognizer is part of a smaller computing device, such as a handheld Personal Digital Assistant (PDA). Handheld computing devices typically are limited in memory, and a large model database may not be commercially acceptable.
One option to overcome the above problems is to reduce the size of the model database. The difficulty arises, however, in being able to reduce the model database without significantly compromising the ability of the handwriting recognizer to accurately recognize characters. It is therefore desirable to adequately reduce the size of a model database with minimal effects on character recognition accuracy.