The present invention relates generally to the field of pattern and character recognition, and more particularly to an "activity"-based system and method for feature extraction, representation and character recognition that reduces the processing capacity required to recognize single-stroke characters (or multiple strokes concatenated into one stroke) or patterns, with the intent that an individual may create, remove, or edit said characters or patterns in an alphabet for the purpose of personalization, without redesigning the method. Further, the system and method of the present invention provide a parameter set that can be varied for an arbitrary alphabet so as to optimize recognition accuracy specific to that alphabet.
Methods for character, handwriting and pattern recognition for the purpose of entering alphanumeric or symbolic data (collectively referred to herein as "text") into computer systems have been a key research area for electrical engineers and computer scientists since the earliest days of computers. In fact, handwriting-based input systems were designed and attempted as early as about 1959, prior to the widespread use of alphanumeric keyboards. Even these systems were based on the symbol recognition technologies of about the early 1950s. Most early methods were "off-line" processing methods, which used both temporal and string contextual information to increase recognition accuracy. "On-line" recognition uses only temporal drawing information to recognize characters while a user is writing. Generally, on-line methods sacrifice accuracy for real-time performance; off-line recognition typically need not make that sacrifice.
For most of the 1960s, the keyboard was the premier form of text input as well as the primary human interface to the computer. With the introduction of Douglas Engelbart's "mouse" and "graphical user interface" (GUI) in 1968, and the advent of digitizing tablets in the late 1960s, research focus returned to more natural human interfaces for manipulating digitized information. This remains the trend today with the various mainstream operating systems and desktop environments such as Apple's Macintosh OS, X-Windows for the various Unix systems, and Microsoft's Windows operating systems. In these systems, the mouse or some other pointing device, such as a tablet or stylus, is used to visually manipulate the organization of information on a screen (e.g., moving a window from the left side of the screen to the right, or selecting a block of text). The text input mechanisms of all these systems, however, are still based primarily on the keyboard.
In the modern world, computing devices are getting smaller, more powerful (sometimes exceeding the power of five-year-old desktop personal computers) and cheaper to produce. These small devices require text input methods that are less cumbersome than keyboards. One potential alternative is handwriting recognition. Devices such as Apple's Newton provided this technology, but with unacceptable performance. This is due to the complexity not only of character recognition but also of separating individual characters and symbols from handwritten words, sentences or complete documents before each character can be recognized. Only recently has a viable solution to character separation been proposed.
In about 1993, the concept of writing characters one on top of the other in single strokes was introduced, so that each character is automatically separated by "pen events" (such as pressing the pen to the writing surface to signify the start of a new character, dragging the pen along the writing surface to trace the structure of the character, and lifting the pen from the writing surface to signify the end of the character). This reduces recognition to the level of individual characters. Personal digital assistants (PDAs) such as the Palm Pilot and iPAQ have become mainstream and incorporate this character recognition concept with great success.
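As an illustration of how pen events alone can delimit characters, the following minimal sketch splits a stream of pen events into one point list per character. It assumes a hypothetical `PenEvent` record whose `kind` is "down", "move" or "up"; these names are illustrative and do not correspond to any particular device API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class PenEvent:
    kind: str  # hypothetical event kinds: "down", "move" or "up"
    x: int
    y: int


def segment_characters(events: Iterable[PenEvent]) -> List[List[Point]]:
    """Split a pen-event stream into one list of points per character.

    Pen-down starts a new character, moves extend it, and pen-up ends it.
    Moves received while the pen is lifted (hovering) are ignored, and a
    pen-down without a preceding pen-up discards the unfinished stroke.
    """
    characters: List[List[Point]] = []
    current: Optional[List[Point]] = None
    for ev in events:
        if ev.kind == "down":
            current = [(ev.x, ev.y)]
        elif ev.kind == "move" and current is not None:
            current.append((ev.x, ev.y))
        elif ev.kind == "up" and current is not None:
            current.append((ev.x, ev.y))
            characters.append(current)
            current = None
    return characters
```

Each returned point list can be handed to the recognizer on its own, so the recognizer never has to decide where one character ends and the next begins.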
The recognition accuracy of these devices is compromised, however, by the attempt to provide a specialized alphabet that is accessible to all users, along with a recognition method robust enough to handle the different writing styles of an arbitrary user. Palm's Graffiti language, for example, forces users to learn an alphabet that may differ from the everyday alphabet they are accustomed to. This adds user error to the recognition failure rate, as users may continue to draw the letter 'Q' as they would on paper while trying to enter text into the Palm Pilot. This is an unnecessary constraint on the user, especially on users who lack the motor control required to perform some of the Graffiti strokes, including sufferers of Parkinson's disease, Multiple Sclerosis (MS) and Muscular Dystrophy (MD). Additionally, the Palm recognition method does not appear to be robust enough to distinguish letters such as 'U' and 'V' naturally, so a serif was added to the tail of the 'V' for greater separation. While this improves the distinction between such letters, it makes the new alphabet even harder to learn. To avoid such unnatural characters, one recognition system adds code that, upon determining that the input character is either a 'P' or a 'D', compares the height of the stem to the height of the attached curve in order to recognize the character correctly. This does improve accuracy, but it suggests that further changes to the alphabet would require more character-specific code to handle new similarities, preventing the user from updating the character dictionary herself.
Some character recognition techniques, such as structural matching and elastic relaxation, employ complex feature manipulation methods to convert a "sloppy" character into one that is stored in a character dictionary. In practice, these methods are difficult for most vendors to understand and deploy, and they have high computational requirements. Although the Merlin system was designed to be interpreted (in Java) on weak devices such as portable phones, its incorporation of these methods detracts from its speed.
Presently, most research in on-line character recognition has centered on single-character entry systems. Characters are entered one at a time, and the recognizer classifies each character before the next is written. This gives the user immediate feedback so that errors can be corrected as they occur. Typically, there is a simple way for the user to indicate the beginning and end of each character, commonly by pen-down and pen-up events.
Unistrokes, developed at Xerox Corporation in about 1993, is a well-known example of a single-character, pen-event system. Unistrokes characters were designed to be written one on top of another so as to minimize the real estate required for recognition and to allow for "eyes-free operation". The Unistrokes alphabet is based on five basic strokes and their rotational deformations. While several characters ('i', 'j', 'L', 'o', 's', 'v' and 'z', for example) are represented by strokes similar to their Roman forms, most characters' strokes require memorization. Additionally, a model has been developed for predicting the time an expert user requires to enter arbitrary text with Unistrokes. This is particularly useful since several variations of the Unistrokes alphabet have been introduced over the past nine years.
Since about the mid-1990s, on-line character recognition has been widely employed in Personal Digital Assistants (PDAs), beginning with the Palm OS device, which largely defined the product category. A popular variation of Unistrokes is the Graffiti system used in the Palm OS family of PDAs. Graffiti improved upon Unistrokes by representing characters with symbols that are, for the most part, quite like their Roman counterparts, and it includes several characters composed of multiple strokes in order to allow a more natural writing style. A disadvantage of both Graffiti and Unistrokes is that their alphabets are static. As users change applications, more or fewer characters may be required. For example, a simple arithmetic calculator has little need to recognize characters other than digits, some punctuation, and operators; reducing the size of the alphabet in such situations might also increase recognition accuracy. A number of factors, however, have limited the use of character recognition to this category of device, and character recognition has even proven too frustrating for some PDA users. Factors that have limited wider acceptance of character recognition include:
Lower real-world accuracy rates than advertised
Fairly significant requirements for memory and processor speed
Perceived complexity to develop
Dependence on a stylized alphabet that users are forced to learn
T-Cube, developed at Apple Computer in about 1994, is a self-disclosing method for character input. Nine pie-shaped menus are shown on a screen (or tablet), each menu containing eight characters or character commands. Characters are input by "flicking" a stylus from the center of a pie toward one of its eight characters. This approach significantly decreases the stylus-to-pad time required to draw an arbitrary character, since each drawing is a unidirectional flick. T-Cube also uses a variety of earcons (short, distinctive audio cues) to aid users in their writing. Two basic problems prevent T-Cube from being an acceptable form of character input for mobile or wearable devices. First, because of the visual aspect of the pies, eyes-free operation is impossible. Second, circular menus have been shown to be difficult for many users to scan visually, reducing the speed at which they can be correctly accessed.
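Because every T-Cube character is a single flick, selecting a character amounts to deciding which of a pie's eight slices the flick points into. The sketch below shows that angular quantization under stated assumptions: slice 0 is centered on "east", indices increase counterclockwise, and `dx`, `dy` are in screen coordinates where y grows downward. The slice layout is hypothetical and not taken from T-Cube itself.

```python
import math


def flick_to_slice(dx: float, dy: float, slices: int = 8) -> int:
    """Map a flick vector to the index of the pie slice it points into.

    Assumed layout: slice 0 is centered on east and indices increase
    counterclockwise. dy is in screen coordinates (y grows downward),
    so it is negated to obtain a conventional mathematical angle.
    """
    angle = math.atan2(-dy, dx) % (2 * math.pi)  # 0 .. 2*pi, counterclockwise
    width = 2 * math.pi / slices
    # Shift by half a slice so each slice is centered on its direction.
    return int((angle + width / 2) // width) % slices
```

For example, a flick straight up the screen (`dx=0, dy=-10`) falls into slice 2, and a flick down and to the right falls into slice 7. The character would then be looked up from the selected pie's slice table.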
Two other notable self-disclosing systems that incorporate circular forms are Quikwriting and Cirrin. These two systems are quite similar. Each maps the characters of the alphabet about the perimeter of a circular or rectangular form. Characters are drawn by sliding a stylus from the center of the form to a character. By sliding rather than flicking, users can write entire words with one long stroke, sliding from character to character. These two systems suffer the same problems as T-Cube.
In about 2000, the Minimal Device Independent Text Input Method (MDITIM) was developed. MDITIM represented drawings of characters with a chain of the four cardinal directions. This coarse-grained resolution allows for a wide variety of input devices other than a stylus and pad (e.g., touchpads, mice, joysticks and keyboards). As with Quikwriting and Cirrin, MDITIM allows users to draw entire words with a single, long stroke. The disadvantage of MDITIM is that the drawings representing characters are not intuitive and require some memorization.
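To make the idea of a cardinal-direction chain concrete, the following minimal sketch snaps each step of a stroke to its dominant axis and collapses repeated directions. It is an illustration of the general encoding only, not MDITIM's actual implementation; screen coordinates (y grows downward) are assumed, so a positive vertical step is "S".

```python
from typing import List, Sequence, Tuple


def cardinal_chain(points: Sequence[Tuple[int, int]]) -> List[str]:
    """Encode a stroke as a chain of the four cardinal directions.

    Each step between consecutive points is snapped to its dominant axis
    (ties go to the horizontal axis); consecutive repeats are collapsed
    and zero-length steps are skipped. Screen coordinates are assumed,
    so a positive dy is "S".
    """
    chain: List[str] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        if abs(dx) >= abs(dy):
            direction = "E" if dx > 0 else "W"
        else:
            direction = "S" if dy > 0 else "N"
        if not chain or chain[-1] != direction:
            chain.append(direction)
    return chain
```

An "L"-shaped stroke drawn downward and then to the right, such as `[(0, 0), (0, 5), (0, 10), (4, 10), (8, 10)]`, encodes to `["S", "E"]`. The coarse four-symbol vocabulary is what lets such a scheme accept input from joysticks or arrow keys as easily as from a stylus.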
Some of the most robust recognizers in development today are based on elastic, structural matching. While recognition accuracy for these algorithms is very high (averaging 97-98%), their recognition speed can be slow. For example, one known algorithm can recognize only up to about 2.8 characters per second on an Intel 486 50 MHz processor. Another algorithm is reported to perform at rates up to about 3.03 characters per second on an Intel StrongARM processor (approximately 133 MHz). Other algorithms average 7.5 characters per second running on a Sun SPARCstation 10 Unix workstation.
Thus, it can be seen that needs exist for improved systems and methods for character recognition. It is to the provision of improved systems and methods for character recognition meeting these and other needs that the present invention is primarily directed.
Example embodiments of the present invention provide an algorithm that, by means of an improved feature extraction technique, significantly reduces the computational overhead required to support robust, on-line character recognition, and permits the use of arbitrary alphabets. The algorithm can be made adaptive, so that it transparently modifies the parameters of the recognition algorithm over time to increase accuracy with a particular alphabet as used by a single user. The system and method of the present invention are adaptable to a variety of applications and many types of devices. First, devices with very little computational capability, for example a 20 MHz, 8-bit microcontroller using 40 KB of memory, can now incorporate character recognition. Thus, toys, pagers, mobile phones, and many other small, inexpensive devices can take advantage of character recognition for command and data entry. Second, the alphabet independence of the algorithm makes it attractive for those who require application-specific alphabets. Any set of marks can be assigned arbitrary meanings, since the algorithm does not rely on particular features of the Roman alphabet or any other. Because the algorithm can be made adaptive, the idiosyncrasies of any particular user's writing can be incorporated, thus increasing recognition accuracy. Finally, in practice, this algorithm appears to exhibit an immunity to noise that makes it forgiving of someone writing in a noisy environment (such as on a subway) or suffering from a tremor or other nervous or motor condition.
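The adaptive behavior described above is not specified in detail here. Purely as an assumption, one simple way to picture it is to nudge a stored character template toward each input the user confirms, using an exponential moving average. The sketch below shows that idea; `adapt_template`, the dictionary-of-lists template format, and the `rate` parameter are all hypothetical and are not presented as the invention's method.

```python
from typing import Dict, List


def adapt_template(templates: Dict[str, List[float]], label: str,
                   features: List[float], rate: float = 0.1) -> None:
    """Move the stored template for `label` toward a confirmed input.

    Hypothetical exponential-moving-average update: each template entry
    becomes (1 - rate) * old + rate * new. A larger `rate` makes the
    template follow the user's writing more quickly. Updates in place.
    """
    old = templates[label]
    templates[label] = [(1 - rate) * t + rate * f for t, f in zip(old, features)]
```

With `rate=0.5`, a template `[0.0, 10.0]` updated with the input `[10.0, 0.0]` becomes `[5.0, 5.0]`. Because the update needs only a multiply and an add per feature, it could plausibly be done in fixed-point arithmetic on the kind of small microcontroller mentioned above.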
Preferred forms of the invention provide a system and method for on-line character recognition that is fast, portable, and consumes very little memory for code or data. The algorithm is alphabet-independent and does not require training beyond entering the alphabet once. The algorithm uses an "activity" value in performing feature extraction to achieve a high rate of accuracy. Recognition is improved dynamically without further input from the user, and the approach brings character recognition capability to classes of devices that heretofore have lacked it due to limited computing resources, including toys, two-way pagers, and other small devices. An example embodiment of the invention achieves a recognition rate of 16.8 characters per second on a 20 MHz, 8-bit microcontroller without floating-point hardware. The alphabet-independent nature of the algorithm, together with the ease with which recognition may be optimized dynamically, makes it particularly well suited for enhancing the ability of persons with impaired motor skills to communicate by writing.
In one aspect, the invention is a method for character recognition, the method preferably comprising receiving input data representing an input character; extracting at least one feature from said input data, the at least one feature including an activity metric; comparing the feature(s) extracted from the input data to an alphabet comprising a plurality of output characters; and selecting an output character based on the comparison of the feature(s).
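The comparison and selection steps can be pictured as a nearest-neighbor search over feature vectors. The minimal sketch below assumes that each character is summarized as a fixed-length list of numbers (one of which could be the activity metric) and uses a weighted L1 distance; the function name, the feature layout, and the use of `weights` as a stand-in for the per-alphabet parameter set are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def select_character(features: Sequence[float],
                     alphabet: Dict[str, Sequence[float]],
                     weights: Sequence[float]) -> str:
    """Return the alphabet entry whose feature vector is closest to `features`.

    Distance is a weighted sum of absolute differences (weighted L1).
    The weights stand in for a tunable per-alphabet parameter set
    (an assumption). Ties go to the entry inserted first.
    """
    def distance(template: Sequence[float]) -> float:
        return sum(w * abs(f - t) for w, f, t in zip(weights, features, template))

    return min(alphabet, key=lambda label: distance(alphabet[label]))
```

For example, with two hypothetical features (activity, net horizontal displacement) and `alphabet = {"I": [1.0, 0.0], "L": [2.0, 1.0], "O": [8.0, 0.0]}`, the input `[2.2, 0.8]` with unit weights selects `"L"`. Adding or removing a character is just a dictionary update, which reflects the kind of user personalization without method redesign that the invention aims for.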
In another aspect, the invention is a method of recognizing an input character representation, the method preferably comprising collecting data corresponding to at least a portion of a character stroke; mapping the collected data to at least one directional code; and approximating the number of directional codes occurring in the character stroke portion.
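The mapping and approximation steps above can be sketched as follows. This is a hedged illustration, not the invention's definition: it assumes Freeman-style 8-direction codes (code 0 is east, increasing counterclockwise in 45-degree steps, with y pointing up), and it assumes that the "activity" of a stroke portion is approximated by counting runs of identical consecutive codes. The helper names and the equal-slice split into portions are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def direction_codes(points: Sequence[Point]) -> List[int]:
    """Map each step between consecutive points to one of 8 direction codes.

    Code 0 is east and codes increase counterclockwise in 45-degree steps
    (y is treated as pointing up). Zero-length steps produce no code.
    """
    codes: List[int] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        codes.append(int((angle + math.pi / 8) // (math.pi / 4)) % 8)
    return codes


def activity(codes: Sequence[int]) -> int:
    """Approximate how many directional codes a portion contains by
    counting runs of identical consecutive codes (an assumed definition)."""
    return sum(1 for i, c in enumerate(codes) if i == 0 or c != codes[i - 1])


def portion_activities(points: Sequence[Point], portions: int = 4) -> List[int]:
    """Split the stroke's code sequence into `portions` roughly equal
    slices and return the activity of each slice."""
    codes = direction_codes(points)
    n = len(codes)
    bounds = [round(i * n / portions) for i in range(portions + 1)]
    return [activity(codes[bounds[i]:bounds[i + 1]]) for i in range(portions)]
```

A stroke that runs right, then up, then left, such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]`, yields the codes `[0, 0, 2, 2, 4, 4]` and an activity of 3. A jittery or zigzag stroke would produce many short runs and therefore a higher activity. Only integer comparisons are needed once the codes are known, which suits hardware without floating point.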
In yet another aspect, the invention is computer executable software for implementing either of the above-described methods; computer readable media comprising said software; and/or a computer programmed to execute that software.
In yet another aspect, the invention is a system for recognizing an input character representation. The system preferably includes an input device for receiving and collecting data corresponding to at least a portion of an input character stroke; and a processor for mapping the collected data to at least one directional code, and approximating the number of directional codes occurring in the character stroke portion. In a further preferred embodiment, the system optionally further comprises memory for storing an alphabet of characters for comparison to collected data corresponding to at least a portion of an input character stroke.
These and other aspects, features and advantages of the invention will be understood with reference to the drawing figures and detailed description herein, and will be realized by means of the various elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following brief description of the drawings and detailed description of the invention are exemplary and explanatory of preferred embodiments of the invention, and are not restrictive of the invention, as claimed.