The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge.
Prior art processing devices, such as handheld computers, are available which decode user instructions based on handwritten data inputs. Other devices force the user to enter data using a pseudo-handwritten format. One example of such a device is produced by Palm Computers and uses a proprietary input format known as Graffiti. This input format allows the user of a handheld computer to enter data into the device by moving a plastic stylus in predefined motions over a touchscreen area, where each character has an associated ‘stroke’, which in many cases resembles the actual character.
Such systems offer advantages where the portable device is too small to have a usefully sized keyboard, but they require the user to learn an artificial ‘language’ in order to enter data.
The range of handwriting styles possessed by individual users is vast, and therefore the provision of automated computer recognition of different users' handwriting is problematic. This has resulted in minimal use of commercially viable handwriting recognition systems in computing devices. It is desirable to provide a system which is able to interpret handwriting without forcing the user to adapt his or her writing style to conform with the expected input style of a particular device.
Most pattern or character recognition systems perform some kind of segmentation of an input signal to identify the fundamental primitives of the data and to minimize the level of noise in the input. Segmentation is also performed to reduce the amount of information used during feature extraction, and allows pattern recognition to occur on more abstract features of the input signal.
In handwriting recognition systems, individual strokes are often segmented into a number of sub-stroke primitives during preprocessing. These primitives are then passed to a feature extraction module or used directly for pattern classification. In cursive or connected print recognition systems, a single stroke may represent more than one letter, so segmentation is used to identify potential letter segmentation points.
Many segmentation techniques have been described by the research community, including simple approaches based on the properties of the human motor system. Examples include segmenting at curvature maxima, critical points, and velocity extrema, or a combination of these techniques (e.g. looking for points where curvature extrema and velocity minima coincide). Other research has proposed using ballistic gesture detection, independent component analysis of the strokes, and regularity and singularity concepts for the segmentation of strokes.
While the above procedures use handwriting generation as the foundation for segmentation, other techniques are based on the perception process. Key to the visual decoding of letters is the perception of the local relative positions of primitives; thus, positional extrema play an important role in the recognition of letter shapes. Perception-based criteria for segmentation include X and Y extrema, cusps, and stroke intersections.
While stroke segmentation can improve the accuracy of a handwriting recognition system under certain conditions, it can also be a major source of recognition errors. Most motor-based stroke segmentation algorithms apply some kind of numeric threshold when selecting segmentation points, resulting in the possibly inconsistent splitting of strokes that are poorly formed. FIG. 2b) shows the incorrect segmentation based on curvature extrema of a badly written letter ‘a’.
In this example, stroke segmentation is being used to partition or segment the circular body of the ‘a’ from the more linear stem using the extreme curvature at the top of the stem as a segmentation point. FIG. 2a) shows the correct segmentation point (as indicated by the cross). However, FIG. 2b) has a curvature extremum inside the circular body, and is missing the expected cusp that marks the beginning of the letter stem, resulting in the incorrect segmentation of the stroke.
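The threshold sensitivity described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the function names, the use of the turning angle between successive samples as a stand-in for curvature, and the threshold values are all assumptions made for illustration.

```python
import math

def turning_angles(points):
    """Absolute change in pen direction (radians) at each interior sample."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        diff = (a_out - a_in + math.pi) % (2 * math.pi) - math.pi
        angles.append(abs(diff))
    return angles

def curvature_segmentation_points(points, threshold):
    """Indices of samples where the turning angle is a local maximum
    and exceeds the (hypothetical) numeric threshold."""
    angles = turning_angles(points)
    cuts = []
    for i, a in enumerate(angles):
        left = angles[i - 1] if i > 0 else 0.0
        right = angles[i + 1] if i + 1 < len(angles) else 0.0
        if a > threshold and a >= left and a > right:
            cuts.append(i + 1)  # angles[i] describes points[i + 1]
    return cuts

# A stroke with one right-angle corner at sample index 2.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
loose = curvature_segmentation_points(stroke, threshold=1.0)
strict = curvature_segmentation_points(stroke, threshold=2.0)
```

In this sketch the same corner is reported as a segmentation point with the looser threshold and is missed with the stricter one, which is the kind of inconsistency a poorly formed stroke can trigger.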
Velocity is also used for segmentation, since handwriting is generated by a series of ballistic movements (i.e. accelerating from the start to a peak velocity, then decelerating at the target point). Sections of high velocity are generally straight, while low velocity usually occurs at the extrema of curvature. However, velocity is also subject to thresholding problems, and additionally, the user may pause while writing a stroke, leading to an invalid segmentation point. In FIG. 3, the sampled points, indicated by squares, for a letter ‘a’ are shown. The velocity of the pen can be derived from the spacing between the samples (assuming a constant sampling rate), so a large spacing indicates high velocity, while samples that are close together indicate lower velocity. In the example, the low velocity (and high curvature) regions can be seen as clusters of samples in the cusp at the top of the stem, as well as in the small hook at the bottom of the stem. However, there is another region of low velocity (on the left of the circular region) that was caused by the writer hesitating during the down-stroke. So while the letter ‘a’ is clearly well written, the velocity-based segmentation may produce an inconsistent result.
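The relationship between sample spacing and pen velocity can be sketched as follows. This is an illustrative example only, assuming a constant sampling interval as stated above; the function names and the sample coordinates are hypothetical.

```python
import math

def speeds(points, sample_interval=1.0):
    """Pen speed over each interval between consecutive samples,
    assuming a constant sampling interval."""
    return [math.hypot(x1 - x0, y1 - y0) / sample_interval
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def velocity_minima(points, sample_interval=1.0):
    """Indices i for which the interval from points[i] to points[i + 1]
    is a local minimum of pen speed."""
    v = speeds(points, sample_interval)
    return [i for i in range(1, len(v) - 1)
            if v[i] < v[i - 1] and v[i] <= v[i + 1]]

# A perfectly straight stroke in which the writer hesitates mid-way:
# the samples bunch together around x = 6..8.
straight_with_pause = [(0, 0), (3, 0), (6, 0), (7, 0), (8, 0), (11, 0), (14, 0)]
pause = velocity_minima(straight_with_pause)
```

Here a velocity minimum is detected even though the stroke has no curvature at all, so a segmentation rule based only on velocity minima would split the stroke at the hesitation.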
Perceptual segmentation techniques, such as using Y-extrema as segmentation points, generally do not suffer from thresholding problems, since no numeric value is required to determine if a point is a local extremum. However, these techniques also suffer from inconsistent segmentation. In FIG. 4a), the letter ‘a’ is segmented, indicated by a cross, at a Y-extremum located near the start of the stroke. However, the second letter ‘a’, shown in FIG. 4b), while clearly the same letter as the first, does not contain a Y-extremum at this position, as the stroke tends to level off.
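A minimal sketch of Y-extremum detection shows both properties: no numeric threshold is needed, yet a small change in stroke shape removes the segmentation point. The function name and the sample coordinates are assumptions made for illustration and do not reproduce the strokes of FIG. 4.

```python
def y_extrema(points):
    """Indices of interior samples whose y value is a strict local
    maximum or minimum. No numeric threshold is involved."""
    cuts = []
    for i in range(1, len(points) - 1):
        y_prev, y, y_next = points[i - 1][1], points[i][1], points[i + 1][1]
        if (y > y_prev and y > y_next) or (y < y_prev and y < y_next):
            cuts.append(i)
    return cuts

# Two similar strokes: the first has a distinct peak,
# the second levels off where the peak would be.
with_peak = [(0, 0), (1, 2), (2, 1), (3, 0)]
levelled = [(0, 0), (1, 2), (2, 2), (3, 0)]
peak_cuts = y_extrema(with_peak)
levelled_cuts = y_extrema(levelled)
```

The first stroke yields one segmentation point and the second yields none, although the two shapes differ only slightly.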
Most other segmentation algorithms suffer from these problems, and are particularly affected by poorly written letters. Due to the difficult and error-prone nature of stroke segmentation, many systems do not attempt any kind of stroke segmentation and simply work directly on the raw, un-segmented strokes provided by the user. Those systems that do perform stroke segmentation usually implement some kind of elastic matching procedure to minimize the effect of inconsistent segmentation.
“Elastic Structural Matching For Recognizing Online Handwritten Alphanumeric Characters,” Technical Report HKUST-CS98-07, Department Of Computer Science, Hong Kong University, March 1998 discloses the use of extrema of curvature to segment strokes into multiple line segments. However, they note that “a smooth stroke may be broken into parts due to poor quality writing” thus causing incorrect segmentation to occur. To counter this, they implement a set of rules that attempt to detect invalid segmentation, combining incorrectly segmented sub-strokes to form a new stroke.
“Handwritten Word Recognition—The Approach Proved By Practice”, Advances In Handwriting Recognition, Series in Machine Perception and Artificial Intelligence, Vol. 34, pp. 153-162, World Scientific Publishing Co. 1999 discloses the use of zero crossing points in vertical velocity to segment handwritten cursive strokes in a commercial optical check-reading system. The sub-strokes are then matched against a set of primitive elements to be used in an elastic-matching recognizer.
“Global Methods for Stroke-segmentation”, Advances In Handwriting Recognition, Series in Machine Perception and Artificial Intelligence, Vol. 34, pp. 225-234, World Scientific Publishing Co. 1999 discloses stroke segmentation of offline images based on contour curve fitting. In their method, curves are first approximated using cubic B-splines, with segmentation cuts made at the extreme points of curvature.
“A Fuzzy Online Handwriting Recognition System: FOHRES,” Proceedings of the 2nd International Conference on Fuzzy Theory and Technology, 13-16 Oct., 1993, Durham, N.C. teaches the use of fuzzy-logic representations of pen velocity and direction together with a group of linguistic variables to form a set of fuzzy-logic rules for stroke segmentation. Their segmented strokes are used as the primitives for fuzzy feature extraction.
“Recognizing Letters in Online Handwriting Using Hierarchical Fuzzy Inference”, 4th International Conference Document Analysis and Recognition (ICDAR), Aug. 18-20, 1997, Ulm, Germany discloses the segmentation of strokes at cusps and points with horizontal tangents into sets of PStrokes (partial strokes). The disclosed algorithm uses a system of angular smoothing (rather than the point-position smoothing that is often performed) that does not distort discontinuous parts of the pen trajectory (i.e. cusps).
“Detection Of Extreme Points of Online Handwritten Scripts”, Progress In Handwriting Recognition, pp. 169-176, 2-5 Sep., 1996, Colchester, UK. World Scientific Publishing Co. discloses a robust algorithm for detecting local extrema of curvature that is based on the delta log-normal theory of handwriting recognition, which is disclosed in “A Delta Lognormal Model for Handwriting Generation,” Proceedings of the 7th Biennial Conference of the International Graphonomics Society, 126-127, 1995. To segment strokes into primitive components, they disclose the use of calculations of angular signal intensity and first order crossing points.
“Perceptual Model of Handwriting Drawing Application to the Handwriting Segmentation Problem”, 4th International Conference Document Analysis and Recognition (ICDAR), Aug. 18-20, 1997, Ulm, Germany discloses a modeling and segmentation approach based on the detection of a set of “perceptual anchor points”. Basically, they search for ‘catastrophe’ points, which are defined as points of discontinuity such as pen-ups, sharp turns, and cusps, and ‘perceptual’ points, which include points of inflection, X- and Y-extrema, and stroke intersection points.
U.S. Pat. No. 6,275,611 describes a character recognition system that segments strokes at points where “local angle change is a maxima and exceeds a set threshold”. See also “Handwriting Recognition Device, Method and Alphabet, With Strokes Grouped Into Stroke Sub-Structures”, Aug. 14, 2001. A full description of the segmentation algorithm is given in U.S. Pat. No. 5,740,273. Similarly, U.S. Pat. No. 5,889,889 discloses the performance of stroke segmentation in a handwritten character recognizer by detecting points that “are identified by such criteria as abruptness of direction changes, as well as by pen lifts”. This discloses the same segmentation procedure in a system designed to represent handwritten input in a parametric form for compression and reconstruction, segmenting strokes at “corners and cusps, where direction changes abruptly”. See also U.S. Pat. No. 6,044,174.
The process described in U.S. Pat. No. 6,137,908 identifies Y-extrema as part of the preprocessing of strokes for recognition. Intermediate points between these extrema are also extracted and stored as a “frame” for use in the recognition system.
Similarly, U.S. Pat. No. 5,610,996 discloses the use of a series of arcs as primitives for recognition, where “the arcs begin and end at Y-extrema points on the sample text.” This document also discloses the use of alternative segmentation schemes, such as X-extrema, and combined X-Y extrema.
U.S. Pat. No. 4,024,500 discloses the use of X- and Y-extrema to segment cursive strokes into characters (rather than ballistic sub-stroke primitives).
U.S. Pat. No. 5,854,855 teaches the use of a velocity profile to segment strokes, and “associates sub-stroke boundaries with selected velocity minima in the handwriting input.”
U.S. Pat. No. 5,577,135 discloses the segmentation of strokes at Y-extrema, resulting in a series of up- and down-strokes that are used in a Hidden Markov Model (HMM) recognition system. In another HMM recognition system described in U.S. Pat. No. 5,878,164, strokes are “segmented into letters or sub-character primitives according to defined boundary conditions such as pen ups and cusps”.
The prior art references each attempt to introduce new techniques to address the problems of recognizing handwritten input text. Each may offer an improvement, but none offers a robust system which addresses all the previously described problems.