The present invention relates generally to front-end processing and storage of handwriting data in handwriting training and recognition systems, and more particularly to a handwriting signal processing front-end that utilizes non-uniform segmentation, feature extraction and multiple vector quantization.
Today, the use of pen-based computer systems that allow a user to interface with the computer through a pen rather than a keyboard is becoming widespread. A common type of pen-based computer system, for example, is a hand-held personal computer called a personal digital assistant (PDA). Typically, a PDA comes equipped with a specialized writing pen. As shown in FIG. 1, a user interfaces with a pen-based computer 1 by writing on a digitizing tablet 3 using a special stylus or pen 5. When the pen 5 is placed near or on the digitizing tablet 3, the tablet 3 generates a series of x- and y-coordinates, called sample points, that represent the path of the pen 5 as it moves across the tablet 3. FIG. 2 is a graphical example of sample points for a cursive handwriting sample that was written on a pen-based computer. The stars in FIG. 2 indicate sample points of the (x,y) coordinates of the pen taken at uniform time intervals, and the units on the x and y axes of the graph represent actual addresses on the tablet surface.
The (x,y) coordinates are commonly represented as either stroke-based data or image-based data. Recording the (x,y) coordinates at uniform time intervals creates stroke-based data. Image-based data is not time stamped, but rather each pen stroke is recorded as an image using only the (x,y) coordinates. Besides the time and the (x,y) coordinates, other values may also be recorded, such as a value denoting whether the pen 5 was in one of two states: up from the tablet surface (in which case the coordinates of the pen would not be determinable), and contacting the tablet surface. The series of recorded coordinates (and any other values) are typically referred to as sample points, and a series of words that have been recorded in such a manner are referred to as digitized handwriting samples.
After the sample points are generated, specialized handwriting software within the pen-based computer system attempts to recognize the series of sample points as known characters, symbols, etc. The handwriting software is usually composed of two main components: a front-end processor that functions to characterize and reduce the amount of data contained in the digitized handwriting samples, and a recognizer which performs the actual recognition. The recognizer may be implemented using various methods, such as template matching and training-based recognition.
Template-matching recognizers attempt to match each character written by the user by comparing the character, pixel-by-pixel, to pre-made character templates that are stored in memory. In contrast, training-based recognizers do not utilize pre-made templates. Instead, training-based recognizers are operated in two modes, training and recognition. In the training mode, training-based recognizers are trained by statistically analyzing sets of training handwriting samples to develop statistical models of letters or words. The models allow for variability in the way letters and words can be written. These models are subsequently used in the recognition mode to recognize test handwriting samples. Since training-based recognizers incorporate the statistical variance found in the training handwriting samples to recognize the test handwriting samples, training-based recognizers generally have a higher rate of recognition accuracy than template-matching recognizers.
To properly train a training-based recognizing device, a large amount of representative data is required to provide adequate statistical data. For example, recording 950 written words from 100 writers results in approximately 23 million coordinate pairs. (See David E. Rumelhart, Theory to Practice: A Case Study--Recognizing Cursive Handwriting). Because of the large number of coordinate pairs involved, a data reduction procedure is required before the coordinates can be used for training and recognition. The front-end processor is used to characterize the coordinate data in a way that retains the essential information about the handwriting, while reducing the amount of data presented to the training and recognition system.
Commonly, front-end processors produce a reduced data set from the original coordinate data by performing segmentation and feature extraction to produce what are called feature vectors. Segmentation refers to the process of partitioning the coordinates representing the path of the pen into separate groups of contiguous coordinates called segments. In general, there are two types of segmentation processes: uniform segmentation and non-uniform segmentation.
Uniform segmentation typically defines individual segments as a fixed number of (x,y) coordinates along a stroke, or according to a fixed distance along a stroke. Segments formed during uniform segmentation are usually independent of local features of the data. Unlike uniform segmentation, non-uniform segmentation defines segments based on some particular feature of the data, such as defining the end point of a segment to be where the pen changed vertical direction during writing. This results in segments that are formed by grouping the coordinates in each up-stroke and each down-stroke of the pen into separate segments. Referring again to FIG. 2, points 7, 9, and 11 shown on the letter "a" are examples of segment endpoints in the word "act". Segment endpoint 7 is the initial starting point of the pen during the first upstroke formed in the letter "a"; segment endpoint 9 is the transition point between the first upstroke and a downstroke; and segment endpoint 11 is the transition point between the downstroke and a second upstroke. An example of one segment comprising the letter "a" is segment 13, which is described by a list of those coordinates in the letter "a" located between segment start point 9 and segment endpoint 11. Halfway point 17 is the point which is located halfway between the start point 9 and the endpoint 11 of segment 13. The non-uniform segmentation process usually results in a series of segments that have unequal length.
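The non-uniform segmentation described above can be illustrated with a minimal sketch, offered only as one possible reading of the rule that a segment ends where the pen changes vertical direction. The function name `segment_at_vertical_reversals` and the sample stroke are hypothetical and are not taken from the figures.

```python
def segment_at_vertical_reversals(points):
    """Split a stroke into segments at each change of vertical direction.

    points: list of (x, y) tuples in the order they were sampled.
    Returns a list of segments; adjacent segments share the turning point.
    """
    if len(points) < 2:
        return [list(points)] if points else []
    segments = []
    start = 0
    prev_sign = 0  # stays 0 until the first non-horizontal move is seen
    for i in range(1, len(points)):
        dy = points[i][1] - points[i - 1][1]
        sign = (dy > 0) - (dy < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            # points[i - 1] is the turning point: close the segment there
            segments.append(points[start:i])
            start = i - 1
        if sign != 0:
            prev_sign = sign
    segments.append(points[start:])
    return segments


# Hypothetical stroke: rises, falls briefly, then rises again.
stroke = [(0, 0), (1, 2), (2, 4), (3, 5), (4, 3), (5, 4), (6, 6)]
segments = segment_at_vertical_reversals(stroke)
lengths = [len(s) for s in segments]  # segments of unequal length
```

Because the boundaries follow the shape of the writing rather than a fixed point count, the three segments in this sketch contain four, two and three points respectively.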
After the segments are formed, the process of feature extraction is used to analyze the series of coordinates within each segment to derive "features" from each of the segments. Examples of features include the speed of the pen at the end of the segment, the net distance between the endpoints of the segment in the x-direction, and the net distance between the endpoints of the segment in the y-direction. Each feature extracted has a value; for example, the net distance in the x-direction for a particular segment may have a value of four pixels. The feature values taken from a segment are then grouped to form a feature vector for that segment.
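As an illustration only, the three example features named above might be computed for one segment as follows. The function name `extract_features`, the sampling interval `dt`, and the choice to estimate end speed from the last two sample points are assumptions made for this sketch.

```python
import math


def extract_features(segment, dt=1.0):
    """Return [net x-distance, net y-distance, end speed] for one segment.

    segment: list of at least two (x, y) points sampled every dt time units.
    """
    (x_first, y_first), (x_last, y_last) = segment[0], segment[-1]
    net_dx = x_last - x_first  # net distance between endpoints in x
    net_dy = y_last - y_first  # net distance between endpoints in y
    (xa, ya), (xb, yb) = segment[-2], segment[-1]
    end_speed = math.hypot(xb - xa, yb - ya) / dt  # speed over the final step
    return [net_dx, net_dy, end_speed]


# Hypothetical three-point segment.
feature_vector = extract_features([(0, 0), (1, 2), (4, 6)])
```

Each entry of the returned list is one feature value, so the list as a whole plays the role of the feature vector for that segment.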
After the front-end processor generates feature vectors from the coordinate data, a process called vector quantization is used to reduce the representation of segments from feature vectors to an even more compact form. Vector quantization is a mathematical process that is widely known and used in fields such as image processing, telecommunications and speech recognition. The input to vector quantization is a multidimensional vector, called an input vector. Vector quantization statistically analyzes the data contained in the input vector to: identify how the values of the input vectors cluster or group together; determine the mean locations of the input vectors in each cluster; and determine the distribution of the vectors about the mean. A symbol is assigned to each cluster to identify the clusters, and this information is stored in what is called a codebook.
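The clustering and symbol assignment just described can be sketched with Lloyd's (k-means) iterations, which is one common way to build a codebook; the text does not name a particular clustering method, so this choice is an assumption. The function names and sample vectors are hypothetical, and the sketch stores only the cluster means, not the distribution about each mean.

```python
def nearest(codebook, v):
    """Return the symbol (index) of the codebook entry closest to v."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)),
    )


def train_codebook(vectors, k, iterations=20):
    """Build a k-entry codebook of cluster means."""
    codebook = [list(v) for v in vectors[:k]]  # naive start: first k vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                codebook[idx] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def quantize(codebook, vectors):
    """Replace each input vector with the symbol of its nearest cluster."""
    return [nearest(codebook, v) for v in vectors]


# Hypothetical two-dimensional input vectors forming two obvious groups.
vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
           [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
codebook = train_codebook(vectors, k=2)
symbols = quantize(codebook, vectors)
```

After training, each input vector is represented by a single small integer, which is the compact form that is passed on in place of the full feature vector.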
An input vector may be formed from only one feature vector or from a combination of feature vectors. When an input vector is formed by combining a group of separate feature vectors, the input resembles a matrix, where each column of the matrix is an individual feature vector. Whether or not the input matrix is comprised of one or many vectors, when vector quantization is performed to generate a codebook for the one input matrix, it is termed single vector quantization. However, if the input matrix is partitioned into sets of vectors, and vector quantization is performed to generate separate codebooks for the sets of vectors, it is termed multiple vector quantization.
In the field of handwriting recognition, signal processing front-ends utilize uniform segmentation in combination with single vector quantization to model the handwriting data. In single vector quantization, all the feature vectors formed from the coordinates generated from a stroke of the pen are combined to form an input vector. Vector quantization is then performed on the input vector to calculate statistics for the data contained in the input vector, such as the mean and the standard deviations from the mean.
The problem with single vector quantization is that it results in a data space that has an extremely high data dimensionality. Every feature extracted from the sampled data creates what is called a dimension in the data space. For instance, if speed of motion in the x-direction is the only feature extracted from a segment, then the resulting data space is one dimensional. If the net x-distance is also extracted, then the data space is two dimensional. The data space refers to the region that encompasses the range of values that a particular feature value may have. For example, the values for speed-of-motion in the x-direction mentioned above may only range from zero to ten. This range of values between zero and ten forms the data space for the speed-of-motion feature.
In single vector quantization, the input vector usually has a high dimensionality due to the number of feature vectors involved. For example, assume that twenty feature vectors are formed from a particular handwriting sample and that each feature vector contains fourteen feature values that represent fourteen different physical aspects of the data (i.e. a fourteen-dimensional data space). Combining all twenty fourteen-dimensional feature vectors results in an input vector having a 280-dimensional data space (20*14=280).
During the training stage, the vector quantization process finds the statistical relationships formed by the distribution of the input vectors. A large number of input vectors is required to fill the data space before the vector quantization process can generate a reliable estimate of the distribution of data in the data space. The ratio of the available number of input vectors to the dimensionality of the input vectors is referred to as data resolution. Since in a typical handwriting sample, only a finite amount of data is available, increasing the dimensionality of the data space results in reduced data resolution and diminished recognition accuracy. This phenomenon is known as the "curse of dimensionality."
The following example is provided to illustrate vector quantization and the principle of the curse of dimensionality. Assume that only one feature, speed of motion of the pen in the x-direction, has been extracted from one hundred data points of a particular handwriting sample. Assume further that one hundred speed values have been calculated from the sampled data and that the speed values range between zero and ten (the units are not important). During quantization, the many different speed values are replaced with a smaller representative set of numbers by grouping the values together into bins or ranges of values, and replacing each original speed value with the mean speed value of that bin.
The one-dimensional data space for the speed values may be partitioned into ten bins: one bin for values between zero and one, one bin for values between one and two, and so on, as shown in FIG. 3A. The dots in the example represent where the values fell in each bin and are raised from the horizontal axis for purposes of illustration. Since one hundred data values are available, in the case of a fairly uniform distribution ten values may be expected to fall within each bin.
In this example, ten values per bin during quantization may be sufficient for estimating the mean value position for each bin. As shown in FIG. 3B, the mean value for each bin is calculated and each speed value that fell within a particular bin is assigned that mean value. If, for example, the mean value for bin four is 3.7, then every value in bin four would be assigned the value of 3.7.
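The one-dimensional quantization in this example can be sketched as follows. The function name `quantize_by_bins` and the sample speed values are hypothetical; bins are numbered from zero in the code, so the bin for values between three and four (bin four in the text) has index 3.

```python
import math
from collections import defaultdict


def quantize_by_bins(values, bin_width=1.0):
    """Replace each value with the mean of the bin it falls in.

    Returns (quantized values, {bin index: bin mean}).
    """
    bins = defaultdict(list)
    for v in values:
        bins[math.floor(v / bin_width)].append(v)
    means = {b: sum(vs) / len(vs) for b, vs in bins.items()}
    quantized = [means[math.floor(v / bin_width)] for v in values]
    return quantized, means


# Hypothetical speed values: three fall between 3 and 4, two between 0 and 1.
speeds = [3.5, 3.9, 3.7, 0.2, 0.4]
quantized, bin_means = quantize_by_bins(speeds)
```

Here the three values between three and four are all replaced by their common mean of approximately 3.7, mirroring the bin-four case above.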
If the data space is expanded to include two feature values then the data space increases to two dimensions. The second feature extracted could, for example, represent the acceleration of the pen in the x-direction for each data point. This two dimensional data space is depicted in FIG. 4. The number of bins in the data space now increases to 10^2, or one hundred. Since there is a fixed number of data points available, one hundred in this example, on average only one data point per bin is available. If the number of dimensions increases to three by adding a third feature value, the number of bins increases to 10^3, or one thousand, and on average only one-tenth of a data point is available per bin, in which case most of the bins are empty.
As the number of dimensions increases, the number of bins increases exponentially, and with only a fixed amount of data available most of the bins in the data space become empty. For bins with few or no data points, the statistics representing those bins are poor, i.e., the standard deviation of the mean value in the bin is large or indefinite. Obviously, having a high input dimensionality results in a poor estimate of the distribution or statistical property of the data that is input to the vector quantization process due to the low data resolution.
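The arithmetic behind this effect can be checked with a short sketch. The function names are hypothetical, and the uniform random placement of points in `empty_bin_fraction` is an assumption used only to show how many bins stay empty; real handwriting data would not be uniformly distributed.

```python
import random


def average_points_per_bin(num_points, bins_per_axis, dims):
    """Average occupancy when each axis is divided into bins_per_axis bins."""
    return num_points / bins_per_axis ** dims


def empty_bin_fraction(num_points, bins_per_axis, dims, seed=0):
    """Fraction of bins left empty when points are placed uniformly at random."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(num_points):
        occupied.add(tuple(rng.randrange(bins_per_axis) for _ in range(dims)))
    return 1 - len(occupied) / bins_per_axis ** dims


# One hundred data points and ten bins per axis, as in the example above.
occupancy = {d: average_points_per_bin(100, 10, d) for d in (1, 2, 3)}
empty_3d = empty_bin_fraction(100, 10, 3)
```

With one, two and three dimensions the average occupancy is ten, one and one-tenth of a data point per bin. In three dimensions at least ninety percent of the one thousand bins must be empty, since one hundred points can occupy at most one hundred bins.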
Currently, only very specialized methods are capable of performing single vector analysis on vectors having dimensionalities greater than thirty. Because of the costly computations involved, applications having high data dimensionality are not suited for real-time recognition applications. In addition, applications having high data dimensionality fail to effectively reduce the amount of data in the original data set which results in increased memory requirements.
Accordingly, an object of the present invention is to provide an improved handwriting signal processing front-end for use with handwriting recognition systems which increases data resolution, effectively reduces the dimensionality of the input data, and results in greater recognition accuracy.
A more specific object of the present invention is to provide an improved handwriting recognition system which incorporates non-uniform segmentation, feature extraction and multiple vector quantization.
Additional objects and advantages of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the claims.