Symbols, such as handwriting, that are traced on electronic tablets are usually represented by sequences of x-y coordinate pairs. A fundamental unit of handwriting is the stroke, which can be considered a sequence of points, each represented by its x-y coordinates, generated between a pen-down and the following pen-up motion of the writer on the tablet. Characters are collections, or sets, of such strokes. For example, FIG. 1a illustrates numbered sequences of strokes that generate the printed characters A, B, C and D. Data processing systems operable to identify a set of such strokes as a particular alphanumeric character are known, and it is to these types of systems that the invention is directed.
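As an illustrative sketch only, the stroke and character representation described above might be modeled as follows. The pen-event tuple format (`"down"`, `"move"`, `"up"` with x and y) and the names `Stroke`, `Character` and `strokes_from_pen_events` are hypothetical and are not part of any particular tablet interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # one (x, y) tablet coordinate


@dataclass
class Stroke:
    """Points sampled between one pen-down and the following pen-up."""
    points: List[Point]


@dataclass
class Character:
    """A character is a collection (set) of strokes."""
    strokes: List[Stroke] = field(default_factory=list)


def strokes_from_pen_events(events) -> List[Stroke]:
    """Split a stream of (kind, x, y) events into strokes.

    kind is assumed to be "down", "move" or "up"; each pen-down starts a
    new stroke and the matching pen-up closes it.
    """
    strokes: List[Stroke] = []
    current = None
    for kind, x, y in events:
        if kind == "down":
            current = [(x, y)]
        elif kind == "move" and current is not None:
            current.append((x, y))
        elif kind == "up" and current is not None:
            current.append((x, y))
            strokes.append(Stroke(current))
            current = None
    return strokes
```

For a capital "T" written as a crossbar followed by a stem, the event stream yields two strokes, which together form one `Character`.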
In such a character identification system, an important processing step is known as segmentation. Segmentation pre-processes the stroke input data, before it is processed by the character recognizer, in order to partition the strokes into the groups that form a character, a gesture or possibly a word.
Because the strokes that make up a given character are usually interconnected, accurately determining stroke interconnectivity is essential to the successful operation of a segmentation pre-processor. That is, a successful segmenter must be able to accurately determine groups of connected strokes.
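Once some pairwise test of connectivity is available, groups of connected strokes can be formed transitively: if stroke A touches B and B touches C, all three belong to one group. The following is a minimal sketch assuming a caller-supplied predicate `connected(a, b)` (hypothetical); it uses a union-find structure, which is a common choice for this kind of grouping but not necessarily the one used by any particular segmenter.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

S = TypeVar("S")


def group_connected(strokes: Sequence[S], connected: Callable[[S, S], bool]) -> List[List[int]]:
    """Return groups of stroke indices that are transitively connected."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        # Follow parent links to the group's representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair the predicate reports as connected.
    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if connected(strokes[i], strokes[j]):
                parent[find(i)] = find(j)

    # Collect indices by representative; each list is in ascending order.
    groups: Dict[int, List[int]] = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The quality of the resulting groups depends entirely on the `connected` predicate, which is exactly the determination the conventional approaches discussed below attempt to make.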
When pre-processing on-line data, that is, data processed in real time as the writer writes it, some conventional segmenters employ temporal information to detect groups of connected strokes. This temporal method is based on the observation that writers tend to pause between words. However, such temporal information can be unreliable. Furthermore, temporal data is sometimes unavailable for tablet input data that has been stored or buffered before being input to the segmenter.
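A temporal segmenter of the kind described might be sketched as follows, assuming each stroke carries hypothetical pen-down and pen-up timestamps and that a single fixed `pause_threshold` separates words. The sketch also makes the stated limitations concrete: it cannot run at all without timestamps, and its output depends entirely on the chosen threshold.

```python
from typing import List, Sequence, Tuple


def segment_by_pauses(
    stroke_times: Sequence[Tuple[float, float]], pause_threshold: float
) -> List[List[int]]:
    """Group consecutive strokes, starting a new group after a long pause.

    stroke_times[i] is the assumed (pen_down_time, pen_up_time) of stroke i,
    listed in writing order.
    """
    if not stroke_times:
        return []
    groups = [[0]]
    for i in range(1, len(stroke_times)):
        # Pause = time from the previous pen-up to this pen-down.
        gap = stroke_times[i][0] - stroke_times[i - 1][1]
        if gap > pause_threshold:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups
```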
One conventional spatial approach to segmentation is to detect overlapping parts of strokes along the x-axis of the tablet; essentially, the vertical projections of the strokes onto the x-axis are checked for intersections. This method can yield reasonably accurate results when segmenting discrete or block writing in which characters are well spaced. A disadvantage of this method is that when characters are not well spaced, their projections onto the x-axis often overlap even though the characters themselves do not intersect. As illustrated in FIG. 1b, the horizontal stroke of the capital "T" projects onto the portion of the x-axis occupied by the lowercase "o". This condition can result in a segmentation error.
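The x-axis projection test can be sketched as a simple interval-overlap check. The coordinates below for the "T" and the "o" are made-up values chosen to mimic the situation of FIG. 1b; they show the test reporting an overlap for the crossbar even though it never touches the "o".

```python
from typing import List, Tuple

Point = Tuple[float, float]


def x_extent(stroke: List[Point]) -> Tuple[float, float]:
    """Projection of a stroke onto the x-axis, as a (min_x, max_x) interval."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def x_projections_overlap(a: List[Point], b: List[Point]) -> bool:
    """True if the x-axis projections of strokes a and b intersect."""
    a_lo, a_hi = x_extent(a)
    b_lo, b_hi = x_extent(b)
    return a_lo <= b_hi and b_lo <= a_hi


# Illustrative coordinates: the crossbar lies at y = 10, the "o" below y = 3.5.
t_crossbar = [(0, 10), (5, 10), (10, 10)]
t_stem = [(5, 10), (5, 0)]
o_loop = [(9, 3.5), (10.5, 2), (9, 0.5), (7.5, 2), (9, 3.5)]

print(x_projections_overlap(t_crossbar, o_loop))  # True: a false "connection"
print(x_projections_overlap(t_stem, o_loop))      # False
```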
Another spatial approach is two-dimensional: it computes the distance between every pair of points (p1, p2), where p1 belongs to stroke 1 and p2 to stroke 2. By finding the smallest of these distances, the algorithm determines whether the two strokes intersect one another or come so close to one another as to be considered connected. One considerable disadvantage of this approach is that it is computationally intensive, especially in applications where the average number N of points per stroke is large: for each pair of strokes, finding the shortest distance between two points requires O(N^2) distance computations. This can result in poor computational performance and may preclude real-time applications. Another disadvantage of this approach is that it can be "fooled" by a fast writer, for whom the number of points per stroke is reduced and the distance between adjacent points is increased. If two rapidly drawn strokes cross, computing the distance between all pairs of points may fail to locate the intersection, because the crossing point can fall between widely spaced sample points of both strokes.
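A brute-force version of the two-dimensional approach is sketched below, with a hypothetical distance `threshold` for deciding connectivity. The example strokes are illustrative: when sampled sparsely, as a fast writer might produce them, the two strokes cross at (5, 5), yet no pair of sampled points is closer than 10 units, so the test misses the crossing; when each stroke also includes its midpoint, the same strokes are found to touch.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]


def min_point_distance(a: List[Point], b: List[Point]) -> float:
    """Smallest distance between any point of a and any point of b.

    Evaluates len(a) * len(b) distances, i.e. O(N^2) for N points per stroke.
    """
    return min(math.dist(p, q) for p, q in product(a, b))


def strokes_connected(a: List[Point], b: List[Point], threshold: float) -> bool:
    """Treat two strokes as connected if some pair of points is within threshold."""
    return min_point_distance(a, b) <= threshold


# Two diagonal strokes crossing at (5, 5), sampled sparsely (fast writer).
sparse_a = [(0, 0), (10, 10)]
sparse_b = [(0, 10), (10, 0)]
# The same strokes, sampled densely enough to include the crossing point.
dense_a = [(0, 0), (5, 5), (10, 10)]
dense_b = [(0, 10), (5, 5), (10, 0)]

print(min_point_distance(sparse_a, sparse_b))            # 10.0
print(strokes_connected(sparse_a, sparse_b, threshold=1.0))  # False: crossing missed
print(strokes_connected(dense_a, dense_b, threshold=1.0))    # True
```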