The automated recognition of handwriting generally involves three main steps. In the first step, individual strokes that are written by a person are preprocessed and tentatively segmented into different groups which might correspond to respective characters. In the second step, the pattern formed by each segmented group of strokes is analyzed to determine whether it represents a recognizable character. Typically, this step is carried out in a neural network, which produces output values that are related to the probability that a given group of strokes represents a particular character. In the third step, the probabilities associated with the different characters undergo a word recognition search to identify one or more possible words which the written strokes represent.
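The third step can be illustrated with a minimal sketch. It assumes that the segmentation is already fixed, that each stroke group has a table of character probabilities, and that candidate words come from a small lexicon; the function name `rank_words` and the data layout are hypothetical and are not taken from any particular recognizer.

```python
from math import log


def rank_words(char_probs, lexicon):
    """Rank lexicon words against per-character probabilities.

    char_probs: one dict per segmented stroke group, mapping a
                character to its estimated probability (assumed layout).
    lexicon:    iterable of candidate words (assumed to be available).
    Returns a list of (word, log_score) pairs, best match first.
    """
    scored = []
    for word in lexicon:
        # This sketch only compares words whose length matches the
        # number of stroke groups.
        if len(word) != len(char_probs):
            continue
        score = 0.0
        possible = True
        for ch, probs in zip(word, char_probs):
            p = probs.get(ch, 0.0)
            if p <= 0.0:
                # The classifier gave this character no support.
                possible = False
                break
            # Sum log-probabilities instead of multiplying raw values.
            score += log(p)
        if possible:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


char_probs = [
    {"c": 0.6, "e": 0.4},
    {"a": 0.7, "o": 0.3},
    {"t": 0.9, "f": 0.1},
]
ranking = rank_words(char_probs, ["cat", "cot", "eat", "dog", "at"])
print([word for word, _ in ranking])
```

A practical search would also consider alternative segmentations of the strokes; this sketch omits that for brevity.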
As part of this recognition process, another segmentation procedure is carried out, to properly assign the various characters to respective words. Typically, this form of segmentation is determined by analyzing the geometric relationships between the various handwritten characters. Since the individual letters of a written word normally have a small spacing between them, whereas the spacing between different words is much greater, the relative spacings between characters can be used to detect word breaks. In the past, a thresholding technique was employed in the stroke preprocessing step to segment characters into different words. For example, the spaces between successive characters were measured, to derive an average, or nominal, value. This nominal value was used to establish a threshold, or maximum value. If the space between two characters exceeded the threshold value, a word break was inserted between the two characters.
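The thresholding technique described above can be sketched as follows. The text does not say how the threshold is derived from the nominal spacing, so the multiplier `factor`, the function name `split_words_by_threshold`, and the bounding-box representation of each character are all assumptions made for illustration.

```python
from statistics import mean


def split_words_by_threshold(chars, factor=1.5):
    """Group characters into words using a single spacing threshold.

    chars:  list of (label, left_x, right_x) tuples sorted left to
            right (an assumed representation of segmented characters).
    factor: assumed multiplier applied to the average gap to obtain
            the threshold.
    Returns a list of word strings.
    """
    if not chars:
        return []
    words = [chars[0][0]]
    if len(chars) < 2:
        return words
    # Horizontal space between each pair of successive characters.
    gaps = [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(chars, chars[1:])]
    # The nominal (average) spacing establishes the threshold.
    threshold = factor * mean(gaps)
    for (label, _, _), gap in zip(chars[1:], gaps):
        if gap > threshold:
            words.append(label)   # gap exceeds threshold: word break
        else:
            words[-1] += label    # same word continues
    return words


neat = [("t", 0, 8), ("o", 10, 18), ("b", 40, 48), ("e", 50, 58)]
print(split_words_by_threshold(neat))  # ['to', 'be']
```

Because one threshold is applied to every gap, the result depends entirely on how consistently the writer spaces the characters.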
While such an approach may be well-suited for ideal cases, in practice it does not always produce good results, because handwriting varies widely from one individual to another. When quickly jotting notes in a personal information manager, for example, an individual does not always pay close attention to the spacing between strokes. In such a situation, a fixed thresholding approach does not provide sufficient flexibility to accommodate the nuances of different handwriting styles.
Accordingly, it is an objective of the present invention to provide a technique for segmenting different groups of strokes into words, which inherently provides flexibility to accommodate irregular spacings of strokes in the recognition of words.