1. Field of the Invention
The present invention relates to a system, process and software arrangement for recognizing handwritten characters, and more particularly to recognizing the handwritten characters on-line using, e.g., character segmentation techniques.
2. Background Information
Currently, technologies such as microcomputers, word processors, fax machines and electronic mail utilize electronic handwriting recording and recognition techniques to enable useful and versatile communication between such devices. In particular, conventional computer-based handwriting analysis methods include the recognition and interpretation of characters, as well as the verification of the handwritten data.
The known electronic handwriting recognition procedures generally transcribe a message, represented in a spatial form of graphical marks, into a computer text, e.g., into a sequence of 8-bit ASCII characters. The handwriting interpretation techniques generally determine the most likely meaning of a particular handwritten text, e.g., a mailing address written on an envelope. The handwriting verification techniques are used to determine whether the handwritten text belongs to a particular individual, and can be used in, e.g., forensic investigations.
The handwriting recognition techniques can be considered as falling into two separate categories: on-line recognition and off-line recognition.
The on-line handwriting recognition techniques are generally used when a transducer/input device is connected to a computer which is available to the user. One such arrangement is shown in FIG. 1, which illustrates an on-line handwriting recognition system 5 that can be used with conventional techniques as well as with techniques according to the present invention. The transducer/input device converts the user's writing motion into a sequence of signals, and sends this signal information to a computer 50. The computer 50 generally includes a handwriting recognition system. An exemplary transducer can be a tablet digitizer 10. This tablet digitizer 10 generally includes a plastic or electronic pen 15 and a pressure or electrostatic-sensitive writing surface 20 on which the user provides the handwritten information using the pen 15. By sampling or tracking the movement of a tip of the pen 15 on the writing surface 20, the tablet digitizer 10 is able to detect certain information, e.g., the x and y coordinates of a sampled point on the writing surface 20, and information indicative of whether the pen 15 touches the writing surface 20 (“pen-down state”) or has been removed therefrom (“pen-up state”), etc. The information is transmitted to the connected computer 50 for recognition processing by the handwriting recognition system. A “stroke” in the data of the on-line recognition system can be defined as a sequence of sampled points from the pen-down state to the pen-up state of the pen 15. Thus, the completed writing of a word would likely consist of a sequence of one or more strokes. The tablet digitizer 10 thereby captures the temporal (dynamic) data of the word as it samples the points on the contours that the user is forming.
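The stroke definition above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the digitizer delivers each sampled point as an (x, y, pen_down) record; the names `Sample` and `split_into_strokes` are hypothetical and do not come from any particular digitizer interface.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One sampled point (hypothetical record layout)."""
    x: float
    y: float
    pen_down: bool  # True while the pen touches the writing surface


def split_into_strokes(samples: List[Sample]) -> List[List[Tuple[float, float]]]:
    """Group consecutive pen-down samples into strokes.

    A stroke runs from a pen-down state to the next pen-up state.
    Pen-up samples only act as separators and are not stored.
    """
    strokes: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for s in samples:
        if s.pen_down:
            current.append((s.x, s.y))
        elif current:
            # Pen lifted: close the stroke collected so far.
            strokes.append(current)
            current = []
    if current:
        # Input ended while the pen was still down.
        strokes.append(current)
    return strokes
```

Because the samples arrive in the order in which they were written, the resulting list of strokes preserves the temporal ordering of the handwriting.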
The off-line handwriting recognition techniques are generally related to the field of Optical Character Recognition (“OCR”). In contrast to the on-line handwriting recognition techniques, these off-line techniques are not interactive. In an exemplary OCR system, machine-printed material is scanned into a computer file in a two-dimensional image representation using a scanner. Then, the off-line handwriting recognition technique of this conventional OCR system attempts to recognize the scanned handwritten data.
One of the benefits of the on-line handwriting recognition techniques which sets them apart from the off-line handwriting OCR or other image recognition techniques is their ability to utilize the temporal and dynamic input sequence information which is provided directly by the user in real time. This dynamic information obtained by the on-line handwriting recognition techniques provides a clear separation of the foreground from the background, and thus can bypass the pre-processing procedures that are required to be performed by the off-line handwriting recognition techniques. Also, the obtained on-line dynamic information is generally more compact than the off-line information because of the different dimensionalities in representation. The difference in the data size also leads to a difference in the processing time.
Another advantage of the on-line handwriting recognition techniques is their use of the sequence information of the received data, which makes the character boundary segmentation easier to perform. After the preprocessing stage, most handwriting recognition systems and methods attempt to separate their received data into intervals/segments (which correspond to hypothetical characters), and apply an evaluation process to such intervals/segments. The recognition performance of such systems and processes is substantially dependent on the quality and robustness of the character segmentation. Due to the cues available from the temporal ordering of its input data, the on-line handwriting recognizer may generate the segmentations in a reliable and efficient manner. For example, when two neighboring characters overlap in the regions they respectively occupy, it is significantly more difficult for an off-line recognition system and method to segment such characters correctly. This is because any simple geometric separation would likely contain a portion of at least one of the characters. Using the on-line handwriting recognition system, it would be easier to handle the above-described scenario.
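As a hedged illustration of how temporal ordering constrains segmentation, the following sketch enumerates candidate segmentations of a time-ordered stroke sequence into contiguous intervals, each interval standing for one hypothetical character. It assumes, for simplicity, that character boundaries fall only between strokes and that a character spans at most `max_per_char` strokes; both the function name and that parameter are illustrative assumptions, not part of any described system.

```python
from typing import Iterator, List, Tuple

Interval = Tuple[int, int]  # half-open range [start, end) of stroke indices


def candidate_segmentations(n_strokes: int, max_per_char: int) -> Iterator[List[Interval]]:
    """Yield every split of strokes 0..n_strokes-1 into contiguous intervals.

    Each interval covers between 1 and max_per_char consecutive strokes.
    Only contiguous groupings are produced, which is what the temporal
    ordering of on-line data makes possible.
    """
    def extend(start: int) -> Iterator[List[Interval]]:
        if start == n_strokes:
            yield []  # all strokes consumed: one complete segmentation
            return
        last_end = min(start + max_per_char, n_strokes)
        for end in range(start + 1, last_end + 1):
            for rest in extend(end):
                yield [(start, end)] + rest

    return extend(0)
```

For three strokes and at most two strokes per character, the sketch yields three candidates: each stroke alone, the last two strokes grouped, or the first two strokes grouped. An evaluation process would then score each interval as a hypothetical character.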
As known to those having ordinary skill in the art, the handwriting recognition systems (whether on-line or off-line) are designed to support three different styles, i.e., a printed style, a cursive style and a mixed style. Recognizing the printed style of handwriting is, most likely, simpler than recognizing other handwriting styles. This is because each character of such style has clearer boundaries with its neighboring characters. For example, the characters in the printed style are usually separated by the “pen-up” signal in the on-line handwriting recognition system. In recognizing the cursive handwritten script, however, most of the component characters are connected to their neighbors by a sub-stroke (i.e., a “ligature”) which is not a part of any character or letter, but only a connecting pattern between two characters/letters. In this situation, it is more difficult to hypothesize about the character segmentation since there is less information regarding the likely segmentation boundaries of each character. Recognition of handwriting having a printed style can be regarded as a subset of the cursive mode recognition, and support for the mixed mode can be obtained as a by-product of supporting both the printed and cursive modes. Therefore, one having ordinary skill in the art would understand that recognizing characters in the cursive mode is the hardest of these tasks.
Conventional handwriting recognition systems and methods can be writer-independent or writer-dependent. For example, writer-independent systems can handle the idiosyncrasies of multiple users' writing styles, while writer-dependent systems are trained to recognize a single user's writing style. It is possible to have the same character (or a class of characters) written in different ways, e.g., so that the written forms fall into different subclasses or allographs. Therefore, each character class usually consists of one or more subclasses. Correctly identifying a good set of allographs is a challenging task which requires a recording of a huge number of samples, which usually cannot be done by the conventional systems and methods. Also, a larger number of subclasses/allographs would require additional processing time for such conventional systems, which would not be preferable, especially when using an on-line character recognition system or method.
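The relationship between character classes, allographs and processing time can be sketched as follows. This is only an illustrative assumption of how such a structure might look: each character class maps to a list of allograph templates, each template is a fixed-length feature vector, and a nearest-template search with Euclidean distance stands in for whatever evaluation a conventional system actually uses. The names `Allographs` and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: character class -> list of allograph feature vectors.
Allographs = Dict[str, List[Sequence[float]]]


def classify(features: Sequence[float], allographs: Allographs) -> Tuple[str, int, float]:
    """Return (character class, allograph index, distance) of the closest template.

    Every allograph of every class is compared against the input, so the
    work grows linearly with the total number of allographs.
    """
    best = None
    for label, templates in allographs.items():
        for index, template in enumerate(templates):
            distance = math.dist(features, template)  # Euclidean distance
            if best is None or distance < best[2]:
                best = (label, index, distance)
    if best is None:
        raise ValueError("no allograph templates were provided")
    return best
```

In this sketch, adding allographs to a class adds one more comparison per input, which mirrors the additional processing time noted above.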