Automatic handwriting recognition refers to the process of identifying handwritten words or characters in response to an input signal (e.g., character input) received from an electronic surface. Modern recognition systems sometimes cast this process as an instance of information transmission over a noisy channel, which can be described by a statistical framework such as the example of FIG. 1. In the example of FIG. 1, W refers to the sequence of words or characters intended to be written (e.g., by a user who input the signal on the electronic surface), S refers to the chirographic realization of that sequence via one or more strokes, and Ŵ refers to the hypothesized written sequence output to the user. “Handwriting production” and “handwriting recognition” can be understood as variants of the usual information-theoretic terminology of “encoding” and “decoding,” respectively. Blocks 102 and 104 are each assigned a set of parameters of the form Pr(·|·), symbolizing the instantiation of a stochastic process characterized by a statistical model. These parameters are typically trained from a corpus of exemplars using machine learning techniques. Besides handwriting recognition, the framework of FIG. 1 can also apply to signature verification, gesture recognition, and biometric recognition.
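For illustration, the decoding step of such a noisy-channel framework is commonly written as a maximum a posteriori rule. The expression below is a sketch of that textbook formulation using the symbols W, S, and Ŵ defined above; the factorization into Pr(S|W) and a prior Pr(W) is an assumption drawn from the general noisy-channel literature and is not stated for FIG. 1.

```latex
\[
\hat{W} \;=\; \arg\max_{W} \Pr(W \mid S)
        \;=\; \arg\max_{W} \frac{\Pr(S \mid W)\,\Pr(W)}{\Pr(S)}
        \;=\; \arg\max_{W} \Pr(S \mid W)\,\Pr(W)
\]
```

The last equality holds because Pr(S) does not depend on W and therefore does not affect which candidate sequence maximizes the expression.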
An important part of this process is the choice of representation adopted to convey the chirographic evidence S, which directly reflects the type of information extracted from the input signal. Two prominent categories of information are temporal information, which preserves the sequential order in which sample points are captured by the electronic surface, and spatial information, which represents the overall shape of the underlying word or character regardless of how it was produced. Typically, handwriting recognition systems process temporal and spatial information separately, each with its own statistical model, and then combine the probability scores output by the two models. However, because each model is trained and evaluated independently of the other, combining the separately determined scores does not allow for joint optimization over the two types of information.
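As a rough illustration of the conventional approach described above, the following sketch combines per-candidate probability scores from two separately evaluated models. The function name `fuse_scores`, the log-linear weighting, the `weight` parameter, and the example scores are all hypothetical and chosen only for illustration; the source does not specify how the scores are combined.

```python
import math


def fuse_scores(temporal, spatial, weight=0.5):
    """Combine two per-candidate probability tables by log-linear fusion.

    Hypothetical helper: `temporal` and `spatial` map each candidate
    character to a strictly positive probability produced by a separate
    model. `weight` is an assumed interpolation factor in [0, 1].
    """
    # Only candidates scored by both models can be fused.
    candidates = temporal.keys() & spatial.keys()
    fused_log = {
        c: weight * math.log(temporal[c]) + (1.0 - weight) * math.log(spatial[c])
        for c in candidates
    }
    # Renormalize so the fused scores sum to one.
    log_norm = math.log(sum(math.exp(v) for v in fused_log.values()))
    return {c: math.exp(v - log_norm) for c, v in fused_log.items()}


# Made-up scores for three candidate characters.
temporal_scores = {"a": 0.6, "o": 0.3, "u": 0.1}
spatial_scores = {"a": 0.2, "o": 0.7, "u": 0.1}

fused = fuse_scores(temporal_scores, spatial_scores)
best = max(fused, key=fused.get)
print(best, fused)
```

In this sketch the two models never see each other's evidence; only their final scores meet, which is the limitation the paragraph above points out.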