Automated text recognition involves using digital computers to recognize letters and digits from a predefined "alphabet" of templates. A typical automated text recognition system measures the similarity between a sample to be recognized and each of the predefined templates using a symmetric mathematical measure. Despite continuous research efforts for almost four decades, the performance of known automated text recognition systems generally does not compare to the capabilities of a human in recognizing text such as cursive script, which typically varies in many respects, including size and style.
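By way of illustration only, template matching with a symmetric similarity measure can be sketched as follows. The 3x3 bitmaps, the names `TEMPLATES`, `hamming_distance`, and `recognize`, and the choice of Hamming distance as the measure are all assumptions made for this sketch; the text above does not specify a particular measure.

```python
# Hypothetical 3x3 binary templates forming a tiny "alphabet".
# Each template is a tuple of rows; '1' is a black pixel, '0' is white.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def hamming_distance(a, b):
    """Count the pixels at which two equal-sized bitmaps differ.

    The measure is symmetric: hamming_distance(a, b) == hamming_distance(b, a).
    """
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(sample, templates=TEMPLATES):
    """Return the label of the template with the smallest distance to the sample."""
    return min(templates, key=lambda label: hamming_distance(sample, templates[label]))


# An "L" with one missing pixel in the bottom row.
noisy_l = ("100", "100", "110")
print(recognize(noisy_l))
```

A measure of this kind treats every pixel independently, which is one reason it copes poorly with samples that vary in size or style.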
Traditionally, two approaches have been used for automated text and/or pattern recognition: a statistical approach and a linguistic approach. The general idea of both approaches is to select a set of measurements called features (e.g., the density of black pixels, the number of loops in the character to be recognized, the type and position of a stroke) and to implement a set of decision rules which constitute a classifier. Because of variability across samples of the same pattern class, the features generally are unknown and are thus modeled as random variables. These random variables define a feature space. In the statistical approach, the feature space is partitioned according to the set of decision rules into regions corresponding to different patterns (e.g., one region for A, another region for B, etc.). Given an unknown sample pattern to be recognized, the procedure in the statistical approach is to: extract a vector of features; determine the region to which it belongs; and assign to the pattern the label for that region. In the linguistic approach, a pattern class is considered to be a set of features generated by, for example, a non-deterministic finite state machine, a Markov process, or a push-down automaton. Given an unknown sample pattern to be recognized, the procedure in the linguistic approach is to: extract the set of features; determine the machine which generated it; and label the unknown pattern accordingly.
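The three steps of the statistical procedure (extract a feature vector, determine its region, assign that region's label) can be sketched as below. This is a minimal sketch under stated assumptions: the two features, the 4x4 training bitmaps, and the function names are hypothetical, and the feature space is partitioned by a nearest mean rule, which is only one of many possible sets of decision rules.

```python
import math


def extract_features(bitmap):
    """Map a binary bitmap (tuple of '0'/'1' strings) to a feature vector.

    Illustrative features only: overall black-pixel density, and the share
    of black pixels that lie in the top half of the image.
    """
    rows = [[int(c) for c in row] for row in bitmap]
    total = sum(sum(row) for row in rows)
    area = len(rows) * len(rows[0])
    top = sum(sum(row) for row in rows[: len(rows) // 2])
    return (total / area, top / total if total else 0.0)


def class_means(training):
    """Average the feature vectors of the samples of each labeled class."""
    means = {}
    for label, bitmaps in training.items():
        vectors = [extract_features(b) for b in bitmaps]
        means[label] = tuple(sum(component) / len(vectors) for component in zip(*vectors))
    return means


def classify(bitmap, means):
    """Assign the label whose class mean is nearest in feature space.

    The nearest mean rule implicitly partitions the feature space into
    one region per class.
    """
    x = extract_features(bitmap)
    return min(means, key=lambda label: math.dist(x, means[label]))


# Hypothetical training samples: two per class.
TRAINING = {
    "T": [("1111", "0110", "0110", "0110"), ("1111", "0100", "0100", "0100")],
    "L": [("1000", "1000", "1000", "1111"), ("1100", "1100", "1100", "1111")],
}

means = class_means(TRAINING)
unknown = ("1110", "0100", "0100", "0100")
print(classify(unknown, means))
```

Here the unknown sample carries most of its ink in the top half, so its feature vector falls in the region associated with "T".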
A main difference between the statistical and the linguistic approaches lies in the structure of the classifier which, in a sense, is determined by the definition of a character. In the statistical approach, classifiers include nearest mean classifiers, Fisher classifiers, neural network classifiers, and nearest neighbor classifiers. In the linguistic approach, classifiers include machine matching classifiers. In general, the statistical approach is older than the linguistic approach and is typically used for recognizing relatively simple patterns such as characters. The linguistic approach generally is preferred for recognizing more complex patterns such as three-dimensional images.
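The linguistic procedure (extract a set of features, determine the machine that generated it, label accordingly) can be illustrated with a small machine matching sketch. Everything specific here is an assumption: the stroke symbols "down" and "right", the two pattern classes, and the use of deterministic rather than non-deterministic finite state machines, chosen only to keep the example short.

```python
# One hypothetical finite state machine per pattern class.
# "delta" maps (state, stroke symbol) to the next state.
MACHINES = {
    # "L": one or more downward strokes followed by one or more rightward strokes.
    "L": {
        "start": 0,
        "accept": {2},
        "delta": {(0, "down"): 1, (1, "down"): 1, (1, "right"): 2, (2, "right"): 2},
    },
    # "7": one or more rightward strokes followed by one or more downward strokes.
    "7": {
        "start": 0,
        "accept": {2},
        "delta": {(0, "right"): 1, (1, "right"): 1, (1, "down"): 2, (2, "down"): 2},
    },
}


def accepts(machine, strokes):
    """Return True if the machine could have generated the stroke sequence."""
    state = machine["start"]
    for stroke in strokes:
        key = (state, stroke)
        if key not in machine["delta"]:
            return False  # no transition defined: sequence is rejected
        state = machine["delta"][key]
    return state in machine["accept"]


def label(strokes, machines=MACHINES):
    """Return the class whose machine accepts the strokes, or None if
    no machine (or more than one machine) accepts them."""
    matches = [name for name, machine in machines.items() if accepts(machine, strokes)]
    return matches[0] if len(matches) == 1 else None


print(label(["down", "down", "right"]))
print(label(["right", "down"]))
```

In contrast to the statistical sketch, the classifier here is determined by the structural definition of each character rather than by regions of a feature space.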
With both the statistical and linguistic approaches, it is necessary to select a useful set of features (which is sometimes termed the problem of representation or the representation problem). This requirement can make either approach difficult to implement. For instance, in a handwritten text recognition problem, selecting the pixel values of the text image as features is not a useful strategy. While the set of pixel values is a complete representation in the sense that any other representation can be derived from it, it is not a convenient representation with which to work. Variations in the samples of a handwritten pattern to be recognized typically cause wide-scale correlation among the features, which tends to complicate the design and analysis of the classifier.
The standard paradigm of text and/or pattern recognition, which involves feature extraction and subsequent classification (as employed in both the statistical and linguistic approaches), typically is inadequate to achieve desirable levels of recognition speed and recognition accuracy. What is needed is a new and better approach to automated text recognition, whereby both machine-printed and handwritten (especially cursive script) alphanumeric characters can be recognized relatively simply and quickly and with a relatively high degree of accuracy.