An object recognition system, such as a system to recognize handwriting, typically requires separate development of a number of components. These components include a segmentation system to isolate the object, or character, from its background; a size normalization system to remove irrelevant variations; a feature extraction system to identify the features that discriminate the object from others; a classification system; and a system to improve classification accuracy by incorporating contextual constraints. One technique that can integrate a number of these steps is the backpropagation neural network, as described in D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation" (in D. E. Rumelhart & J. L. McClelland, Parallel Distributed Processing, Vol. 1, 1986).
Application of backpropagation techniques to handwriting recognition initially focused only on the classifier component. In these systems, the input to the network is a feature vector and its target output vector, and the network does the job of partitioning the feature space to enable classification. There have been a number of demonstrations that backpropagation can also take over the job of feature selection and extraction. In these systems, the input to the network is a pre-segmented, size-normalized character array and its target output vector. The network jointly optimizes feature selection and classification, and thereby frees the system developer from the time-consuming, iterative effort of experimenting with different feature sets. There is also some indication that generalization accuracy is relatively robust to variations in the size of the input image and in the network architecture. This work suggests that backpropagation learning can handle a wide range of variability in the input, and therefore that the technique can be applied to integrate more of the heretofore separate components of recognition.
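To make the idea of joint feature extraction and classification concrete, the following is a minimal sketch, not the method of any cited system: a network with one hidden layer of sigmoid units is trained by plain backpropagation on squared error, taking a raw (already segmented and size-normalized) pixel array as input and a one-hot target vector as output. The hidden-layer weights play the role of learned feature detectors, and the output layer does the classification. The 3x3 "characters", the layer sizes, the learning rate, and the class `TinyBackpropNet` are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBackpropNet:
    """Hypothetical one-hidden-layer network trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights into one unit; the last entry is its bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # input plus bias term
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = h + [1.0]        # hidden activations ("features") plus bias term
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return xb, hb, y

    def train_step(self, x, target, lr):
        xb, hb, y = self.forward(x)
        # Output deltas for E = 0.5 * sum((t - y)^2) with sigmoid outputs.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas: output error propagated back through the output
        # weights (computed before any weight is changed).
        d_hid = [hb[j] * (1.0 - hb[j]) *
                 sum(d_out[k] * self.w_out[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w_hid))]
        # Gradient-descent updates for both layers.
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hid):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def classify(self, x):
        y = self.forward(x)[2]
        return max(range(len(y)), key=lambda k: y[k])


# Hypothetical 3x3 character arrays, flattened row by row.
VERTICAL = [0, 1, 0,
            0, 1, 0,
            0, 1, 0]
HORIZONTAL = [0, 0, 0,
              1, 1, 1,
              0, 0, 0]
DATA = [(VERTICAL, [1.0, 0.0]), (HORIZONTAL, [0.0, 1.0])]

net = TinyBackpropNet(n_in=9, n_hidden=4, n_out=2, seed=0)
for epoch in range(2000):
    for x, t in DATA:
        net.train_step(x, t, lr=0.5)

print(net.classify(VERTICAL), net.classify(HORIZONTAL))
```

No hand-designed features appear anywhere in the sketch: the only inputs are pixel values, and whatever intermediate features the hidden units compute are shaped by the same error signal that trains the classifier.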
An input representation that can be used for this integration is the scanning window technique, which has been applied in the domains of speech synthesis and speech recognition. In the speech domain, an input window scans a time-varying signal, which allows the immediately adjacent context of a speech object to affect how the system classifies it. One such system is described in T. J. Sejnowski and C. R. Rosenberg, "NETtalk: a parallel network that learns to read aloud", The Johns Hopkins University Electrical Engineering and Computer Science Technical Report, pages 663-672 (1986). This speech synthesis system is based on an automated learning procedure for a parallel network of deterministic processing units. After training on a corpus of informal continuous speech, it is capable of capturing most of the significant regularities in English pronunciation as well as absorbing many of the irregularities. Another similar technique, used for speech recognition, is described in A. Waibel, H. Sawai, and K. Shikano, "Modularity and Scaling in Large Phonemic Neural Networks", Advanced Telecommunications Research Institute Technical Report II-0034 (1988).
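The scanning window can be sketched as follows, under stated assumptions: a window of odd width is centered on each item of a sequence in turn, the item at the center is the one to be classified, and its neighbors on either side supply the context. Each window is then turned into a network input by concatenating one one-hot block per window position, a local coding of the kind NETtalk is usually described as using. The function names, the default width of 7, the padding symbol `_`, and the alphabet are hypothetical choices for illustration; for handwriting, the same scan could in principle run over successive image columns instead of characters.

```python
from typing import List, Sequence, Tuple


def scanning_windows(seq: Sequence[str], width: int = 7,
                     pad: str = "_") -> List[Tuple[str, str]]:
    """Return (center item, window) pairs for every position in seq.

    Positions that fall past either end of the sequence are filled with a
    hypothetical padding symbol so that every window has the same width.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    padded = [pad] * half + list(seq) + [pad] * half
    # padded[i + half] == seq[i], so each slice is centered on seq[i].
    return [(seq[i], "".join(padded[i:i + width])) for i in range(len(seq))]


def one_hot_window(window: str, alphabet: str) -> List[float]:
    """Concatenate one one-hot block per window position (the network input)."""
    vec: List[float] = []
    for ch in window:
        block = [0.0] * len(alphabet)
        block[alphabet.index(ch)] = 1.0
        vec.extend(block)
    return vec


if __name__ == "__main__":
    for center, window in scanning_windows("cat", width=3):
        print(center, window, one_hot_window(window, "_abct"))
```

For `"cat"` with a width of 3, the windows are `_ca`, `cat`, and `at_`, each centered on the character being classified. A wider window trades a larger input layer for more context per decision.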