Character recognition, the classification of well-formed and cleanly segmented characters, is well known in the prior art. One problem in the prior art of character recognition is automated segmentation, which is understood to include the separation of text images into individual letters, one letter per image. Without this essential component, character-based classifiers cannot function.
In prior art character recognition systems, the segmentation of the characters is one of the major sources of error. Accuracy for segmentation in these prior art systems is well over ninety-nine percent for machine print, approximately ninety percent for handprint, and zero for connected handwritten script. Thus, in order to improve the performance of character recognition systems, it is useful to increase the segmentation accuracy for handprint and to make it possible to perform the type of segmentation required for connected handwritten script. Additionally, in the field of document imaging systems, it would be very useful to have a system which makes use of off-line images rather than on-line strokes for improving the performance of segmentation during the recognition of handwritten script.
A common prior art method of handprint segmentation uses histograms to separate the space between letters with vertical lines. This method is about sixty percent effective on handprint. This method can be made ninety percent effective by two modifications. Statistical rules such as rules involving the expected width of a character or the expected aspect ratio of a character may be added to detect bad segmentations. Additionally, bad segmentation may be corrected by using the best possible straight line, not necessarily vertical, to separate the characters.
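The histogram method described above can be illustrated with a minimal sketch. It assumes a binary image stored as rows of 0 and 1 values, treats any run of columns containing ink as one character, and flags runs that are much wider than an expected character width. The function names, the sample image, and the tolerance factor are hypothetical and chosen only for illustration; the straight-line correction step is not shown.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (first column, one past the last column)


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def segment_by_histogram(image: List[List[int]]) -> List[Span]:
    """Split the image at empty columns, i.e. with vertical cut lines."""
    hist = column_histogram(image)
    spans: List[Span] = []
    start = None
    for x, ink in enumerate(hist):
        if ink > 0 and start is None:
            start = x                      # a character run begins
        elif ink == 0 and start is not None:
            spans.append((start, x))       # an empty column ends the run
            start = None
    if start is not None:
        spans.append((start, len(hist)))   # run touches the right edge
    return spans


def suspect_spans(spans: List[Span], expected_width: float,
                  tolerance: float = 1.5) -> List[Span]:
    """Statistical rule (assumed): flag spans much wider than expected."""
    return [(a, b) for a, b in spans if (b - a) > expected_width * tolerance]


# Hypothetical 3-row image: two narrow characters and one wide blob.
rows = [
    "0110001100011110",
    "0110001100011110",
    "0110001100011110",
]
image = [[int(ch) for ch in row] for row in rows]
spans = segment_by_histogram(image)
suspects = suspect_spans(spans, expected_width=2)
```

A span flagged by the width rule is a candidate for touching characters, which the vertical-cut histogram alone cannot separate.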
In conventional character recognition processes both the positive image of the character and the negative image of the space around the character are available as guides in segmenting and identifying characters. In a similar manner an idealized character segmenter can find the best curve through the negative space between characters to separate characters. If a large number of these curves is previously learned, then characters which are touching, such as handwritten script, could be separated because the resulting negative space will be recognized as being more like a space and less like a character.
Another major problem in the prior art character recognition processes is that a large number of anti-objects must be learned. For example, if each object to be recognized can be followed by any of the remaining objects, n^2 anti-objects must be learned for n classes. This O(n^2) problem made these processes very cumbersome.
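The quadratic growth noted above can be made concrete with a short sketch. It assumes that one anti-object is needed for every ordered pair of a left class and a right class; the function name and the class sets are hypothetical. Excluding pairs of identical classes would give n(n-1) instead, which is still quadratic.

```python
from itertools import product


def anti_object_count(classes: str) -> int:
    """Count ordered (left, right) class pairs, one anti-object per pair."""
    return sum(1 for _ in product(classes, repeat=2))


digits = "0123456789"
letters = "abcdefghijklmnopqrstuvwxyz"

digit_pairs = anti_object_count(digits)             # 10 classes
letter_pairs = anti_object_count(letters)           # 26 classes
mixed_pairs = anti_object_count(digits + letters)   # 36 classes
```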
A model recognition system has been implemented on a massively parallel computer at The National Institute of Standards and Technology. The system consists of eight functional components. The loading of the image into the system and storing the recognition results from the system are I/O components. In between are components responsible for image processing and recognition. The first image processing component is responsible for image correction for scale and rotation, data field isolation, and character data location within each field. The second performs character segmentation. The third image processing component does spatial normalization.
Three recognition components are responsible for feature extraction and character reconstruction, neural network-based character recognition, and low-confidence classification rejection. Studies have shown that traditional image processing techniques used for character segmentation, even when implemented on a parallel computer, require fifty-five percent of the system's processing time and run at a rate of less than eight characters per second. A form containing one hundred thirty handprinted characters requires seventeen seconds of processing just for character segmentation using histograms. This is much longer than the one second per page throughput required by many automated document processing applications. In order to improve segmentation, alternative methods are being explored.
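The figures quoted above can be checked with a few lines of arithmetic. The sketch uses only the numbers stated in the text; the variable names are hypothetical, and treating one form as one page is an assumption.

```python
# Figures taken from the text above.
chars_per_form = 130          # handprinted characters on one form
segmentation_seconds = 17.0   # histogram segmentation time for that form
target_seconds = 1.0          # required throughput: one second per page

# Segmentation rate in characters per second.
chars_per_second = chars_per_form / segmentation_seconds

# Factor by which segmentation alone exceeds the per-page budget,
# assuming one form corresponds to one page.
slowdown_factor = segmentation_seconds / target_seconds

print(f"{chars_per_second:.2f} characters per second")
print(f"{slowdown_factor:.0f} times the one-second budget")
```

The computed rate is consistent with the stated rate of less than eight characters per second.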
It is also known in the art to perform self-organizing pattern recognition. Self-organizing pattern detection and matching using a system having multiple associative memories is taught by Charles L. Wilson in "Multiple Memory Self-Organization Pattern Recognition Network", U.S. patent application Ser. No. 07,701,484, filed on May 16, 1991. The multiple associative memories of the system of Wilson are able to learn the patterns in a sample of data without prior knowledge of the classes. This prior art self-organizing detection system learns a pattern by means of a feed-forward adaptation using symmetric triggering. Input to the pattern detection system is applied directly to each of the associative memories. The associative memories produce a best match pattern output for each class while simultaneously smoothing and generalizing the input data. Thus the need for prefiltering of data is eliminated. Each memory may contain data of a different pattern type. These matched patterns are reduced to match strength signals, one for each applied signal and each associative memory. All match strength signals in the system of Wilson may be computed in parallel.
The match strength signals of Wilson are processed to produce a logical-type signal for each memory. The logical match signals are combined to provide a logical learning trigger signal which allows acceptable patterns to be used to update a specific associative memory. The system of Wilson also permits each associative memory to be updated by a separate learning method. The architecture provides multi-map, self-organizing pattern recognition which allows massively parallel learning of features using different maps for each feature type.
The method taught by Wilson is thus similar to the multi-map structures believed to exist in the vertebrate cerebral cortex. The technique consists of sets of associative memory locations, one for each feature type, in which learning is symmetrically triggered by logical combinations of the associative strengths of the memory blocks. Each map is independent of the others except for the connections used to trigger learning. The learning used to update memory locations uses a feed-forward mechanism and is self-organizing and stable.
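The flow from match strengths to a learning trigger can be sketched in a simplified form. This is not the method of Wilson, whose details are not given here. The sketch assumes that a match strength is the cosine similarity between an input and a stored pattern, that the logical signal is a threshold test, that learning is triggered when exactly one memory matches, and that the update moves the matching memory toward the input. All names, the threshold, and the learning rate are hypothetical.

```python
import math
from typing import List, Tuple


def match_strength(memory: List[float], signal: List[float]) -> float:
    """Assumed match strength: cosine similarity of memory and input."""
    dot = sum(m * s for m, s in zip(memory, signal))
    norm = (math.sqrt(sum(m * m for m in memory))
            * math.sqrt(sum(s * s for s in signal)))
    return dot / norm if norm else 0.0


def learn_step(memories: List[List[float]], signal: List[float],
               threshold: float = 0.8,
               rate: float = 0.5) -> Tuple[List[float], List[bool], bool]:
    """Apply one input to every memory and update at most one of them."""
    # One strength per memory; each is independent, so these could be
    # computed in parallel.
    strengths = [match_strength(m, signal) for m in memories]
    # Reduce each strength to a logical match signal.
    logical = [s >= threshold for s in strengths]
    # Assumed trigger rule: learn only when exactly one memory matches.
    trigger = sum(logical) == 1
    if trigger:
        k = logical.index(True)
        # Feed-forward update: move the matching memory toward the input.
        memories[k] = [m + rate * (s - m) for m, s in zip(memories[k], signal)]
    return strengths, logical, trigger


# Two hypothetical memories holding different pattern types.
memories = [[1.0, 0.0], [0.0, 1.0]]
strengths, logical, trigger = learn_step(memories, [0.9, 0.1])
```

Under these assumptions the input matches only the first memory, so only that memory is updated and the second is left unchanged.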
It is also known in the prior art to perform segmentation using a more traditional neural network architecture known as a multi-layered perceptron network. This type of neural network classifies images by generating feedforward activations across a fully connected network containing an input layer, one or more hidden layers, and an output layer. Supervised training may be done using scaled conjugate gradient learning or back propagation. Using the multi-layered perceptron architecture trained with Gabor feature vectors, character recognition accuracy of 99.8% for medium quality machine print has been demonstrated in M. D. Garris, R. A. Wilkinson, C. L. Wilson, "Methods for Enhancing Neural Network Handwritten Character Recognition," International Joint Conference on Neural Networks, Vol. I, pp. 695-700, Seattle, 1991.
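The feedforward classification described above can be sketched as follows. The sketch assumes sigmoid activations and takes the index of the largest output as the class; the weights are hypothetical hand-picked values rather than trained ones, and training by scaled conjugate gradient or back propagation is not shown.

```python
import math
from typing import List, Tuple

# A layer is (weights, biases); weights[j] holds the incoming weights of unit j.
Layer = Tuple[List[List[float]], List[float]]


def sigmoid(z: float) -> float:
    """Logistic activation (an assumed choice of nonlinearity)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs: List[float], layers: List[Layer]) -> List[float]:
    """Propagate activations through fully connected layers."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def classify(inputs: List[float], layers: List[Layer]) -> int:
    """Return the index of the output unit with the largest activation."""
    outputs = forward(inputs, layers)
    return max(range(len(outputs)), key=outputs.__getitem__)


# Hypothetical network: 2 inputs, 2 hidden units, 2 output classes.
layers: List[Layer] = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),   # hidden layer
    ([[2.0, -2.0], [-2.0, 2.0]], [0.0, 0.0]),   # output layer
]
first_class = classify([1.0, 0.0], layers)
second_class = classify([0.0, 1.0], layers)
```

In the systems described above, the input vector would hold image features rather than the two toy values used here.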
Gabor functions such as those in the system of Garris are a set of incomplete nonorthogonal functions which reduce random image noise and smooth irregularities in image structure by acting as spatially localized low-pass filters. Gabor functions achieve the minimum joint uncertainty in position and spatial frequency, and they match the visual receptive field profiles of mammalian eyes. See J. G. Daugman, "Complete Discrete 2-D Gabor Transform by Neural Networks for Image Analysis and Compression," IEEE Trans. on ASSP, Vol. ASSP-36, pp. 1169-1179, 1988 for further information on these functions.
These Gabor functions may be used in two different ways. First, characters reconstructed from Gabor functions are enhanced: the body of the character is emphasized, the variations along its edges due to digitization are reduced, and its stroke width is normalized. Second, these functions can be used to create feature vectors for multi-layered perceptron networks.
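The second use, building a feature vector, can be sketched with a small example. It assumes the common real-valued form of a two-dimensional Gabor function, a Gaussian envelope multiplied by a cosine carrier, and forms one feature per orientation by taking the inner product of a square, odd-sized image with a kernel of the same size. The parameter names and values and the sample image are hypothetical and are not taken from the system of Garris.

```python
import math
from typing import List


def gabor(x: float, y: float, sigma: float, wavelength: float,
          theta: float, phase: float = 0.0) -> float:
    """Gaussian envelope times a cosine carrier oriented at angle theta."""
    xr = x * math.cos(theta) + y * math.sin(theta)    # along the carrier
    yr = -x * math.sin(theta) + y * math.cos(theta)   # across the carrier
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * xr / wavelength + phase)


def gabor_kernel(size: int, sigma: float, wavelength: float,
                 theta: float) -> List[List[float]]:
    """Sample the function on a size-by-size grid centred on the origin."""
    half = size // 2
    return [[gabor(x, y, sigma, wavelength, theta)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]


def feature_vector(image: List[List[float]], thetas: List[float],
                   sigma: float = 2.0,
                   wavelength: float = 4.0) -> List[float]:
    """One feature per orientation: inner product of image and kernel."""
    size = len(image)  # assumes a square image with odd side length
    features = []
    for theta in thetas:
        kernel = gabor_kernel(size, sigma, wavelength, theta)
        features.append(sum(image[r][c] * kernel[r][c]
                            for r in range(size) for c in range(size)))
    return features


# Hypothetical 5x5 image containing a single vertical stroke.
stroke = [[1.0 if c == 2 else 0.0 for c in range(5)] for r in range(5)]
features = feature_vector(stroke, thetas=[0.0, math.pi / 2.0])
```

With these values the vertical stroke responds more strongly to the kernel at orientation zero than to the kernel at ninety degrees, which is the kind of orientation-sensitive information such a feature vector would pass to a classifier.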