Through the use of word processors and/or other data processing and computerized office equipment, the number of paper documents, particularly forms, of one kind or another that are currently in use has simply exploded over the past few decades. At some point, the information on most of these documents must be extracted therefrom and processed in some fashion.
For example, one document that is in wide use today is a paper bank check. A payor typically fills in, either by hand or by machine, a dollar amount on an appropriate line of the check and presents the check to its recipient. The recipient deposits the check in its bank. In order for this bank to process the check for payment, a human operator employed by the bank reads the amount on the check and instructs a printer to place appropriate digits on the bottom of the check. These digits and similar electronic routing codes situated on the bottom of the check are subsequently machine read to initiate an electronic funds transfer through a banking clearinghouse from the payor's account at its bank (i.e. the paying bank) to the recipient's account at its bank (the presenting bank) and to physically route the check back through the clearinghouse from the presenting bank to the paying bank for cancellation. Inasmuch as the number of checks has increased substantially over the past few years and continues to do so, the cost to banks of processing paper checks has been steadily increasing. In an effort to arrest these cost increases or at least temper their rise, banks continually attempt to bring increasing levels of machine automation to the task of processing checks. Specifically, various individuals in banking believe that if the check encoding process were automated by replacing human operators with appropriate optical character recognition (OCR) systems, then the throughput of encoded checks and encoding accuracy would both substantially increase while significant concomitant cost savings would occur. As envisioned, such systems would scan the writing or printing that appears on each check, accurately translate a scanned dollar amount into digital signals, such as appropriate ASCII words, and, inter alia, operate a printer to print appropriate numeric characters onto the bottom of each check in order to encode it.
With the ever expanding number of paper documents in use in present day society, of which paper checks represent only one illustrative example, the human resources needed to read these documents and convert their contents into machine readable form or directly into computer data are simply becoming either unavailable or too costly to use. As such, a substantial need exists, across many fields, to develop and use OCR systems to accurately automate the process of recognizing and translating first machine printed alphanumeric characters and ultimately handwritten characters into appropriate digital data.
One technique that holds great promise for providing accurate recognition of machine printed characters in an OCR system is the use of a neural network. In contrast to traditional sequential "Von Neumann" digital processors that operate with mathematical precision, neural networks are analog and generally provide massively parallel processing. These networks provide fast and often surprisingly good output approximations, but not precise results, by making weighted decisions on the basis of fuzzy, incomplete and/or frequently contradictory input data.
Basically, a neural network is a collection of identical processing elements, so-called neurons, that are arranged in a multi-layered hierarchical configuration. Each neuron can have one or more inputs, but only one output. Each input is weighted by a coefficient. The output of a neuron is typically calculated as a function of the sum of its weighted inputs and a bias value. This function, the so-called activation function, is typically a sigmoid function; i.e. it is S-shaped, monotonically increasing and asymptotically approaches fixed values, typically +1 and either zero or -1, as its input approaches positive or negative infinity, respectively. The sigmoid function and the individual neural weight and bias values determine the response or "excitability" of the neuron to signals presented to all its inputs. The output of a neuron in one layer is often distributed as input to all neurons in a higher layer. A typical neural network contains three distinct layers: an input layer situated at the bottom of the network, an output layer situated at the top of the network and a hidden layer located intermediate between the input and output layers. For example, if a neural network were to be used for recognizing normalized alphanumeric characters situated within a 7-by-5 pixel array, then the output of a sensor for each pixel in that array, such as a cell of an appropriate charge coupled device (CCD), would be routed as input to a different neuron in the input layer. Thirty-five different neurons, one for each different pixel, would exist in this layer. Each neuron in this layer has only one input. The outputs of all 35 neurons in the layer are distributed, in turn, as input to each of the neurons in an intermediate or so-called hidden layer. The output of each of the neurons in the hidden layer is distributed as an input to every neuron in the output layer.
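By way of illustration only, the computation performed by a single neuron as described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the sigmoid variant whose asymptotes are +1 and zero; the text equally allows one whose lower asymptote is -1.

```python
import math

def sigmoid(x):
    """S-shaped, monotonically increasing; tends to 1 at +infinity and 0 at -infinity."""
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def neuron_output(inputs, weights, bias):
    """Single output: activation of (sum of weighted inputs + bias)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)
```

Larger weights or a larger bias shift the point at which the neuron responds strongly, which is the "excitability" referred to above.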
The number of neurons in the output layer typically equals the number of different characters that the network is to recognize. For example, one output neuron may be associated with the letter "A", another with the letter "B", a third with the letter "a", a fourth with the letter "b" and so on for each different alphanumeric character, including letters, numbers, punctuation marks and/or other desired symbols, if any, that is to be recognized by the network. The number of neurons in the hidden layer depends, inter alia, upon the complexity of the character bit-maps to be presented to the network for recognition; the desired information capacity of the network; the degree to which the network, once trained, is able to handle unfamiliar patterns; and the number of iterations, as discussed below, that the network must undergo during training in order for all the network weight and bias values to properly converge. The output of the network typically feeds a processor or other circuitry that converts the network output into appropriate multi-bit digital, e.g. ASCII, words for subsequent processing.
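The three-layer arrangement described above might be sketched as shown below. This is a minimal illustration under stated assumptions, not a definitive implementation: the hidden-layer size of 10 and the four-character set "ABab" are chosen arbitrarily, the input layer is treated as a simple pass-through of the 35 pixel values, and the weights are untrained random placeholders. All names are hypothetical.

```python
import math
import random

N_PIXELS = 35      # one input neuron per pixel of the 7-by-5 array
N_HIDDEN = 10      # assumed; the text does not fix a hidden-layer size
CHARSET = "ABab"   # assumed character set, one output neuron per character

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def layer_forward(inputs, weights, biases):
    """Fully connected layer: every neuron receives every input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network_forward(pixels, hidden, output):
    """Propagate 35 pixel values through the hidden layer to the output layer."""
    if len(pixels) != N_PIXELS:
        raise ValueError("expected one value per pixel of the 7-by-5 array")
    hidden_out = layer_forward(pixels, *hidden)
    return layer_forward(hidden_out, *output)

def random_layer(n_out, n_in, rng):
    """Placeholder (untrained) weights and biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

rng = random.Random(0)
hidden = random_layer(N_HIDDEN, N_PIXELS, rng)
output = random_layer(len(CHARSET), N_HIDDEN, rng)
pixels = [1.0 if i % 5 in (0, 4) else 0.0 for i in range(N_PIXELS)]  # arbitrary bitmap
outputs = network_forward(pixels, hidden, output)  # one value per character in CHARSET
```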
The use of a neural network generally involves two distinct successive procedures: initialization and training on known pre-defined patterns having known outputs, followed by recognition of actual unknown patterns.
First, to initialize the network, the weights and biases of all the neurons situated therein are set to random values typically within certain fixed bounds. Thereafter, the network is trained. Specifically, the network is successively presented with pre-defined input data patterns, i.e. so-called training patterns. The values of the neural weights and biases in the network are simultaneously adjusted such that the output of the network for each individual training pattern approximately matches a desired corresponding network output (target vector) for that pattern. Once training is complete, all the weights and biases are then fixed at their current values. Thereafter, the network can be used to recognize unknown patterns. During pattern recognition, unknown patterns are successively applied in parallel to the inputs of the network and resulting corresponding network responses are taken from the output nodes. Ideally speaking, once the network recognizes an unknown input pattern to be a given character on which the network was trained, then the signal produced by a neuron in the output layer and associated with that character should sharply increase relative to the signals produced by all the other neurons in the output layer.
One technique commonly used in the art for quickly adjusting the values of the weights and biases of all the neurons during training is back error propagation (hereinafter referred to simply as "back propagation"). Briefly, this technique involves presenting a pre-defined input training pattern (input vector) to the network and allowing that pattern to be propagated forward through the network in order to produce a corresponding output pattern (output vector, O) at the output neurons. The error associated therewith is determined and then back propagated through the network to apportion this error to individual neurons in the network. Thereafter, the weights and bias for each neuron are adjusted in a direction and by an amount that minimizes the total network error for this input pattern.
Once all the network weights have been adjusted for one training pattern, the next training pattern is presented to the network and the error determination and weight adjusting process iteratively repeats, and so on for each successive training pattern. Typically, once the total network error for each of these patterns reaches a pre-defined limit, these iterations stop and training halts. At this point, all the network weight and bias values are fixed at their then current values. Thereafter, character recognition on unknown input data can occur at a relatively high speed. In this regard, see, e.g. M. Caudill, "Neural Networks Primer--Part III", AI Expert, June 1988, pages 53-59.
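The initialization and training procedure outlined above can be sketched, for illustration only, as follows. The sketch assumes a squared-error measure, a zero-to-one sigmoid, a learning rate of 0.5, initial values bounded to plus or minus 0.5 and a two-pattern toy training set; none of these specifics come from the text, and all names are hypothetical.

```python
import math
import random

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def init_layer(n_out, n_in, rng, bound=0.5):
    """Random weights and biases within fixed bounds [-bound, +bound]."""
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases

def forward(x, layer):
    weights, biases = layer
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def train_pattern(x, target, hidden, output, rate=0.5):
    """One back propagation step; returns the squared error before the update."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = forward(x, hidden)   # forward pass: hidden-layer outputs
    o = forward(h, output)   # forward pass: output vector O
    # Output error terms: (target - output) times the sigmoid slope o * (1 - o).
    d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
    # Apportion the error to each hidden neuron through the output weights.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w_o[k][j] for k in range(len(d_out)))
             for j, hj in enumerate(h)]
    # Adjust weights and biases in the direction that reduces the error.
    for k, d in enumerate(d_out):
        for j, hj in enumerate(h):
            w_o[k][j] += rate * d * hj
        b_o[k] += rate * d
    for j, d in enumerate(d_hid):
        for i, xi in enumerate(x):
            w_h[j][i] += rate * d * xi
        b_h[j] += rate * d
    return sum((t - y) ** 2 for t, y in zip(target, o))

def train(patterns, hidden, output, limit=0.01, max_epochs=10000):
    """Cycle through the patterns until each pattern's error is below the limit."""
    for epoch in range(1, max_epochs + 1):
        errors = [train_pattern(x, t, hidden, output) for x, t in patterns]
        if max(errors) < limit:
            return epoch     # weights and biases are now left fixed
    return None              # limit not reached within max_epochs

rng = random.Random(0)
hidden = init_layer(3, 3, rng)
output = init_layer(2, 3, rng)
patterns = [([1.0, 0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0, 1.0], [0.0, 1.0])]
epochs = train(patterns, hidden, output)
```

Once `train` returns, the weight and bias lists are simply left unmodified, which corresponds to fixing them at their then current values before recognition begins.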
During character recognition, a "winner take all" approach is generally used to identify the specific character that has been recognized by the network. Under this approach, once the network has fully reacted to an input data pattern, then the one output neuron that generates the highest output value relative to those produced by the other output neurons is selected, by a processing circuit connected to the network, as the network output. Having made this selection, the processor then determines, such as through a simple table look-up operation, the multi-bit digital representation of the specific character identified by the network.
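A minimal sketch of this "winner take all" selection followed by a table look-up is given below. The character set, the look-up table and the function names are assumptions made purely for illustration.

```python
CHARSET = "ABab"                            # assumed: one output neuron per character
ASCII_TABLE = {c: ord(c) for c in CHARSET}  # simple look-up table: character to ASCII code

def winner_take_all(outputs, charset=CHARSET):
    """Select the character whose output neuron produced the highest value."""
    if len(outputs) != len(charset):
        raise ValueError("one output value is required per character")
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return charset[best]

def recognize(outputs):
    """Return the winning character and its multi-bit (ASCII) representation."""
    char = winner_take_all(outputs)
    return char, ASCII_TABLE[char]
```

For instance, `recognize([0.1, 0.8, 0.3, 0.2])` selects the second neuron and therefore returns the character "B" together with its ASCII code, 66. Note that the selection discards every output value other than the largest.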
Back propagation neural networks of the type thus far described have yielded highly accurate results in recognizing alphanumeric characters in a laboratory environment from a static universe of test data. However, when such networks have been incorporated into OCR systems and used for character recognition in the field (i.e. a "real world environment"), serious problems have arisen which significantly limit the recognition accuracy that has been obtained thereby.
Specifically, in a factory where an OCR system is manufactured, a set of printers exists and is typically used to generate alphanumeric characters in a wide variety of different fonts. The neural network in the OCR system is trained at the factory to recognize these specific characters. Unfortunately, once the OCR system leaves the domain of the factory and is operated at a customer location, the system will encounter a variety of new characters on which it has not been trained. These new characters can illustratively arise because the customer is using a different font than any of those used to train the neural net and/or through physical differences existing between the specific typeface in a given font used to train the net and the actual typeface for the same font that appears in documents being scanned at the customer location. Physical differences can arise between the typefaces produced by two printers that use the same font due to a number of factors, such as differential wear between corresponding printing elements in these printers which can cause differences in the impressions or depictions made thereby on a print media, differing degrees of dirt or grit in the corresponding elements, non-uniformities in the ink or toner applied by the corresponding elements to the media, as well as slight visible differences, such as differing lengths of ascenders, descenders and serifs, in the corresponding characters themselves that have been implemented in the same font in these printers. Hence, owing, inter alia, to the illustrative factors described above, an OCR system situated at a customer location will likely be exposed to a dynamically changing universe of alphanumeric characters. As the OCR system experiences each new character, the neural network may well mis-recognize the character and, by doing so, generate a recognition error.
To reduce recognition errors in the field to an acceptably low level, an OCR system operator is required to periodically train the system on the specific characters which the system is then expected to recognize. Inasmuch as the fonts themselves as well as the typeface for a common font used in several successive documents can even change from one document to the next, the system may well need to be trained on each different document. Unfortunately, continually training the system, particularly for each successive document to be recognized, consumes a significant amount of time and, as such, greatly reduces overall system throughput.
Furthermore, customer documents frequently contain corrupted characters. This corruption arises from, inter alia, printing errors, dirt or local imperfections in the print media itself (e.g. a paper inclusion that appears as a localized dark spot), all of which commonly occur under field conditions. Unfortunately, this corruption, when it occurs in an input character on which the network was supposedly trained, often causes a neural network to be unable to positively recognize that character. As a result, for this character, the network will likely produce an output vector in which the difference between the maximum value and the next highest value contained therein is rather small. In this instance, the network is producing an ambiguous result; the output vector contains a high degree of uncertainty with relatively little confidence in its maximum output value. Due to the ambiguity, the correct character may not be that associated with the output neuron having the maximum value but rather that associated with the neuron having the next highest value. This ambiguity is simply ignored by an output selection process predicated on a "winner take all" approach. Accordingly, in this instance, the selected output from the network will be wrong.
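The ambiguity described above can be made concrete by measuring the gap between the two largest values in an output vector. The sketch below is illustrative only; the threshold of 0.2, the example vectors and the function names are assumptions and do not represent a remedy set forth in the text.

```python
def top_two_margin(outputs):
    """Return (index of maximum, index of runner-up, difference between their values)."""
    ranked = sorted(range(len(outputs)), key=lambda k: outputs[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, runner_up, outputs[best] - outputs[runner_up]

def is_ambiguous(outputs, min_margin=0.2):
    """True when the maximum barely exceeds the runner-up (assumed threshold)."""
    return top_two_margin(outputs)[2] < min_margin

clean = [0.05, 0.92, 0.10, 0.03]       # hypothetical output for a clean character
corrupted = [0.05, 0.51, 0.48, 0.03]   # hypothetical output for a corrupted character
```

With these example vectors, `is_ambiguous(clean)` is false while `is_ambiguous(corrupted)` is true. A "winner take all" selection would nevertheless report the second neuron in both cases, even though in the corrupted case the third neuron is nearly as strong and may correspond to the correct character.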
Hence, if the network were to be trained in the field using customer documents that contained corrupted characters, the resulting uncertainty arising in the output of the network would then skew the network to recognize corrupted characters. This skew in conjunction with a "winner take all" output selection process would likely disadvantageously increase, rather than decrease, the recognition errors that would occur during recognition of non-corrupted input characters appearing on these customer documents and therefore increase the overall recognition error rate.
Rather than allowing the overall recognition error rate to disadvantageously rise during field use, the network weights and biases are usually fixed at the factory with limited, if any, variation therein being permitted in the field. Though the overall recognition error rate of the OCR system would likely rise, from its rate occurring in the factory, during recognition of customer documents that contain corrupted characters, this rise is likely to be less than that which would otherwise occur if the network were to be trained to recognize these corrupted characters. As such, in providing what is believed to be reasonable performance, manufacturers of OCR systems have implicitly required their customers to accept a certain increased level of recognition errors during field use of an OCR system. Unfortunately, in many applications where accurate automated recognition of machine printed characters is essential, this level is still unacceptably high. Accordingly, currently available OCR systems are simply unsuitable for use in these applications.
Of course, one solution would be to employ a human operator at the OCR system during character recognition. The OCR system would flag to the operator each character that it either mis-recognized or recognized with a certain high degree of uncertainty. The operator would then examine the bit map of each such character and then supply the correct character to the system. Though the resulting overall accuracy of the OCR system would then increase, incorporating a human operator into the system would unfortunately reduce the throughput of the system dramatically and add significant cost to its operation. Since many applications of OCR systems are highly cost sensitive, incorporating a human operator into the system is simply economically infeasible.
Therefore, a specific need exists in the art for a neural network, particularly one using back propagation and suitable for use in an OCR system, that can accurately adapt its performance to dynamically changing "real world" customer input data. Such a network would provide more robust performance with greater recognition accuracy, particularly when confronted with dynamically changing input character data, than that heretofore occurring through use of neural networks known in the art. Moreover, such a network might well provide sufficiently high recognition accuracy to at least significantly reduce, if not in many instances totally eliminate, the need to incorporate a human operator into the OCR system. As such, use of such a network in an OCR system would not only increase the overall recognition accuracy but also advantageously increase the throughput of the OCR system without significantly increasing its cost. This, in turn, would permit OCR systems to be used in many applications for which they were heretofore unsuitable.