In recent years, a great deal of effort has been devoted to developing pattern recognition technology. In particular, optical character recognition (OCR) technology has potentially wide applicability in fields such as banking and handwriting analysis. A given character can be printed or written with a variety of different stroke widths, relative proportions, loop sizes, etc. Also, a character may be smeared, copied faintly, or otherwise distorted. In addition, a character may not be in anchor orientation, i.e., may not be right side up, in the image plane. Rather, the orientation of a character in the image plane may vary. OCR technology must be able to cope with these differences, distortions, and rotations, while still recognizing characters from a known character set with a high degree of accuracy and reliability.
Neural network technology has been employed for optical character recognition applications. A good introduction to neural network architecture and its application to OCR is given in U.S. Pat. No. 5,052,043, issued to Gaborski and titled "Neural Network with Back Propagation Controlled Through an Output Confidence Measure". A brief overview will be provided here.
A neural network may be either implemented in hardware, or simulated using a digital computer. In either case, a neural network is made of one or more units. FIG. 1 is a schematic diagram of a typical unit of a neural network. The unit includes one or more inputs 2, each input having a weight factor (w1, w2, etc.). The weight factors may be implemented as amplifiers 4 having amplification factors corresponding to the weight factors. The neural network unit also includes a summing circuit 6 which receives and adds the weighted input signals, and produces an output 8 related to the sum of the input signals.
Analogies may be drawn between the structure of the unit shown in FIG. 1 and a biological nerve cell. The inputs 2 are analogous to nerve synapses or dendrites, and the summing circuit 6 and the output 8, taken together, are analogous to a neuron. Sometimes the term "neuron" is used to refer to a neural network unit in its entirety, as shown in FIG. 1. In the present description of the invention, the term "unit" will be used.
The output 8 is shown as a graph, which represents the output signal as a function of the sum of the weighted inputs. The output signal is sometimes referred to as an activation function or activation. The output signal has a value related to the sum of the weighted inputs and to a threshold value. For instance, the output could be a high value if the sum exceeds the threshold, or a low value if the sum does not exceed the threshold. Alternatively, to avoid a sharp discontinuity, the output could be "smoothed" by employing a hyperbolic tangent or sigmoid function. That is, in the vicinity of the threshold, the output has an S shape, increasing as the sum approaches and exceeds the threshold. Farther below the threshold, the output asymptotically approaches the low value, and farther above the threshold, the output asymptotically approaches the high value.
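By way of illustration, the operation of such a unit may be sketched as follows (an illustrative sketch only, not part of any cited disclosure; a sigmoid activation centered on the threshold is assumed, and the function name and values are hypothetical):

```python
import math

def unit_output(inputs, weights, threshold):
    """One neural network unit: a weighted sum of the inputs,
    smoothed through a sigmoid centered on the threshold value."""
    s = sum(x * w for x, w in zip(inputs, weights))
    # Sigmoid: asymptotically approaches the low value (0) far below the
    # threshold and the high value (1) far above it, with an S shape between.
    return 1.0 / (1.0 + math.exp(-(s - threshold)))
```

When the weighted sum equals the threshold exactly, the output lies midway between the low and high values.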
A conventional neural network architecture, shown in FIG. 2, employs three layers of neural network units, referred to respectively as a layer of input units 10, a layer of hidden units 12, and a layer of output units 14. This neural network architecture has been used in optical character recognition (OCR) applications as follows. An input image to be recognized as one of a set of known characters is represented as a rectangular array of pixels, each pixel having a brightness value. The input units 10, making up the input layer of the neural network, are equal in number to the number of pixels. Each input unit has one weighted input, coupled to receive an input signal from a respective pixel. The input signal is related to the brightness of the pixel.
Each hidden unit 12 has a plurality of weighted inputs, coupled to the outputs of respective ones of the input units 10. The number of hidden units 12, the number of inputs per hidden unit 12, and which particular input units 10 are coupled to the respective hidden units 12 are all factors to be determined by the network designer, based on criteria such as the degree of variation between foreseeable images of a given character to be recognized.
Each output unit 14 also has a plurality of weighted inputs, coupled to outputs of various subsets of the hidden units 12. The number of output units 14 is equal to the number of different characters the network is to recognize. For instance, a network for recognizing the digits 0 through 9 would have ten output units, which respectively correspond to the ten digits to be recognized.
The output of each of the output units 14 is an analog value which ranges between the low and high values shown in the output 8 of FIG. 1. For a given input image, each output unit 14 will produce an output signal value. If one of the output units provides an output signal noticeably higher than that of the other output units 14, then the network is said to have recognized the input image as being the character corresponding to the output unit producing the high signal.
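The forward operation of the three-layer network just described may be sketched as follows (an illustrative sketch assuming sigmoid units and fully connected weight matrices; bias terms are omitted for brevity, and the names are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """Each row of weights feeds one unit of the layer."""
    return [sigmoid(sum(i * w for i, w in zip(inputs, row)))
            for row in weight_rows]

def recognize(pixels, hidden_weights, output_weights, charset):
    """Propagate pixel brightness values through the hidden and output
    layers; the character whose output unit responds most strongly is
    taken as the recognized character."""
    hidden = layer(pixels, hidden_weights)
    outputs = layer(hidden, output_weights)
    return charset[outputs.index(max(outputs))]
```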
It will be seen that the neural network architecture just described is a general purpose architecture, useful in a wide variety of applications, such as recognition of a wide variety of character sets or images. A given neural network recognizes a given character set by virtue of the particular weight values set for the inputs to the various units in the three layers. The weight values are set during a "training" phase of operation of the network.
Training is an empirical process in which various known reference images are applied to the inputs of the input units, and the signal values at the output units are observed. A given input image is known to represent a given one of the characters to be recognized. The weight values are adjusted to increase the output signal for the output unit corresponding to the given character, and to decrease the output signals for the other output units. This adjustment of weight values based on observed output values is called back propagation.
Back propagation is performed for many different images representing each of the characters to be recognized. This training process generally requires a large number of iterations, including repetitions of previously employed images after the weight factors have been adjusted from back propagation with other images.
Eventually, weight values for all the inputs of the input, hidden, and output units are found which provide satisfactory recognition of all the input images used in the training. These weight values enable the neural network to recognize images of the desired characters which are within a desired degree of deviation from the images used in training.
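The weight-adjustment idea underlying back propagation may be sketched, for a single sigmoid unit, as follows (the delta rule; a full network additionally propagates the error term back through the hidden layer, and the OR-gate training set below is a hypothetical toy example):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(samples, weights, rate=0.5, epochs=1000):
    """samples: (inputs, target) pairs. Repeatedly nudge each weight
    in the direction that reduces the squared output error."""
    for _ in range(epochs):
        for inputs, target in samples:
            out = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
            # Error gradient for a sigmoid unit under squared-error loss.
            delta = (target - out) * out * (1.0 - out)
            weights = [w + rate * delta * i
                       for w, i in zip(weights, inputs)]
    return weights

# Toy training set: learn a logical OR (first input is a constant bias).
samples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
           ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w = train_unit(samples, [0.0, 0.0, 0.0])
```

After repeated presentations of the same samples, the learned weights drive the output toward the target value for each input pattern.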
However, if the reference images used in the training as described above show only upright characters, then the network may be unable to recognize images of rotated characters. This inability is a drawback which would limit the usefulness of neural network OCR systems. On the other hand, if rotated character images are used for training, the duration and difficulty of the training phase are greatly increased.
Conventional systems have attempted to overcome the problem of recognizing rotated images of characters by employing a polar arrangement for receiving the input image. For instance, in Lee et al., "Translation-, Scale-, and Rotation-invariant Recognition of Hangul Characters with Transformation Ring Projection", International Conference on Document Analysis and Recognition ICDAR-91, pp. 829-836, there is described an OCR scheme in which a polar coordinate system is superimposed on an image to be recognized, and a histogram of black pixels is created as a function of radial distance from the center of the image. However, this scheme is not satisfactory for many OCR applications, particularly with regard to recognizing handwritten characters. Different individuals may write a given character using pen strokes spaced varying distances apart. Accordingly, for a given character, corresponding strokes written by different people might be tangent to two different concentric circles of the polar coordinate system. A substantial difference in histogram values would result, leading to a high likelihood of erroneous character recognition.
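The ring-projection idea may be sketched as follows (an illustrative sketch of the general approach, not the specific Lee et al. implementation): black-pixel counts are accumulated per ring of integer radius about the image center, yielding a histogram unchanged by rotation but, as noted above, sensitive to a stroke shifting into an adjacent ring.

```python
import math

def ring_projection(image):
    """image: 2D list of 0/1 pixels. Count black pixels in each ring of
    integer radius about the image center. Rotating the image leaves the
    counts unchanged, but a stroke moved to a neighboring ring does not."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    hist = [0] * (int(math.hypot(cy, cx)) + 1)
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                hist[int(math.hypot(y - cy, x - cx))] += 1
    return hist
```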
Another conventional OCR scheme for recognizing rotated characters is described in Taza et al., "Discrimination of Planar Shapes Using Shape Matrices", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 5, September/October 1989, pp. 1281-1289. A polar coordinate system is superimposed on an image to be recognized, and a shape matrix, defined in terms of radius and rotational angle, indicates whether corresponding elements of the image are white or black. Thus, the greater the radial value, i.e., the farther away a given element of the image is from the center of the polar coordinate system, the larger the area of the image element. On the other hand, the image elements near the center of the image are small. As a consequence, a slight variation near the center of the image, caused by foreseeable variations in handwriting or printing fonts, will likely cause a substantial variation in the values produced in the shape matrix. Accordingly, this conventional arrangement also has a high likelihood of erroneous character recognition.
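The shape-matrix sampling idea may be sketched as follows (a minimal sketch of the general polar sampling approach, not the specific Taza et al. method; the grid sizes and nearest-pixel sampling are illustrative choices):

```python
import math

def shape_matrix(image, n_r, n_theta, max_r):
    """Sample a binary image on a polar (radius, angle) grid about the
    image center, recording whether each sample point is black. Note that
    each matrix cell covers more image area at large radii and less near
    the center, which is the sensitivity discussed above."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    mat = []
    for i in range(n_r):
        r = max_r * (i + 0.5) / n_r
        row = []
        for j in range(n_theta):
            t = 2 * math.pi * j / n_theta
            y = int(round(cy + r * math.sin(t)))
            x = int(round(cx + r * math.cos(t)))
            row.append(1 if 0 <= y < h and 0 <= x < w and image[y][x] else 0)
        mat.append(row)
    return mat
```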
Persoon et al., "Shape Discrimination Using Fourier Descriptors", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-7, No. 3, March 1977, pp. 170-179, discloses a method for recognizing boundary curves or shapes. Fourier descriptors describing closed curves having no discontinuities may be defined (equation 2). A suitable starting point on a closed curve is selected, and a distance between an unknown sample and a nearest sample in a training set is calculated for curve matching. However, this method has the disadvantage that some of the details of the image to be recognized, such as inner loops, are lost in the course of obtaining the Fourier descriptors. As a consequence, this method also provides an undesirably high recognition error rate.
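The Fourier-descriptor idea may be sketched as follows (illustrative only, not the specific Persoon et al. formulation): the boundary is treated as a sequence of complex points, and the magnitudes of its discrete Fourier coefficients serve as descriptors; rotating the curve or shifting the starting point multiplies each coefficient by a unit-magnitude factor, so the magnitudes are invariant to both, but interior details such as inner loops are not captured.

```python
import cmath

def fourier_descriptors(boundary, k):
    """boundary: list of (x, y) points along a closed curve, treated as
    complex numbers z = x + iy. Returns magnitudes of the first k DFT
    coefficients of the boundary sequence."""
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    return [abs(sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n)
                    for t in range(n)) / n)
            for u in range(k)]
```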
Hu, "Visual Pattern Recognition by Moment Invariants", IRE Transactions on Information Theory, February 1962, pp. 179-187, discloses a method for determining moments of a density distribution function. An image of a character to be recognized is characterized as a density distribution function, and moments are calculated therefrom. The image is recognized as one of a set of characters by comparing the calculated moments with known moments for the character set. Methods are described for achieving orientation independence. However, this method also provides an undesirably high recognition error rate. Additionally, the method requires a great deal of complicated and time-consuming calculations.
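The moment approach may be sketched via the simplest of Hu's invariants, phi_1 = eta_20 + eta_02 (an illustrative sketch; Hu's full method employs seven such invariants, and the normalization shown follows the standard definition of normalized central moments):

```python
def hu_first_invariant(image):
    """image: 2D list of grayscale values treated as a density f(x, y).
    Computes phi_1 = eta_20 + eta_02, which is invariant to translation,
    scale, and rotation of the density distribution."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    xbar, ybar = m10 / m00, m01 / m00          # centroid (translation invariance)
    mu20 = mu02 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            mu20 += (x - xbar) ** 2 * v
            mu02 += (y - ybar) ** 2 * v
    # Normalized central moments: eta_pq = mu_pq / m00^(1 + (p+q)/2);
    # for p + q = 2 the divisor is m00 squared (scale invariance).
    return (mu20 + mu02) / (m00 ** 2)
```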
In summary, conventional arrangements for recognizing images of rotated characters have not been able to recognize characters, particularly rotated characters, with a desirably high rate of accuracy.