1. Technical Field
The invention is related to the field of optical character recognition systems employing neural networks to recognize machine-printed alphanumeric characters in any one of a plurality of fonts.
2. Background Art
Optical character recognition requires that each character on a document be correctly associated with an appropriate symbol in a predetermined alphanumeric symbol set. It is analogous to pattern recognition in the sense that the character on the document constitutes an image pattern which must be recognized as a particular alphanumeric symbol. Pattern recognition systems are well-known and are disclosed, for example, in U.S. Pat. Nos. 3,192,505; 3,267,439; 3,275,985; 3,275,986; and 4,479,241. Such pattern recognition systems are not particularly suitable for coping with the problems inherent in recognizing alphanumeric characters. These problems will be discussed below.
A related technique is neural networking, which is described in Caudill, "Neural Networks PRIMER," AI Expert, June 1988, pages 53 through 59 and in Rumelhart et al., Parallel Distributed Processing, Volume 1, pages 318 through 330. Using a neural network to recognize digits (numeric characters) was proposed by Burr, "A Neural Network Digit Recognizer," Proceedings of the 1986 IEEE International Conference on Systems, Man and Cybernetics, Atlanta, Ga., pages 1621 through 1625 (August, 1986). Using a neural network to recognize alphanumeric characters was proposed by Hayashi et al., "Alphanumeric Character Recognition Using a Connectionist Model with the Pocket Algorithm," Proceedings of the International Joint Conference on Neural Networks, Volume 2, pages 606 through 614 (June 18-22, 1989). The Hayashi et al. publication discloses an optical character recognition system which segments or isolates each character image on the document using histogramming and which then normalizes each character image to a standard size before transmitting it to a neural network. Further, the Hayashi et al. publication discloses that more than one type of font may be recognized using the same system. However, none of the foregoing patents and publications addresses the problem of what to do when the neural network makes an ambiguous or unreliable symbol selection, or, in other words, makes a selection whose "score" is fairly close to those of the second and third choices. Moreover, none of them addresses the problem of how to recognize a character which is kerned with an adjacent character so that the two cannot be separated by histogrammic segmentation techniques. Finally, none of the foregoing patents and publications addresses the problem of how to recognize a character which is touching an adjacent character.
3. Problem to be Solved by the Invention
Before a neural network can recognize a character image to correctly associate it with the symbol it represents, the character image must have been separated from the images of other characters on the document and its size must be normalized (conformed) to the character image size and aspect ratio which the network has been trained to process. The separation of adjacent character images from one another is typically performed by a segmentation process consisting of simply finding a column or a row devoid of any "on" pixels lying between two regions consisting of contiguous "on" pixels. The segmentation process simply declares the two regions to be different character images separated by the column or row found to be devoid of "on" pixels. Such a segmentation technique is often referred to as "histogramming."
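The histogramming technique just described can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows in which 1 denotes an "on" pixel, and it splits only along columns. The names `column_histogram` and `segment_columns` and the toy bitmap `page` are hypothetical and are not taken from any cited system.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of binary pixels; 1 means "on"


def column_histogram(image: Image) -> List[int]:
    """Count the "on" pixels in each column of the image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(image: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive.

    A new span begins at the first non-empty column and ends at the
    next column that is devoid of "on" pixels.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_histogram(image)):
        if count > 0 and start is None:
            start = x                      # entering a character region
        elif count == 0 and start is not None:
            spans.append((start, x))       # blank column closes the region
            start = None
    if start is not None:                  # region runs to the right edge
        spans.append((start, len(image[0])))
    return spans


# Hypothetical toy bitmap: two glyphs separated by one blank column (column 3).
page = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
]
print(segment_columns(page))  # [(0, 3), (4, 6)]
```

The same procedure applied to row counts would separate lines of text. In either direction the split depends entirely on the existence of a row or column with a count of zero.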
One problem with such segmentation techniques is that they cannot separate adjacent characters which are kerned. Kerned characters are adjacent characters, not necessarily touching, one of which embraces the other. For example, in some fonts, a capital "P" will embrace a following small "e", as illustrated in FIG. 1. Although the two characters are truly separated from one another in the document image, there is no row or column between the two which is devoid of "on" pixels, as can be seen from FIG. 1a. Thus, the segmentation techniques discussed above will fail to separate the two characters. As a result, the neural network will fail to recognize either character.
Another problem with the segmentation techniques discussed above is that they cannot separate adjacent characters which are actually touching or conjoined. For example, a capital "L" whose base merges with a following capital "I" may look like a capital "U", as illustrated in FIG. 1b. As in the example of FIG. 1a, FIG. 1b shows that there is no row or column which is devoid of "on" pixels, so that the segmentation technique will fail to separate the two conjoined characters and the neural network will fail to recognize either one of them.
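Both failure cases can be reproduced on toy bitmaps. The sketch below is illustrative only: the bitmaps `kerned` and `touching` are hypothetical stand-ins for the figures, and the flood fill in `count_blobs` serves solely to confirm whether the toy glyphs actually touch. It is not presented as the method of the invention.

```python
from typing import List

Image = List[List[int]]  # rows of binary pixels; 1 means "on"


def blank_columns(image: Image) -> List[int]:
    """Indices of columns devoid of "on" pixels."""
    return [x for x in range(len(image[0])) if all(row[x] == 0 for row in image)]


def blank_rows(image: Image) -> List[int]:
    """Indices of rows devoid of "on" pixels."""
    return [y for y, row in enumerate(image) if not any(row)]


def count_blobs(image: Image) -> int:
    """Count 8-connected groups of "on" pixels using an iterative flood fill."""
    height, width = len(image), len(image[0])
    seen = set()
    blobs = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or (y, x) in seen:
                continue
            blobs += 1
            seen.add((y, x))
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs


# Kerned pair: the first glyph overhangs the second, but they never touch.
kerned = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1],
]

# Touching pair: an "L" whose base merges with an "I", resembling a "U".
touching = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

for name, image in (("kerned", kerned), ("touching", touching)):
    print(name, blank_columns(image), blank_rows(image), count_blobs(image))
```

In both bitmaps the lists of blank columns and blank rows are empty, so histogramming finds no place to split. The kerned bitmap nevertheless contains two separate groups of pixels, whereas the touching bitmap contains only one.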
A related problem with using neural networks to perform optical character recognition is that the network may fail to make an unambiguous symbol selection for a given character image. Such an event may be caused by kerning or touching characters, as discussed above, or by other things, such as poor document image quality. As is well-known, the neural network makes an unambiguous choice by generating a very high score at one of its symbol outputs and very low scores at all of its other symbol outputs. Whenever the neural network fails to make an unambiguous symbol choice, none of its symbol outputs has a relatively high score and in fact several of its symbol outputs may have similar scores. The problem is how to process a character image which the neural network fails to recognize, particularly where it is not known beforehand why the particular character image is not recognizable to the neural network.
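The distinction between an unambiguous and an ambiguous selection can be sketched as a simple test on the output scores. The function name `pick_symbol`, the thresholds `min_top` and `min_margin`, and the score values are all assumptions chosen for illustration; the text above does not specify numeric criteria.

```python
from typing import Dict, Tuple


def pick_symbol(scores: Dict[str, float],
                min_top: float = 0.8,
                min_margin: float = 0.5) -> Tuple[str, bool]:
    """Return the best-scoring symbol and whether the choice looks reliable.

    The choice is treated as reliable only if the top score is high and
    it exceeds the runner-up by a clear margin. Both thresholds are
    hypothetical. At least two symbol outputs are required.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    _, second_score = ranked[1]
    reliable = best_score >= min_top and (best_score - second_score) >= min_margin
    return best, reliable


# One very high score, all others very low: an unambiguous choice.
print(pick_symbol({"A": 0.95, "B": 0.03, "R": 0.05}))   # ('A', True)

# Several similar scores, none high: an ambiguous choice.
print(pick_symbol({"U": 0.41, "L": 0.38, "I": 0.35}))   # ('U', False)
```

A test of this kind only flags that the selection is unreliable. It does not reveal whether the cause is kerning, touching characters, or poor image quality.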
Yet another problem is that a symbol set may be selected which includes very small symbols (such as commas, quote marks, etc.) which, when normalized to the size appropriate for processing by the neural network, are practically indistinguishable from an alphanumeric symbol of similar shape. Typically, a neural network is trained to recognize character images of a particular size and aspect ratio. Depending upon the font with which the document was printed or whether the document represents an enlargement or a reduction, each character image taken from the document must be normalized before being processed by the neural network so that its size and aspect ratio conform with the character image size and aspect ratio which the neural network was trained to recognize. For example, the character image size may be 12 columns by 24 rows of binary pixels.
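The loss of size information during normalization can be demonstrated with a minimal sketch. It assumes nearest-neighbor resampling of a tightly cropped glyph into the 12-column by 24-row frame mentioned above; the resampling method, the helper `scale_up`, and the toy glyph are assumptions, since the text does not specify how normalization is carried out.

```python
from typing import List

Image = List[List[int]]  # rows of binary pixels; 1 means "on"


def normalize(glyph: Image, out_cols: int = 12, out_rows: int = 24) -> Image:
    """Stretch a cropped glyph to out_cols x out_rows by nearest-neighbor sampling."""
    in_rows, in_cols = len(glyph), len(glyph[0])
    return [
        [glyph[y * in_rows // out_rows][x * in_cols // out_cols]
         for x in range(out_cols)]
        for y in range(out_rows)
    ]


def scale_up(glyph: Image, factor: int) -> Image:
    """Enlarge a glyph by repeating every pixel factor times in each direction."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in glyph
        for _ in range(factor)
    ]


# A hypothetical small mark (2 columns by 3 rows) and a symbol of the
# same shape four times larger (8 columns by 12 rows).
small = [
    [1, 1],
    [0, 1],
    [1, 0],
]
large = scale_up(small, 4)

print(len(large[0]), len(large))              # 8 12
print(normalize(small) == normalize(large))   # True
```

After normalization the two images are identical pixel for pixel, so nothing remaining in the normalized image indicates which of the two sizes was present on the document.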
Still another problem is that the case (capitalized or small) of a particular symbol (e.g., c, p, w, s, x, etc.) is determined solely by the size of the symbol and therefore cannot be distinguished following character normalization. Thus, some provision must be made to correct the case of such symbols following their recognition by the neural network.
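One conceivable provision, shown here purely as a sketch, is to retain the height of the character image measured before normalization and compare it with a reference capital height for the same line of text. The function `correct_case`, the set `SIZE_ONLY_CASE` (a hypothetical subset of the symbols listed above), and the `ratio` threshold are assumptions and are not taken from the text.

```python
# Hypothetical subset of symbols whose two cases share the same shape.
SIZE_ONLY_CASE = {"c", "s", "w", "x"}


def correct_case(symbol: str,
                 glyph_height: int,
                 capital_height: int,
                 ratio: float = 0.85) -> str:
    """Assign case from the pre-normalization height of the glyph.

    glyph_height   -- height in pixels of the character image before normalization
    capital_height -- assumed reference height of a capital letter on the same line
    ratio          -- hypothetical fraction of capital_height at or above which
                      the glyph is treated as a capital
    """
    if symbol.lower() not in SIZE_ONLY_CASE:
        return symbol  # case is already distinguishable by shape
    is_capital = glyph_height >= ratio * capital_height
    return symbol.upper() if is_capital else symbol.lower()


print(correct_case("c", 30, 30))  # C
print(correct_case("C", 20, 30))  # c
print(correct_case("a", 20, 30))  # a
```

The sketch only works if the original height is recorded before the character image is normalized, because that measurement cannot be recovered afterwards.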
Accordingly, one object of the invention is to recognize characters whenever the neural network fails to make an unambiguous symbol choice.
Another object of the invention is to sense whenever the neural network fails to make a reliable choice and to then recognize whether there are adjacent characters which are kerned.
Yet another object of the invention is to sense whenever the neural network fails to make a reliable choice and to then recognize whether there are adjacent characters which are kerned or whether there are adjacent characters which are touching.
Still another object of the invention is to recognize special symbols which cannot be distinguished from other symbols following character normalization and to assign the proper case (capitalized or small) to symbols reliably recognized by the neural network but whose case cannot be distinguished following character normalization.
Yet another object of the invention is to achieve each of the foregoing objects in an order in which the least complicated tasks are performed first and the more complicated tasks are performed only after a determination of their necessity for a particular character image.