1. Technical Field
The invention is related to optical character recognition and in particular to the normalization of characters of different sizes to a uniform size in optical character recognition.
2. Problem to be Solved by the Invention
Optical character recognition systems which recognize printed characters generally require that all of the characters to be recognized be of one uniform size. Quite often, however, a text which is to be processed by an optical character recognition system contains printed characters of various point sizes. In order for the system to process such a text, character normalization must be performed on each character whose size differs from the desired size, so as to change each such character to the desired size. Each character image is first separated from the other images in the text prior to character recognition. Then, the normalization process changes (if necessary) the character size to the correct size. In many cases, it is necessary to scale the character image to make it "fatter" or "skinnier" relative to its original aspect ratio. Thus, it is required to control both the size and the proportion (aspect ratio) of each character prior to character recognition processing.
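By way of illustration only, the size and proportion requirements stated above can be expressed as a pair of independent scale factors, one horizontal and one vertical. The following is a minimal sketch; the function name `scale_factors` and the pixel dimensions (a 30 by 40 character normalized to 24 by 24) are hypothetical values chosen for the example and do not come from this specification.

```python
def scale_factors(old_w, old_h, new_w, new_h):
    """Return (horizontal, vertical) scale factors for normalization.

    All four arguments are pixel dimensions; the values used below are
    assumed for illustration only.
    """
    return new_w / old_w, new_h / old_h


# Hypothetical case: a tall, narrow 30x40 character normalized to 24x24.
sx, sy = scale_factors(30, 40, 24, 24)

# A ratio above 1 means the character becomes "fatter" relative to its
# original aspect ratio; below 1 means it becomes "skinnier".
aspect_change = sx / sy
```

In this assumed case both factors are below one (a size reduction), and because the horizontal factor is the larger of the two, the normalized character is "fatter" than the original.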
It is desirable to preserve the strokes in each character without losing any of them to the character normalization process. For example, if the character size is to be reduced, the number of pixels representing the reduced character will necessarily be reduced. The problem is how to reduce the image size while minimizing the amount of lost character stroke information. If the pixels in the new (reduced) image are simply taken from the pixels in corresponding locations in the original image, then the remaining pixels will be discarded and the information they represent will be irretrievably lost.
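The loss described above can be seen in a minimal sketch of such pixel-for-pixel sampling. The function name `point_sample`, the integer index mapping, and the 4 by 8 test image are assumptions made for the example; they are not taken from this specification.

```python
def point_sample(image, new_h, new_w):
    """Reduce a binary image by copying one source pixel per output pixel.

    `image` is a list of rows of 0/1 values. The index mapping below
    (floor of the proportional position) is one common choice, assumed here.
    """
    old_h, old_w = len(image), len(image[0])
    out = []
    for r in range(new_h):
        src_r = r * old_h // new_h
        row = []
        for c in range(new_w):
            src_c = c * old_w // new_w
            row.append(image[src_r][src_c])
        out.append(row)
    return out


# Hypothetical 4x8 image holding a one-pixel-wide vertical stroke in column 3.
image = [[1 if c == 3 else 0 for c in range(8)] for _ in range(4)]

# Halve the width only. Source columns 0, 2, 4 and 6 are sampled, so
# column 3 is never read and every row of the result is [0, 0, 0, 0].
reduced = point_sample(image, 4, 4)
```

With this assumed mapping the stroke falls entirely between sampled columns, so it is absent from the reduced image.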
In accordance with one goal of the invention, the reduction in the number of pixels representing a character of reduced size is compensated by computing each pixel in the new (reduced) image based upon the values of a local neighborhood of pixels surrounding the corresponding location in the old image. This minimizes the information lost through a reduction in the number of pixels representing each character. This, however, raises another problem, namely, how to define the neighborhood of pixels in the old image which are to be considered in computing the pixel in the new image. This problem is particularly acute where the character size as well as its shape (aspect ratio) must be changed. One solution that may be tried is to define the neighborhood as a rectangle whose proportion reflects the ratio of the horizontal and vertical scale factors by which the character size must be reduced. Such an approach has been suggested, though not specifically for normalizing individual character images, in U.S. Pat. No. 4,725,892 to Suzuki et al. In fact, if applied to optical character recognition, such an approach would create other problems. Specifically, the use of the rectangular sampling window can create false character strokes in the reduced image.
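The rectangular-neighborhood approach mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not the method of U.S. Pat. No. 4,725,892: the function name `window_reduce`, the non-overlapping tiling of the old image, and the rule that an output pixel is black if any pixel in its window is black are all choices made for the example.

```python
def window_reduce(image, new_h, new_w):
    """Reduce a binary image using a rectangular window per output pixel.

    Each output pixel looks at a rectangle of the old image whose width and
    height follow the horizontal and vertical reduction factors. The
    "any black pixel" decision rule is an assumption for this sketch.
    """
    old_h, old_w = len(image), len(image[0])
    out = []
    for r in range(new_h):
        r0 = r * old_h // new_h
        r1 = max(r0 + 1, (r + 1) * old_h // new_h)
        row = []
        for c in range(new_w):
            c0 = c * old_w // new_w
            c1 = max(c0 + 1, (c + 1) * old_w // new_w)
            black = any(
                image[y][x] for y in range(r0, r1) for x in range(c0, c1)
            )
            row.append(1 if black else 0)
        out.append(row)
    return out


# Hypothetical 4x8 image holding a one-pixel-wide vertical stroke in column 3.
image = [[1 if c == 3 else 0 for c in range(8)] for _ in range(4)]

# Halve the width only: each window is 1 row by 2 columns, and column 3
# falls in the window for output column 1, so each row is [0, 1, 0, 0].
reduced = window_reduce(image, 4, 4)
```

Under these assumptions the thin stroke survives the reduction because every pixel of the old image contributes to some output pixel. The sketch shows only the windowing mechanics; it does not attempt to reproduce the false-stroke artifact attributed above to rectangular windows.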
In summary, character normalization involving a size reduction and aspect ratio change has created one of two problems. If each pixel in the new (reduced) image is taken only from the pixel in the corresponding position in the old image, then the remaining pixels in the old image are discarded and their character stroke information is irretrievably lost. On the other hand, if each pixel in the new (reduced) image is taken from all of the pixels lying in a neighborhood surrounding the corresponding location in the old image, then false character strokes may be introduced into the new image.
Accordingly, it is an object of the present invention to perform character normalization without discarding a significant amount of character stroke information and without introducing false character strokes into the normalized character image.