1. Technical Field
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The invention is related to optical character recognition systems and in particular to optical character recognition pre-processors for separating the images of touching characters in a document image.
2. Background Art
In optical character recognition systems, a document is scanned to create a digital image. The digital document image is then processed so as to separate each character in the document and to recognize each character as a particular symbol. Typically, the characters in the document image are separated into individual character images by searching for empty columns between adjacent characters. However, if a pair of adjacent characters is kerned (so that the characters are overlying or partially wrapped around one another), there are no empty columns between the characters. This problem is solved by using connected component analysis to discern and separate the individual unconnected objects comprising the two characters. A more severe problem arises, however, when two adjacent characters in the document actually touch one another, due to a printing error or poor document reproduction, for example. In such a case, the objects comprising the two adjacent characters are actually connected together, thus forming a single object which cannot be readily separated by typical segmentation techniques such as connected component analysis. Failure to separate adjacent characters in the document prevents the optical character recognition system from recognizing the characters. Thus, it is imperative that a preprocessor be provided in an optical character recognition system to separate touching characters.
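The three situations described above (separated, kerned and touching characters) can be illustrated with a minimal sketch. It is not the method of the invention; it only assumes a binary image given as rows of text in which "#" marks an ink pixel, and the names ink_column_runs, count_components, SEPARATED, KERNED and TOUCHING are hypothetical. The first function splits the image wherever an empty column occurs, and the second counts 8-connected objects in the manner of a simple connected component analysis.

```python
from collections import deque

# Hypothetical test images: "#" is ink, "." is background.
SEPARATED = ["#.#",
             "#.#"]
KERNED = ["####...",
          ".#.....",
          ".#..###",
          ".#..#.."]
TOUCHING = ["#.#",
            "###",
            "#.#"]


def ink_column_runs(image):
    """Return half-open (start, end) column ranges bounded by empty columns."""
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a run of inked columns begins
        elif not ink and start is not None:
            runs.append((start, x))        # an empty column ends the run
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def count_components(image):
    """Count 8-connected ink objects with a breadth-first flood fill."""
    height, width = len(image), len(image[0])
    seen = set()
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != "#" or (y, x) in seen:
                continue
            count += 1                     # unvisited ink pixel: new object
            seen.add((y, x))
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == "#"
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return count


for name, img in (("separated", SEPARATED), ("kerned", KERNED),
                  ("touching", TOUCHING)):
    print(name, ink_column_runs(img), count_components(img))
```

In this sketch the separated pair yields two column runs, the kerned pair yields a single column run but two connected objects, and the touching pair yields a single column run and a single connected object, which is the case neither technique can split.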
Various pre-processing techniques for optical character recognition systems and other types of systems are known. For example, U.S. Pat. No. 4,769,849 uses the outer contour features of each segment in the document image to identify all the pixels in a given object, but does not separate touching characters in the same object. U.S. Pat. No. 4,764,971 uses image features such as variance or contrast to identify and segment different objects in an image, but does not teach separating touching characters in the same object. U.S. Pat. No. 4,731,857 discloses a method for segmenting characters in a document image, but does so without using their contour features.
In summary, there is a need for a reliable, fast and simple pre-processor for separating touching characters in a document image for an optical character recognition system. Accordingly, it is an object of the invention to provide a simple, reliable and fast touching character separating pre-processor in an optical character recognition system.
It is a further object of the invention to provide a method for separating touching characters in a document image using simple contour-following steps to detect and separate touching characters having closed inner contours.
It is a yet further object of the invention to provide a method for detecting touching characters in a document image using simple line-intersection steps for characters which do not have closed inner contours.