1. Technical Field
The invention is related to apparatus and processes for segmenting characters in a bit map image of a scanned document in an optical character recognition preprocessor.
2. Background Art
Character segmentation is a fundamental and necessary processing stage of any optical character recognition (OCR) system. This task is simple if the text lines are aligned horizontally along the rows of pixels in the image and separated by white space between the lines. In this case it is straightforward to segment the individual characters in the text line by first finding the vertical bounds of the box enclosing the line and then segmenting each character by scanning for white space within the boxed line. However, if the lines of text are skewed, the task becomes difficult because the horizontal white spaces between lines no longer exist and it may not be possible to draw a single bounding box for the text line. Consequently, one has to resort to more sophisticated techniques for segmenting the character components. In the prior art, such techniques require either computing the skew angle of the document and then deskewing the document image, or performing complex operations on the bit-map image of the document. In either case, the character segmenting operation is complex and time-consuming, mainly because it requires accessing large amounts of data from the bit map image of the document.
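The simple unskewed case described above can be illustrated with a minimal sketch. It assumes the bit map is a list of rows of 0/1 pixels with 1 meaning "on"; the function names `find_runs` and `segment_unskewed` are hypothetical and are not taken from any of the cited references.

```python
def find_runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of true values."""
    runs, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i                      # a run of "on" entries begins
        elif not on and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(flags)))   # run extends to the end
    return runs


def segment_unskewed(image):
    """Return character boxes (top, bottom, left, right), bounds end-exclusive.

    Assumes text lines are separated by fully white pixel rows and
    characters within a line by fully white pixel columns.
    """
    boxes = []
    width = len(image[0])
    # Vertical bounds of each text line: runs of rows containing any "on" pixel.
    row_has_ink = [any(row) for row in image]
    for top, bottom in find_runs(row_has_ink):
        # Within the boxed line, scan for white columns between characters.
        col_has_ink = [any(image[r][c] for r in range(top, bottom))
                       for c in range(width)]
        for left, right in find_runs(col_has_ink):
            boxes.append((top, bottom, left, right))
    return boxes


# Hypothetical 5 x 6 bit map: two text lines, three "characters".
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
print(segment_unskewed(image))
```

Note that even this simple procedure visits every pixel of the bit map at least once, and every pixel inside a text line a second time.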
One example of character segmentation in which the skew angle of the document is computed is U.S. Pat. No. 4,558,461 to Schlang. The estimated skew angle is used to set a statistical bound on the text lines. The actual text data is then rotated to obtain an unskewed version of the text line. Such a technique does not segment individual character components and requires extensive computations and operations on the bit map image for skew detection and de-rotation of text lines. One disadvantage of having to perform extensive operations on the bit map image data is that such operations are time-consuming, due to the large amount of bit map image data that must be located and fetched from memory.
Other examples of the foregoing type of technique in which the document skew must be computed or estimated are U.S. Pat. No. 4,926,490 to Mano, U.S. Pat. No. 4,866,784 to Barski and a publication by Kim, "Baseline Drift Correction of Handwritten Text," IBM Technical Disclosure Bulletin, Volume 25, No. 10, pp. 5111-5114.
One example of character segmentation requiring extensive arithmetic operations on the bit-map image of the skewed document is U.S. Pat. No. 4,776,024 to Katoh et al. In this example, the bits are combined together column-by-column as well as row-by-row in successive OR operations to determine a sum and a frequency of "on" bits in each column and in each row. Such operations are time-consuming because of the large amounts of data which must be fetched from the bit-map image and because of the large number of arithmetic operations that must be performed.
Before characters can be segmented individually, the character rows must first be segmented from one another. This is typically accomplished by inspecting the horizontal projection of the bit map image of the document for empty spaces between rows. The problem with documents that are severely skewed is that adjacent character rows may overlap in the horizontal projection of the document bit map image. One technique for overcoming this problem is to divide the document into plural vertical blocks and segment the character lines in each of the vertical blocks. This latter technique is disclosed in U.S. Pat. No. 4,776,024 to Katoh et al. (referred to hereinabove) and in Japanese Patent Application No. 56-204636 by M. Maeda published June 25, 1983. Maeda teaches that a complete character row within one vertical block is joined together with the corresponding character row in the adjacent or next vertical block by looking for a character row in the next block which vertically overlaps the character row in the one vertical block. Once the entire character row has been segmented across all vertical blocks, its location or boundary map is then output to a character segmentation process.
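The vertical-block approach described above can be sketched as follows. This is an illustrative reconstruction from the description in this section only, not the procedure of the Katoh or Maeda references. The bit map is assumed to be a list of rows of 0/1 pixels, the names `rows_in_block`, `link_rows` and `segment_lines_by_blocks` are hypothetical, and the greedy choice of the first unclaimed overlapping row is an assumption made for simplicity.

```python
def find_runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of true values."""
    runs, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def rows_in_block(image, left, right):
    """Horizontal projection restricted to pixel columns left..right-1."""
    return find_runs([any(row[left:right]) for row in image])


def link_rows(block_rows):
    """Join row segments of adjacent blocks whose vertical extents overlap.

    block_rows[b] lists the (top, bottom) row segments found in block b.
    Returns one list of (block_index, top, bottom) per character row.
    """
    lines, open_lines = [], []
    for b, rows in enumerate(block_rows):
        used, next_open = set(), []
        for line in open_lines:
            _, prev_top, prev_bottom = line[-1]
            for i, (top, bottom) in enumerate(rows):
                # Overlap test on half-open vertical intervals.
                if i not in used and top < prev_bottom and prev_top < bottom:
                    line.append((b, top, bottom))
                    used.add(i)
                    next_open.append(line)
                    break
        for i, (top, bottom) in enumerate(rows):
            if i not in used:              # segment starts a new character row
                line = [(b, top, bottom)]
                lines.append(line)
                next_open.append(line)
        open_lines = next_open
    return lines


def segment_lines_by_blocks(image, block_width):
    width = len(image[0])
    block_rows = [rows_in_block(image, left, min(left + block_width, width))
                  for left in range(0, width, block_width)]
    return link_rows(block_rows)


# Hypothetical skewed bit map: two character rows drifting downward to the right.
image = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
print(rows_in_block(image, 0, 4))         # projection over the full width
print(segment_lines_by_blocks(image, 2))  # projection per 2-column block
```

In this example the full-width projection contains no empty pixel row, so the two character rows cannot be separated, whereas each two-column block does contain a gap. The result is only a boundary map of each character row; segmenting individual characters would still require a further pass over the bit map.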
The disadvantage of the technique disclosed in the Japanese Patent Application by Maeda is that the character segmentation process must begin anew with a bit map image of the complete character row and perform all of the steps necessary to segment individual characters. No use is made of the information produced during the horizontal projection of character rows within individual vertical blocks of the document image. Thus, the problem remains that the segmentation of individual characters requires large amounts of data from a bit map image and time-consuming operations to be performed on such data.