It is known in the art to optically scan and recognize a page having text or pictorial images. The information from the scanned page is converted into electronic data as an array of pixels representing black or white points of each scanline in the paper. The data is stored and is aggregated to reproduce the scanned images. A problem with this technique is its inability to distinguish two or more columns in a page, i.e., the recognized lines will be interpreted as continuous and without the separation between the columns. If the document contains illustrations or graphics, interpretation will be erroneous if they are treated as text characters. Therefore, it is desirable to have a processing step to divide the scanned images into (1) areas corresponding to illustrations and (2) areas corresponding to text and to further subdivide the latter into columns. Then the recognition process will be applied to each column of text.
The process of partitioning images into columns of text is called page segmentation. One known page segmentation technique assumes that columns of text and illustrations form a right rectangle having one pair of sides vertical and the other horizontal. See, for example, U.S. Pat. No. 4,503,556 to Scherl, et al. and an article by H.S. Baird, S.E. Jones and S.J. Fortune "Image Segmentation by Shape-Directed Covers", published in the Proceedings 10th Int. Conf. on Pattern Recognition pp. 820-825, (June, 1990). The disclosure of each of these references is incorporated herein by reference for the purpose of providing an understanding of basic techniques employed in the general field of the invention.
In a technique commonly known in the art, the electronically scanned page is represented as a matrix of 1's representing dark pixels and 0's representing light pixels. Generally, W is the width and H is the height of the matrix, and m[i][j] represents the matrix element in the i-th row and j-th column. A projection profile along the horizontal is defined as $$P[i] = \sum_{j=1}^{W} m[i][j]$$ and a projection profile along the vertical is defined as $$P[j] = \sum_{i=1}^{H} m[i][j]$$
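As an illustration only, the two projection profiles can be computed as follows. This is a minimal sketch, assuming that the profile along the horizontal sums each row i over all columns and that the profile along the vertical sums each column j over all rows. The function names and the list-of-lists page representation are hypothetical and are not taken from the cited references.

```python
def horizontal_profile(m):
    """Return P[i]: the number of dark pixels in each row i of matrix m."""
    return [sum(row) for row in m]


def vertical_profile(m):
    """Return P[j]: the number of dark pixels in each column j of matrix m."""
    if not m:
        return []
    width = len(m[0])
    return [sum(row[j] for row in m) for j in range(width)]


# Hypothetical 3-row by 5-column page: dark pixels on both sides of a
# white gap at column index 2 (indices are 0-based in this sketch).
page = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
row_counts = horizontal_profile(page)
column_counts = vertical_profile(page)
```

In this toy page the vertical profile is zero only at the column that is white in every row, which is the property the known techniques rely on.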
Prior art methodologies assume that gaps between blocks of text can be found by searching for places where the projections are zero. If the page is upright, then the vertical projection P[j] will indeed have a count of zero within the white spaces between columns. When the page is tilted, however, there will be a nonzero count in the white spaces between the columns.
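The zero-projection search described above can be sketched as a scan for maximal runs of zeros in a vertical profile. This is a hedged illustration rather than the method of any cited reference; the function name zero_runs and both sample profiles are hypothetical values chosen only to show the behavior.

```python
def zero_runs(profile):
    """Return (start, end) index pairs, end exclusive, for each maximal run of zeros."""
    runs = []
    start = None
    for j, count in enumerate(profile):
        if count == 0 and start is None:
            start = j                      # a run of zeros begins here
        elif count != 0 and start is not None:
            runs.append((start, j))        # the run ended just before j
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run extends to the last column
    return runs


# Hypothetical vertical profiles: the same page scanned upright and tilted.
upright = [5, 7, 6, 0, 0, 0, 8, 6, 7]
tilted = [5, 7, 6, 2, 1, 3, 8, 6, 7]
upright_gaps = zero_runs(upright)
tilted_gaps = zero_runs(tilted)
```

With the upright profile the search reports one gap; with the tilted profile, where the gap columns pick up a few dark pixels, it reports none, so the two text columns would be treated as one.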
Thus, prior art techniques often fail when the page is tilted while being scanned or when the document contains text and illustrations not printed in a rectangular form.
For example, consider a hypothetical situation where the text area of a page is 6 inches in width and 7 inches in height. When scanned at 300 pixels per inch, a matrix results with W equal to 1800 pixels and H equal to 2100 pixels. Suppose also that each of the two text columns is 2.85 inches wide (855 pixels) and the gap in between them is 0.3 inches (90 pixels). Then the vertical projection P[j] will be zero for values of j between approximately 860 and 940. However, if the page is scanned with a tilt of 6 degrees, then the white spaces will shift from the range of 860 to 940 at the top row to the range of 1080 to 1160 at the bottom row.
Since the sine of 6 degrees equals about 0.105, each vertical projection with a "j" value greater than 940 will have a zero count for a length of approximately 0.3 inches divided by 0.105, or about 860 pixels. If the remaining 1240 pixels are mostly black, a count value as high as 1000 would not be unexpected.
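The arithmetic of this example can be checked with a short sketch. It assumes, as a simplification not stated in the text, that a small tilt acts as a shear that shifts the gap to the right by i times the tangent of the tilt angle at row i. The function name and the gap boundaries of 855 and 945 pixels (the column width and the column width plus the gap) are illustrative choices.

```python
import math


def white_rows_in_column(j, gap_left, gap_right, height, tilt_degrees):
    """Count the rows i in which column j falls inside the inter-column gap.

    Assumption: at row i the gap occupies columns from
    gap_left + i * tan(tilt) up to, but not including, gap_right + i * tan(tilt).
    """
    shift_per_row = math.tan(math.radians(tilt_degrees))
    count = 0
    for i in range(height):
        offset = i * shift_per_row
        if gap_left + offset <= j < gap_right + offset:
            count += 1
    return count


H = 2100                         # rows, 7 inches at 300 pixels per inch
GAP_LEFT, GAP_RIGHT = 855, 945   # 90-pixel gap after an 855-pixel column

# Upright page: a column in the middle of the gap is white in every row.
white_upright = white_rows_in_column(900, GAP_LEFT, GAP_RIGHT, H, 0.0)

# Page tilted by 6 degrees: a column at j = 1000 is white in only some rows.
white_tilted = white_rows_in_column(1000, GAP_LEFT, GAP_RIGHT, H, 6.0)
rows_through_text = H - white_tilted
```

Under these assumptions white_tilted comes out on the order of 860 rows and rows_through_text on the order of 1240 rows, in line with the figures quoted above, so the projection for that column is far from zero whenever the text rows are mostly dark.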
In the example above, the vertical projection profile will be nonzero for all "j" values between 860 and 940 if the tilt angle is greater than about 2.5 degrees (the arctangent of 0.3 inches/7.0 inches).
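The threshold angle in this example can be reproduced numerically. The sketch below assumes the same simplified geometry, in which the gap at the bottom row stops overlapping the gap at the top row once the horizontal shift over the page height exceeds the gap width. The function name is hypothetical.

```python
import math


def critical_tilt_degrees(gap_width, page_height):
    """Return the tilt angle, in degrees, whose tangent is gap_width / page_height.

    Both arguments must use the same unit (inches or pixels).
    """
    return math.degrees(math.atan(gap_width / page_height))


# Values from the example: a 0.3 inch gap on a text area 7.0 inches high.
angle_from_inches = critical_tilt_degrees(0.3, 7.0)

# The same ratio expressed in pixels at 300 pixels per inch.
angle_from_pixels = critical_tilt_degrees(90, 2100)
```

Both calls give an angle slightly under 2.5 degrees, so even a modest scanning tilt is enough to leave no column of the vertical projection entirely white.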
Thus, it is desirable to improve the processing of scanning systems with techniques which are more tilt independent.