A fundamental and pervasive problem arises in the art of automatic document analysis, character/object recognition and related fields. The problem is the recognition of, and correction for, skew in printed documents during the automated process. By skew is meant the angle between the dominant orientation of lines of characters or other textual objects of a document and a reference line observed by a reader, or the like, as representing zero angular error. Exemplary of the actions that are performed during document analysis is the segmentation of images of printed documents into blocks and lines of objects, such as characters. Known segmentation methods include those called top-down methods and bottom-up methods. Top-down methods characteristically operate by estimating some global properties of an image and by using the properties to guide segmentation into local regions whose local properties are estimated in turn. Bottom-up methods of segmentation characteristically operate by first clustering characters into lines, then lines into paragraphs, and so on. Unfortunately, the top-down methods tend to be excessively sensitive to non-zero skew.
A representative bottom-up method is described by Nagy, G. et al. in an article entitled "Document Analysis with an Expert System," Proceedings, Pattern Recognition in Practice, Amsterdam, 1985. This bottom-up method relies on good skew alignment, with skew angle restricted to no more than a few degrees. However, while bottom-up methods are less sensitive to skew than top-down methods, the bottom-up methods are generally slower and also suffer from other problems unrelated to skew sensitivity.
Hashizume, Yeh, & Rosenfeld, in an article entitled "A Method of Detecting the Orientation of Aligned Components", Pattern Recognition Letters, 1986, pp. 125-132, describe a skew determining method based on the premise that objects, e.g., characters, are often closer to one another along a dominant line orientation than in other directions. This technique computes the nearest neighbor of each object and connects each neighboring pair with a straight line segment. A histogram of the orientations of these line segments is computed. The histogram may have a strongly-marked peak at the dominant skew angle. The skew angle is computed as an average of values near the peak. Among the known examples reported using this technique, the average error was 1.5 degrees and the worst 4.1 degrees.
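The nearest-neighbor technique described above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the function name `nearest_neighbor_skew`, the representation of each object by a centroid, the one-degree bin width, and the choice to average over the peak bin and its two adjacent bins are all assumptions made for illustration.

```python
import math
from collections import Counter


def nearest_neighbor_skew(centroids, bin_width_deg=1.0):
    """Estimate the dominant skew angle, in degrees, from (x, y) object centroids."""
    if len(centroids) < 2:
        raise ValueError("need at least two objects")

    angles = []
    for i, (xi, yi) in enumerate(centroids):
        # Brute-force nearest-neighbor search; quadratic, but adequate for a sketch.
        best_d2, best_vec = math.inf, None
        for j, (xj, yj) in enumerate(centroids):
            if i == j:
                continue
            d2 = (xj - xi) ** 2 + (yj - yi) ** 2
            if d2 < best_d2:
                best_d2, best_vec = d2, (xj - xi, yj - yi)
        dx, dy = best_vec
        # Orientation of the connecting segment, folded into [-90, 90)
        # because a line segment has no direction.
        angle = math.degrees(math.atan2(dy, dx))
        angles.append((angle + 90.0) % 180.0 - 90.0)

    # Histogram of the segment orientations.
    histogram = Counter(math.floor(a / bin_width_deg) for a in angles)
    peak_bin = histogram.most_common(1)[0][0]

    # Average the orientations falling in the peak bin and its two neighbors.
    # Assumption: wrap-around at +/-90 degrees is ignored for simplicity.
    near_peak = [a for a in angles
                 if abs(math.floor(a / bin_width_deg) - peak_bin) <= 1]
    return sum(near_peak) / len(near_peak)
```

The sketch behaves as the premise suggests only when objects are closer together along a line than across lines; when inter-line spacing is comparable to character spacing, the histogram peak becomes ambiguous.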
W. Postl describes experiments with two methods of skew determination in a paper entitled "Detection of Linear Oblique Structures and Skew Scan in Digitized Documents", Proceedings, Eighth International Conference on Pattern Recognition, Paris, October 1986, pp. 687-689. The first method applies the discrete two-dimensional Fourier transform to an image plane and examines a half plane of the power spectrum coefficients. The technique assumes an orientation angle and measures the energy in spatial frequencies at that orientation angle. The accuracy obtained with this method is not known. The second method similarly hunts for the maximum of a measure over a range of angles. The integral density of points is computed along assumed scan angles. For each pair of neighboring scan lines, the difference of their densities is computed. Finally, the sum of squares of these differences is computed.
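The second of these methods can be sketched roughly as follows. This is an illustrative Python sketch, not the implementation reported in the paper: the names `projection_energy` and `estimate_skew`, the representation of the image as a list of black-pixel coordinates, the unit scan-line spacing, and the sign convention (positive angles counterclockwise, with the y axis pointing up) are assumptions.

```python
import math


def projection_energy(points, angle_deg, line_spacing=1.0):
    """Sum of squared density differences between neighboring scan lines."""
    t = math.radians(angle_deg)
    s, c = math.sin(t), math.cos(t)
    density = {}
    for x, y in points:
        # Index of the scan line, inclined at angle_deg, that contains (x, y).
        r = math.floor((y * c - x * s) / line_spacing)
        density[r] = density.get(r, 0) + 1
    if not density:
        return 0
    lo, hi = min(density), max(density)
    # Pad with one empty scan line on each side so the outer edges contribute.
    profile = [density.get(r, 0) for r in range(lo - 1, hi + 2)]
    return sum((b - a) ** 2 for a, b in zip(profile, profile[1:]))


def estimate_skew(points, candidate_angles_deg):
    """Return the candidate angle whose scan lines give the largest measure."""
    return max(candidate_angles_deg,
               key=lambda a: projection_energy(points, a))
```

When the assumed scan angle matches the skew, scan lines fall either wholly inside or wholly between text lines, so neighboring densities differ sharply and the measure peaks. The resolution of the estimate is limited by the spacing of the candidate angles, and every candidate requires a full pass over the points.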
Rastogi & Srihari describe a method using a Hough transform in an article, "Recognizing Textual Blocks Using the Hough Transform", Department of Computer Science, University of Buffalo (SUNY), 1986. For each angle in a discrete representation of Hough space, the number of large "low-high-low transitions" is counted, and the maximum count is interpreted as identifying the dominant skew. In the five examples shown in the paper, skew angle was coarsely quantized in increments of 15 degrees.
While the above methods operate satisfactorily in some contexts, they are both slow and complex, and give coarse estimates of skew.