In document analysis applications, printed pages are scanned and represented as two-dimensional images for further processing. One such application area is the storage and retrieval of document pages by computer filing systems. Another major application area is optical character recognition (OCR), in which the characters in a text document are recognized by computer and the text is converted into a string of characters. In both applications, the relative inclination (skew) angle of the page being scanned needs to be detected and accounted for. Uncompensated skew can seriously degrade performance in both application areas. In computerized filing systems, it makes the concatenation or superposition of two parts of the stored data prohibitively difficult. In OCR, locating characters on different lines becomes very difficult if the baselines of the text lines are not horizontal.
In a general document image analysis procedure, the first step is to segment the image into regions that each contain only text or only graphics. In the subsequent stage of analyzing the text blocks, the skew angle has to be determined before proceeding to the next steps.
Several methods have been developed for estimating the skew angle of text lines in a document page. In early work, least squares minimization was used to fit a straight line to the characters in a line of text. More recently, the Hough transform has been proposed for detecting the skew angle and the positions of the lines of text. The Hough transform is a special case of the Radon transform, the latter being essentially a mapping from the coordinate space to the space of parameters that define projections of the coordinate space. The Hough transform was originally developed for estimating the parameters of straight lines in an image, and was later extended to handle other shapes. In the line detection application of the Hough transform, each significant pixel in the image is mapped to a curve in the parameter space, and elements of an accumulator array are incremented at the locations through which the curve passes. The line parameters are $\rho$, the normal distance of the line from the origin, and $\theta$, the angle that the normal to the line makes with the positive x axis. $\rho$ and $\theta$ define the parameter space, and an accumulator array is constructed by bounding and discretizing them over their expected ranges. Each pixel $(x, y)$ in the image is mapped by the Hough transform to the curve

$$\rho = x\cos\theta + y\sin\theta \qquad (1)$$

in the $\rho$-$\theta$ plane. For fixed $\rho$ and $\theta$, the same equation also represents the line in the x-y plane that lies at distance $\rho$ from the origin and whose normal makes an angle $\theta$ with the x axis. Therefore, all the points in the x-y plane located on the line $\rho_0 = x\cos\theta_0 + y\sin\theta_0$ are mapped to curves in the $\rho$-$\theta$ plane that all pass through the point $(\rho_0, \theta_0)$. After all the points in the image have been mapped, a search must be performed in the $\rho$-$\theta$ plane to find the peak locations, which are assumed to correspond to the line parameters. The Hough transform method can tolerate a fairly high amount of noise contamination; however, it is computationally expensive and also requires a search stage. Moreover, the choice of the quantization step for the $\rho$ and $\theta$ axes depends on the unknown parameters and is not straightforward: to obtain an acceptable resolution in the estimates, the $\rho$ and $\theta$ axes need to be quantized finely, but reducing the quantization steps broadens the peaks in the transform plane. This is because the points on a line are not exactly collinear in a discretized image, and hence the curves corresponding to a line do not pass exactly through the same point. Some procedures have been investigated to compensate for this difficulty in the quantization step of the Hough transform, but the computational complexity of the method inhibits its real-time implementation in most practical applications.
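The voting and peak-search steps described above can be illustrated with a minimal pure-Python sketch. It is only an illustration under stated assumptions, not an implementation taken from any of the cited methods: the function names `hough_accumulate` and `strongest_line`, the uniform one-degree angle grid, and the unit step for the normal distance are all hypothetical choices.

```python
import math

def hough_accumulate(points, n_theta=180, rho_step=1.0):
    """Vote in a (rho, theta) accumulator for a list of (x, y) pixels.

    Assumed layout: theta is sampled uniformly in [0, pi) and rho in
    [-rho_max, rho_max], where rho_max is the largest pixel distance from
    the origin rounded up to a whole number of rho steps.
    """
    max_r = max(math.hypot(x, y) for x, y in points)
    rho_max = math.ceil(max_r / rho_step) * rho_step
    n_rho = int(round(2 * rho_max / rho_step)) + 1
    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    acc = [[0] * n_theta for _ in range(n_rho)]
    for x, y in points:
        # Each pixel traces the curve rho = x cos(theta) + y sin(theta).
        for j, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            i = int(round((rho + rho_max) / rho_step))
            acc[i][j] += 1
    return acc, thetas, rho_max

def strongest_line(acc, thetas, rho_max, rho_step=1.0):
    """Exhaustive peak search: return (rho, theta, votes) of the fullest cell."""
    votes, i, j = max(
        (acc[i][j], i, j)
        for i in range(len(acc))
        for j in range(len(thetas))
    )
    return i * rho_step - rho_max, thetas[j], votes

# Usage: 200 pixels on the horizontal line y = 5.
points = [(x, 5) for x in range(200)]
acc, thetas, rho_max = hough_accumulate(points)
rho, theta, votes = strongest_line(acc, thetas, rho_max)
```

Both loops visit every pixel at every angle, and the search visits every accumulator cell, which is the computational burden noted above.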
In the area of skew angle detection for text document analysis, several approaches have been proposed that are based on different modifications of the Hough transform. One approach treats the text as thick lines and chooses a coarse quantization for $\rho$ to overcome the broadening effect mentioned above. However, since the font sizes are not known a priori, an appropriate quantization step has to be chosen iteratively, which dramatically increases the computational cost. Another approach adopts a variable step size for $\rho$ that is a function of $\theta$ in order to overcome the so-called aliasing problem caused by the nonuniform number of parallel line pixels among different angle bins. After computing the transform, this method considers the rate of change in the accumulator elements along each angle and chooses the angle with the highest overall gradient as the skew angle of the text. Although this method does not require knowledge of the font size, the high computational load of the Hough transform still makes it quite slow.
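The gradient criterion in the second approach can be sketched as follows. This is a simplified illustration, not the published method: it uses a fixed step for the normal distance rather than a step that varies with the angle, and it assumes that the "overall gradient" is measured as the sum of squared differences between adjacent accumulator cells along one angle. The function name `skew_by_gradient` and its parameters are hypothetical.

```python
import math

def skew_by_gradient(points, angles_deg, rho_step=1.0):
    """Return the candidate skew angle (degrees) whose accumulator column
    changes most sharply from one normal-distance bin to the next."""
    best_angle, best_score = None, -1.0
    for a in angles_deg:
        t = math.radians(a)
        # A text line skewed by angle a has its normal at a + 90 degrees,
        # so cos(theta) = -sin(a) and sin(theta) = cos(a).
        c, s = -math.sin(t), math.cos(t)
        bins = {}
        for x, y in points:
            i = int(round((x * c + y * s) / rho_step))
            bins[i] = bins.get(i, 0) + 1
        lo, hi = min(bins), max(bins)
        # Pad with one empty bin on each side so edge transitions count.
        col = [bins.get(i, 0) for i in range(lo - 1, hi + 2)]
        score = sum((col[k + 1] - col[k]) ** 2 for k in range(len(col) - 1))
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

At the correct angle the pixels of each text line collapse into a few bins separated by empty ones, so adjacent bins differ strongly; at other angles the counts smear across neighboring bins and the score drops.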
Another approach applies the Hough transform to the corners of the rectangles obtained by connecting nearby black pixels in the document, and then applies several sharpness criteria in the transform domain to locate the angle that best corresponds to the text skew. To reduce the computational cost of the Hough transform, one can first create a so-called burst image from the document image by accumulating the intensities of each set of pixels that are connected to each other vertically, and associating the resulting number with the pixel location at the bottom of the set; the intensities of the rest of the pixels are set to zero. The Hough transform is then applied to the resulting burst image in order to find the skew angle. This run-length encoding procedure helps to reduce the number of nonzero pixels in the image; however, its effectiveness degrades rapidly as the skew angle of the text increases, since most characters will not contain vertically connected pixels when rotated.
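The burst-image construction described above can be sketched as below. The representation is an assumption made for illustration: the image is a list of rows of nonnegative intensities (1 for a black pixel, 0 for background), row indices increase downward so that the "bottom" of a run is its largest row index, and the function name `burst_image` is hypothetical.

```python
def burst_image(img):
    """Collapse each vertical run of nonzero pixels into a single pixel.

    The summed intensity of the run is stored at the run's bottom pixel;
    every other pixel is set to zero.
    """
    rows = len(img)
    cols = len(img[0]) if img else 0
    out = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        run = 0
        for r in range(rows):
            if img[r][c]:
                run += img[r][c]
                # The run ends at the image border or before a zero pixel.
                if r == rows - 1 or not img[r + 1][c]:
                    out[r][c] = run
                    run = 0
    return out

# Usage: 7 nonzero pixels in the input shrink to 4 in the burst image.
page = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
burst = burst_image(page)
```

Only the nonzero pixels of the burst image then need to vote in the Hough transform, each weighted by its stored value if desired.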