In many image processing applications, it is desirable to determine and correct skew of a document image. For example, many text recognition systems, such as optical character recognition (OCR) systems, fail if presented with text oriented with a skew of more than a few degrees, not to mention if the text is oriented sideways or upside-down. In addition, it is easier to identify text lines and text columns if the image skew is known or the image is deskewed.
A variety of methods for determining image skew using an iterative estimation approach based upon the method disclosed by Baird in U.S. Pat. No. 5,001,766 have been proposed. One such approach involves using bounding boxes of connected components to estimate image skew. The coordinates of a token point, on the bottom center of the bounding box, are selected, and a function Stokens of skew angle is computed from these coordinates. Specifically, the function Stokens(θ) is the sum of squares of the number of such points computed along a set of lines with the angle θ to the raster direction. A vertical shear is simulated on the set of points and the sums over the points with the same y-coordinate are determined. Aside from a constant (independent of θ), the function Stokens is the variance of the number of tokens on a line, as a function of angle. This variance is a maximum in the direction where the tokens for each text line fall near the same line.
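The token-based function described above may be illustrated by the following minimal sketch. It is not taken from the Baird patent; the names s_tokens and estimate_skew, the sign convention of the shear, and the rounding of sheared coordinates to the nearest integer line are assumptions made for illustration only.

```python
import math
from collections import Counter

def s_tokens(points, theta_deg):
    """Sum of squared token counts along lines at angle theta to the raster direction.

    points: iterable of (x, y) token coordinates (e.g. bottom centers of
    bounding boxes). Hypothetical helper, for illustration only.
    """
    t = math.tan(math.radians(theta_deg))
    # Simulate a vertical shear: each point moves vertically in proportion
    # to its x-coordinate, then points sharing a y-coordinate are counted.
    counts = Counter(round(y - x * t) for x, y in points)
    return sum(c * c for c in counts.values())

def estimate_skew(points, candidate_angles):
    """Return the candidate angle (in degrees) that maximizes s_tokens."""
    return max(candidate_angles, key=lambda a: s_tokens(points, a))
```

Evaluating s_tokens over a set of candidate angles and refining around the best one is what makes this family of methods iterative: every evaluation needs the full set of token points.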
Another method for determining skew traverses straight lines of the image at a set of angles relative to the raster direction. A function Sδ(θ) is computed that has a maximum value when the scan direction θ is along the text lines. Unlike the approach described above, which computes tokens from connected components, this method uses every pixel in the image. The function Sδ(θ) is similar to the function Stokens(θ) in the sense that an angle θ is chosen and pixel sums are found along lines in the image at this angle. However, instead of squaring the sum of tokens, the second method squares the difference between sums of ON pixels on adjacent lines, and the function Sδ(θ) is found by summing over all lines. The function Sδ(θ) is, aside from a constant, the variance in the difference between pixel sums on adjacent lines at the angle θ.
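The pixel-based function Sδ(θ) may be sketched in the same way. The following is an assumed, simplified rendering: the image is a list of rows of 0/1 values, lines at angle θ are approximated by rounding sheared y-coordinates, and the name s_delta and the padding scheme are hypothetical.

```python
import math

def s_delta(image, theta_deg):
    """Sum of squared differences between ON-pixel sums on adjacent lines at angle theta."""
    t = math.tan(math.radians(theta_deg))
    height, width = len(image), len(image[0])
    # Sheared line indices can fall outside 0..height-1, so pad both ends
    # by the largest possible vertical shift.
    pad = int(abs(t) * width) + 1
    sums = [0] * (height + 2 * pad)
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                sums[round(y - x * t) + pad] += 1
    # Square the difference between each pair of adjacent line sums.
    return sum((b - a) ** 2 for a, b in zip(sums, sums[1:]))
```

For an image of horizontal text-like bars, s_delta is largest when theta_deg is 0 and drops as the scan direction departs from the bars, since the line sums are then smeared across neighboring lines.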
The deskew methods described above, along with other deskew methods employing an iterative estimation approach based on the teachings of Baird, require a local copy of the image to be stored. This requirement prevents pipeline processing, and the iterative nature of the methods eliminates the possibility of parallel processing. Therefore, the speed at which an image may be deskewed is necessarily limited.
The teachings disclosed herein propose a method and apparatus for determining and correcting image skew. In particular, there is taught a parallel, non-iterative, memory-efficient method of determining image skew. In accordance with an embodiment disclosed herein there is provided a method of determining image skew including scanning a document to produce scanned image data, the scanned image data comprising a plurality of scanlines with each scanline comprising a plurality of pixels; generating a fast scan second order moment data set; generating a slow scan second order moment data set; and determining a document skew angle from the fast scan and slow scan second order moment data sets; wherein the step of generating the slow scan second order moment data set includes receiving a current scanline of image data, updating column sums for a set of rotation angles using scanlines within a buffer comprising a band of scanlines, the band having a predetermined number B of scanlines, and updating the buffer with the current scanline.
In accordance with another embodiment disclosed herein there is provided a method of determining image skew including scanning a document to produce scanned image data comprising a plurality of scanlines, each scanline having a plurality of pixels; generating a first set of second order moments, the first set of second order moments being based on row sums; generating a second set of second order moments, the second set of second order moments being based on column sums; and determining a document skew angle from the first and second sets of second order moments; wherein the step of generating the first set of second order moments includes receiving a first scanline, projecting a plurality of pixels within the first scanline to a first rotation angle, adding a first subset of the plurality of projected pixels to a first memory location, adding a second subset of the plurality of projected pixels to a second memory location, adding a third subset of the plurality of projected pixels to a third memory location, and adding the square of the pixel sum in the first memory location to a moment accumulator.
In accordance with another embodiment disclosed herein there is provided a method of processing image data to determine image skew, comprising: receiving scanned image data, the scanned image data comprising a plurality of scanlines with each scanline comprising a plurality of pixels; generating a fast scan second order moment data set; generating a slow scan second order moment data set; and determining a document skew angle from the fast scan and slow scan second order moment data sets; wherein the step of generating the fast scan second order moment data set includes projecting a plurality of pixels within a scanline to at least a first rotation angle wherein the plurality of pixels project onto M rows; updating M memory locations, wherein each one of the M memory locations corresponds to one of the M rows and wherein each memory location is updated using pixels projecting onto the corresponding row whereby one of the memory locations contains a completed row sum and M−1 of the memory locations contain partial row sums; adding the square of the completed row sum to a moment accumulator; and repeating the projecting, updating and adding steps for each scanline within a plurality of scanlines whereby a memory location having a completed row sum is reused to accumulate a subsequent row sum so that only M memory locations are required to accumulate row sums.
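By way of illustration only, the reuse of M memory locations in the fast scan step may be sketched as follows for a single rotation angle. The projection rule (row offset equal to the floor of x times tan θ), the function name fast_scan_moment, and the final flush of partial row sums are assumptions of this sketch and are not specified by the embodiment above.

```python
import math

def fast_scan_moment(scanlines, width, theta_deg):
    """Streamed sum of squared row sums at one rotation angle using M accumulators."""
    t = math.tan(math.radians(theta_deg))
    raw = [math.floor(x * t) for x in range(width)]
    base = min(raw)
    # Row offset of each pixel position, shifted so offsets run from 0 to M-1.
    offsets = [k - base for k in raw]
    m = max(offsets) + 1          # each scanline projects onto M rows
    sums = [0] * m                # the only row-sum storage (circular buffer)
    moment = 0                    # moment accumulator
    for y, line in enumerate(scanlines):
        for x, pixel in enumerate(line):
            if pixel:
                sums[(y + offsets[x]) % m] += 1
        # Row y is now complete: later scanlines only project onto rows > y.
        done = y % m
        moment += sums[done] ** 2
        sums[done] = 0            # reuse this location for row y + M
    # Assumption: partial rows left after the last scanline are also counted.
    moment += sum(s * s for s in sums)
    return moment
```

Because scanlines are consumed one at a time and no full image copy is kept, the function may be fed from a generator; running one instance per candidate angle is one way the computation could proceed in parallel.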