1. Field of the Invention
The present invention relates to the field of character recognition systems, and in particular to a method for identifying and correcting an angle of skew, used in connection with document segmentation.
2. Description of the Related Art
Optical character recognition creates a text file on a computer system from a printed document page. The created text file may then be manipulated by a text editing or word processing application on the computer system. Because a document page may comprise both text and pictures, or the text may be arranged in columns, as in a newspaper or magazine article, an important step prior to character recognition is document segmentation. Document segmentation is the identification of the various text, image (picture) and line segment portions of the document image. As only the text portions of the document image can be converted into a text file, it is desirable to limit character recognition to those areas of the document which contain text, and to provide an order in which those portions of text are inserted into the text file. Ordering of the text portions is desirable in order to avoid creating text files that do not logically correspond to the original document. Such text files would be of diminished value.
A difficult problem confronted by all document segmentation systems is that of skew. Skew occurs whenever the document image does not properly represent the horizontal lines of text on the actual document page. Skew must be corrected prior to performing character recognition on the document image. Skew correction generally requires the determination of a skew angle and modification of the document image representation based on that skew angle. With regard to skew angle determination, a first known method is based on the Hough Transform. In the Hough Transform, the bit-mapped image of the document is transformed into a polar coordinate space. By identifying the maximum peak in the polar coordinate space, the skew angle is obtained directly from the polar angle of that peak. The Hough Transform method requires extensive computation time, and has been found to be insufficiently sensitive to the skew angle.
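The Hough Transform approach described above can be illustrated with a minimal sketch. It is not the implementation of any particular prior-art system; the function names, the candidate angle range, the angular step, and the synthetic test image are all assumptions made for illustration. Each black pixel votes, for every candidate angle, for the line offset it would lie on, and the angle holding the highest accumulator peak is taken as the skew angle.

```python
import math
from collections import Counter


def hough_skew_angle(pixels, max_angle_deg=5.0, step_deg=0.25):
    """Return the candidate angle (degrees) with the highest Hough peak.

    pixels: iterable of (x, y) coordinates of black pixels.
    The angle range and step are illustrative assumptions. The angle is
    positive when y increases with x in the coordinate system of `pixels`.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no black pixels to vote with")
    n_steps = int(round(2 * max_angle_deg / step_deg))
    best_angle, best_votes = 0.0, -1
    for i in range(n_steps + 1):
        angle_deg = -max_angle_deg + i * step_deg
        theta = math.radians(angle_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Pixels on a line tilted by theta from horizontal share the same
        # offset rho = y*cos(theta) - x*sin(theta), so they vote for one bin.
        votes = Counter(round(y * cos_t - x * sin_t) for x, y in pixels)
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle_deg, peak
    return best_angle


def synthetic_text_lines(skew_deg, width=1000, baselines=(20, 50, 80)):
    """Hypothetical test data: one-pixel-thick lines tilted by skew_deg."""
    slope = math.tan(math.radians(skew_deg))
    return [(x, round(c + x * slope)) for c in baselines for x in range(width)]


if __name__ == "__main__":
    print(hough_skew_angle(synthetic_text_lines(2.0)))
```

Every pixel votes once per candidate angle, so the cost grows with the number of black pixels multiplied by the number of angles tested, and the angular resolution is limited by the chosen step. This is consistent with the computation time and sensitivity drawbacks noted above.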
A second known method is described in the article entitled, "The Skew Angle of Printed Documents", H. S. Baird, Proceedings SPSE 40th Conference Symposium Hybrid Imaging Systems, Rochester, N.Y., May 1987, pgs. 21-24. In this second method, a two-dimensional Fourier transform of the original document image is computed, and the result is again projected to polar coordinates. The maximum of the projected values gives the angle of skew. This method has been found to provide high accuracy, up to 2 minutes of arc, but again requires extensive amounts of processing time.
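To give a sense of scale for an accuracy of 2 minutes of arc, the following short calculation converts a skew angle into the vertical drift of a text line across a page. The page width of 8.5 inches and the scanning resolution of 300 dots per inch are assumptions chosen for illustration and do not come from the cited article; the function name is likewise hypothetical.

```python
import math


def drift_pixels(skew_arcmin, line_length_px):
    """Vertical drift, in pixels, of a line of the given length at the given skew."""
    skew_deg = skew_arcmin / 60.0  # 60 minutes of arc per degree
    return line_length_px * math.tan(math.radians(skew_deg))


if __name__ == "__main__":
    # Assumed example: 8.5 inches at 300 dots per inch is 2550 pixels.
    page_width_px = int(8.5 * 300)
    print(drift_pixels(2.0, page_width_px))
```

Under these assumptions, a residual skew of 2 minutes of arc displaces a full-width text line by roughly one and a half pixels from one edge of the page to the other.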
It is an object of the present invention to provide a skew determination and correction apparatus and method that are accurate and make efficient use of system resources.