1. Field of the Invention
The present invention relates generally to document information processing, and more particularly to a preprocessing technique for electronic digital equipment that includes a document reader device and is preferably applied to a document image processing apparatus such as a pattern recognition apparatus, an automatic syntactic analysis apparatus, or a document information filing apparatus. More specifically, this invention relates to the detection of the skew angle of a document image.
2. Description of the Related Art
With the continuing improvement in the performance and reliability of digital computer systems, electronic apparatus for processing printed documents is becoming increasingly important. Such electronic document processing apparatus typically includes a pattern recognition apparatus, a syntactic analysis apparatus, a document information filing apparatus, and so forth, each of which employs, at its input stage, a document reader device that optically scans the printed face of an input document to generate a corresponding electric signal representing the contents of the document. The document reader may include an optical character reader (OCR), which frees an operator or user from the cumbersome work of manually entering the text of an input document, using a conventional input tool such as a keyboard, into electronic files accessible by a computer. This can improve the efficiency of document input work in offices.
Although the performance of document reader devices has grown with recent developments in electronic technology, their reading accuracy has not yet reached a fully satisfactory level. One reason is that there is no guarantee that an input printed document will be placed correctly on the scanning base plate without any skew. If the printed document is skewed, the character-read results are naturally less reliable.
An automatic document skew detection apparatus automatically detects an unintentional skew in the image of a printed document; such an apparatus is important for facilitating recognition of the document's contents. Several techniques for document skew detection have been proposed to date. For example, one known technique searches the image of an input document for a ruled line (such as a cutoff rule or a hairline rule) and, if one is found, determines the document skew angle using the detected line as the reference direction. Because a ruled line is generally drawn in either the horizontal or the vertical direction of the document, the skew angle can be detected accurately if the line is detected correctly. However, this ruled-line-based method is not universally applicable: if the input document contains no ruled lines, the method cannot detect the document skew at all.
Another document skew detection method is disclosed in "A Segmentation Method for Document Images without the Knowledge of Document Formats," by Teruo Akiyama et al., the Transactions of the Institute of Electronics, Information and Communication Engineers, paper D-15, Vol. J-66-D, No. 1, at pp. 111 to 118 (1983). In this prior art, the image of an input document is binarized so that its contents are represented by a set of black picture elements (pixels) and white pixels. These pixels are projected along a number of selected angles to search for the direction in which the black pixels exhibit the sharpest peak in their peripheral distribution (projection profile). The angle of this direction is determined as the document skew angle with respect to a predetermined reference direction, which may be either the horizontal or the vertical direction. This scheme, however, has the drawback that the total processing time required for skew detection increases undesirably, since a projection must be computed for every candidate angle. In addition, the skew detection results are not satisfactory. These problems become more serious if the input document contains non-text portions (such as line drawings, graphics, or photographic images) in addition to text lines.
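The angular projection search described above can be illustrated with a minimal sketch. It is not the procedure of Akiyama et al.; it assumes the image is already binarized into a list of rows (1 = black, 0 = white), tries candidate angles in fixed steps (the hypothetical parameters `max_angle` and `step`), and scores the "sharpest peak" by the sum of squared bin counts, a common stand-in criterion chosen here for illustration only. `projection_skew` and the test-data helper `synthetic_lines` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def projection_skew(image: Sequence[Sequence[int]],
                    max_angle: float = 10.0,
                    step: float = 0.5) -> float:
    """Estimate document skew in degrees from projection profiles.

    image: binarized raster, 1 = black pixel, 0 = white; the row index grows
    downward, so a positive angle means text lines descend to the right.
    """
    # Collect the coordinates of all black pixels once.
    black = [(x, y) for y, row in enumerate(image)
             for x, v in enumerate(row) if v]
    if not black:
        raise ValueError("image contains no black pixels")

    best_angle, best_score = 0.0, -1
    n = int(round(max_angle / step))
    for k in range(-n, n + 1):          # every candidate angle is tried
        angle = k * step
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # Project each black pixel onto the axis perpendicular to a line
        # skewed by `angle`; pixels of one such line share the same bin.
        profile = Counter(round(y * c - x * s) for x, y in black)
        # Assumed sharpness criterion: sum of squared bin counts, which is
        # largest when the black pixels pile up in a few tall peaks.
        score = sum(v * v for v in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_lines(width: int, height: int, starts: Sequence[int],
                    angle_deg: float) -> List[List[int]]:
    """Hypothetical test data: 1-pixel-thick straight lines skewed by angle_deg."""
    img = [[0] * width for _ in range(height)]
    slope = math.tan(math.radians(angle_deg))
    for r0 in starts:
        for x in range(width):
            y = r0 + round(x * slope)
            if 0 <= y < height:
                img[y][x] = 1
    return img
```

For example, `projection_skew(synthetic_lines(200, 100, [20, 40, 60, 80], 3.0))` should return an angle at or near 3.0. The work done is proportional to the number of black pixels times the number of candidate angles, which shows why an exhaustive angular search of this kind tends to be slow.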
Still another document skew detection method is disclosed in "A Document Skew Detection Method using Run-Length Encoding and the Hough Transform," by Stuart C. Hinds et al., IEEE, Proc. of the 10th International Conference on Pattern Recognition, at pp. 464-468 (1990), in which document skew is detected by applying the Hough transform to either a horizontal or a vertical burst image. This method may remain effective for printed documents that partially contain line drawings, graphics, or photographic images. However, it cannot detect skew accurately when the text portion of the input document image occupies a smaller area than the non-text portion. The method also suffers from a complicated overall processing system, and it requires large-scale computer equipment to attain the desired detection speed and accuracy.
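A much simplified sketch of a Hough-based skew search on a run-reduced image follows. It is an illustration under stated assumptions, not the method of Hinds et al.: each vertical run of black pixels is reduced to a single point at its lower end as a simplified stand-in for the burst image (any run-length weighting or thresholds used in the cited paper are not reproduced), each remaining point votes for the (θ, ρ) cells of lines through it over a restricted angle range, and the skew is taken as the θ of the cell with the most votes. `burst_points`, `hough_skew`, and the test-data helper `synthetic_bars` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def burst_points(image: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Reduce every vertical run of black pixels to its bottom-most pixel."""
    height = len(image)
    width = len(image[0]) if height else 0
    points = []
    for x in range(width):
        for y in range(height):
            # A vertical run ends at (x, y) if this pixel is black and the
            # pixel below it is white or lies outside the image.
            if image[y][x] and (y + 1 == height or not image[y + 1][x]):
                points.append((x, y))
    return points


def hough_skew(image: Sequence[Sequence[int]],
               max_angle: float = 10.0,
               step: float = 0.5) -> float:
    """Return the angle (degrees) of the strongest Hough accumulator cell."""
    points = burst_points(image)
    if not points:
        raise ValueError("image contains no black pixels")

    best_angle, best_votes = 0.0, -1
    n = int(round(max_angle / step))
    for k in range(-n, n + 1):
        angle = k * step
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # One accumulator column for this theta: each point votes for the
        # rho (normal offset) of the line at angle theta passing through it.
        accumulator = Counter(round(y * c - x * s) for x, y in points)
        votes = max(accumulator.values())
        if votes > best_votes:
            best_angle, best_votes = angle, votes
    return best_angle


def synthetic_bars(width: int, height: int, starts: Sequence[int],
                   angle_deg: float, thickness: int = 3) -> List[List[int]]:
    """Hypothetical test data: bars `thickness` pixels tall, skewed by angle_deg."""
    img = [[0] * width for _ in range(height)]
    slope = math.tan(math.radians(angle_deg))
    for r0 in starts:
        for x in range(width):
            top = r0 + round(x * slope)
            for y in range(top, top + thickness):
                if 0 <= y < height:
                    img[y][x] = 1
    return img
```

Because each vertical run contributes a single point, a thick stroke casts one vote per column rather than one per black pixel, which reduces the voting work compared with feeding every black pixel into the accumulator.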