1. Field of the Invention
The present invention relates to a document edge detection apparatus, particularly to a document edge detection apparatus that accurately detects the edges of a document by calculating straight lines representing the edge or edges of a document from document image signals input from a scanner or the like.
2. Description of the Prior Art
In many image processing systems, detecting the edges of scanned document images is an important preprocessing function. To apply optical character recognition to forms on which characters are written at prescribed locations relative to the form's edge, the edge of the form must be detected so that it can serve as a reference for reading specific character locations. The edge detection process can also detect document skew, which can be used as supplementary data in the subsequent character recognition stage. The detected skew angle is likewise useful when skew correction must be applied to the image of a document that was scanned at an angle. In addition, printing unmodified images of documents that have a dark background on a plain paper printer consumes a large amount of toner, so edge detection is also used when it is desired to eliminate the dark background tones.
Because edge detection of document images is thus utilized in a variety of ways, various edge detection methods have been proposed. One such conventional method uses the Hough transform to detect straight line components representing document edges. In another method, a document image is scanned in both directions along the primary scanning direction, and transitions from white to black or black to white (recognized when the number of consecutive black or white pixels exceeds a set value) are extracted as possible edge points. From these, the points that expand the image region are selected as edge point coordinates, and the method of least squares is applied to these coordinates to obtain straight line equations that represent the edges.
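The second conventional method can be illustrated with a minimal sketch. It is not taken from any cited implementation: the function names, the `min_run` threshold, and the 0/1 pixel encoding (1 for document, 0 for background) are assumptions made for illustration. The sketch scans in one direction only and omits the step of selecting the points that expand the image region.

```python
import math


def find_left_edge_points(image, min_run=3):
    """Return (row, col) of the first background-to-document transition per row.

    image is a list of rows of 0 (background) / 1 (document) pixels; this
    encoding is an assumption. A transition is accepted only when it is
    followed by at least min_run consecutive document pixels.
    """
    points = []
    for y, row in enumerate(image):
        run = 0
        for x, pixel in enumerate(row):
            run = run + 1 if pixel else 0
            if run == min_run:
                points.append((y, x - min_run + 1))
                break
    return points


def fit_line(points):
    """Least-squares fit of x = slope * y + intercept to (y, x) points."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    syy = sum((y - mean_y) ** 2 for y, _ in points)
    syx = sum((y - mean_y) * (x - mean_x) for y, x in points)
    slope = syx / syy
    return slope, mean_x - slope * mean_y


# Hypothetical 5 x 10 image whose left document edge shifts one pixel per row.
image = [[0] * (2 + y) + [1] * (8 - y) for y in range(5)]
image[2][0] = 1  # isolated speck, shorter than min_run, so it is ignored

edge_points = find_left_edge_points(image)
slope, intercept = fit_line(edge_points)
skew_degrees = math.degrees(math.atan(slope))
print(edge_points, slope, intercept, skew_degrees)
```

The edge is fitted as x in terms of y because a left or right document edge is close to vertical, where a fit of y in terms of x would be poorly conditioned. The arctangent of the slope then gives the skew angle relative to the vertical axis.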
However, each of these methods has drawbacks. The first method involves processing that is both complex and time-consuming, and it can have difficulty detecting an edge if the document contains image components that lie parallel to the edge. In the second method, points on relatively large damaged portions, creases or smears in the document may be mistaken for edge points. When a straight line is approximated from edge point coordinates that include such inappropriate points, the resulting line will not correctly represent the document edge.
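The sensitivity of the least-squares approach to inappropriate points can be seen in a small numerical sketch. The coordinates below are invented for illustration, and `fit_line` is a hypothetical helper rather than part of any cited method: ten points lie on a perfectly vertical edge, and in the second data set one of them is displaced by three pixels, as might happen at a tear or smear.

```python
import math


def fit_line(points):
    """Least-squares fit of x = slope * y + intercept to (y, x) points."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    syy = sum((y - mean_y) ** 2 for y, _ in points)
    syx = sum((y - mean_y) * (x - mean_x) for y, x in points)
    slope = syx / syy
    return slope, mean_x - slope * mean_y


# Ten edge points on a vertical edge at column 5 (invented data).
clean_points = [(y, 5) for y in range(10)]

# Same edge, but the last point is displaced to column 8 (e.g. a torn corner).
damaged_points = clean_points[:-1] + [(9, 8)]

clean_slope, _ = fit_line(clean_points)
damaged_slope, _ = fit_line(damaged_points)

print("clean skew (degrees):  ", math.degrees(math.atan(clean_slope)))
print("damaged skew (degrees):", math.degrees(math.atan(damaged_slope)))
```

In this example a single displaced point out of ten tilts the fitted line by several degrees, even though the true edge is unskewed.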