1. Field of Related Art
The present disclosure relates to scanning documents, and more particularly, to a method and system for detecting document skew using connected components analysis.
2. Background of the Related Art
Digital input scanners, which can successively scan a set of sheets and record the images thereon as digital data, are becoming common in the office context, such as in digital copiers and electronic archiving. Because such machines are used so frequently and involve a large range of mechanical parts, there is a high probability that a document image will be skewed when a document is scanned. The skew may be caused by a number of factors, including careless manual placement of documents within the scanner, the way the paper was placed in the hopper, or the way the transport rollers grab the leading edge of the paper to move it. Also, when documents are transported for scanning using a document feeder, they may become skewed while traveling through the paper path. Thus, there is a need in the art to detect the skew angle of the image content on a scanned page in a fast, accurate, and efficient manner. Although there is prior art that detects skew of a scanned document, none of the patents in this area uses connected components analysis.
There is prior art that detects skew and crop statistics by determining the presence of unwanted extraneous and background information in a scanned document image. The process includes first determining the document edge, either by comparing a scan line of pixels with a predetermined scan line of background pixels or by comparing a neighborhood around a scan line with predetermined background pixels. Then, the skew angle is computed from the slope of the detected edge in the scanned image.
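The edge-based approach described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular prior art system. It assumes a grayscale image stored as a list of rows, a single known background value, and a simple least-squares fit for the edge slope; the function names `top_edge_points` and `skew_angle_from_edge`, and the `tolerance` parameter, are hypothetical.

```python
import math


def top_edge_points(image, background=0, tolerance=10):
    """For each column, find the first pixel (scanning top to bottom) that
    differs from the assumed background value by more than `tolerance`."""
    points = []
    height = len(image)
    width = len(image[0])
    for x in range(width):
        for y in range(height):
            if abs(image[y][x] - background) > tolerance:
                points.append((x, y))
                break  # only the first non-background pixel marks the edge
    return points


def skew_angle_from_edge(points):
    """Fit a straight line to the edge points by least squares and return
    the angle of that line in degrees (image coordinates, y pointing down)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two edge points are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("edge points must span more than one column")
    # atan2(sxy, sxx) equals atan(slope) because sxx is positive.
    return math.degrees(math.atan2(sxy, sxx))
```

The sign of the returned angle depends on the coordinate convention. Here the row index grows downward, so a positive angle means the detected edge descends from left to right.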
There is also prior art that detects skew of a document image by segmenting the input image into regions, each having a predetermined width, and then detecting lines containing black pixels in each of these regions. A partial image is extracted from the region in which the detected lines containing black pixels follow one another consecutively. Then, the skew angle of the partial images is calculated and used to determine the skew angle of the entire document image.
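One possible reading of the region-based approach described above is sketched below in Python. This is an illustration under stated assumptions, not the method of the prior art itself: the image is assumed to be binary (1 for black, 0 for white), the regions are assumed to be vertical strips, the "partial image" in each strip is taken to be the longest run of consecutive rows containing black pixels, and the overall skew is taken from a least-squares line through the centers of those runs. The names `longest_black_run` and `skew_angle_from_strips` are hypothetical.

```python
import math


def longest_black_run(binary_image, x_start, x_end):
    """Return (first_row, last_row) of the longest run of consecutive rows
    that contain a black pixel in columns [x_start, x_end), or None."""
    best = None
    run_start = None
    last_row = len(binary_image) - 1
    for y, row in enumerate(binary_image):
        has_black = any(row[x_start:x_end])
        if has_black and run_start is None:
            run_start = y
        # Close the run when a white row is met or the image ends.
        if run_start is not None and (not has_black or y == last_row):
            run_end = y if has_black else y - 1
            if best is None or run_end - run_start > best[1] - best[0]:
                best = (run_start, run_end)
            run_start = None
    return best


def skew_angle_from_strips(binary_image, strip_width):
    """Estimate the skew angle in degrees from the vertical position of the
    longest black run in each strip of width `strip_width`."""
    width = len(binary_image[0])
    samples = []
    for x_start in range(0, width, strip_width):
        x_end = min(x_start + strip_width, width)
        run = longest_black_run(binary_image, x_start, x_end)
        if run is not None:
            center_x = (x_start + x_end - 1) / 2
            center_y = (run[0] + run[1]) / 2
            samples.append((center_x, center_y))
    if len(samples) < 2:
        raise ValueError("at least two strips with black pixels are required")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    # sxx is positive here because strip centers are distinct.
    return math.degrees(math.atan2(sxy, sxx))
```

In this sketch, narrower strips give more sample points but make each strip more likely to contain no black pixels at all, so the strip width would need to be chosen with the expected text size in mind.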
Consequently, prior art systems lack the capability to straighten scanned documents effectively and efficiently. Straightening is a critical pre-processing step for improving the compression rate, the visualization aspect, the line removal, and the accuracy of Optical Character Recognition (“OCR”) during indexing. The present disclosure uses connected components analysis to overcome the drawbacks of prior art methods and systems.