Document imaging technology is used in a growing number of business and other applications such as facsimile or "fax" machine transmissions, optical character recognition (OCR), the digitizing of photographs and artwork, and photocopy machines. In many of these applications, best results are usually obtained if document contents are properly aligned with an optical sensor used to generate an image of the document and if the size and/or shape of the document can be established.
If a document page is misaligned with respect to the optical sensor, the resultant image is similarly skewed. Because the contents of a document page are usually aligned with the page itself, a skewed page usually means that its contents are also misaligned with the optical sensor. Misalignment can reduce the amount of data compression achievable by fax machines and can increase the error rate of OCR processes. As a result, such systems are harder to use, because operators must take care to ensure that each page is reasonably well aligned with the optical sensor.
The skew compensation provided by optical systems such as those disclosed in U.S. Pat. Nos. 5,027,227 and 5,093,653 is unsatisfactory because it requires operator input to establish the "skew angle," that is, the amount of image rotation required to compensate for skew. A system disclosed in U.S. Pat. No. 4,953,230 does not require operator input, but it relies upon the existence of text or other marks on the page to establish the orientation of the page. Other disadvantages of these systems include requiring large amounts of memory to store the image while the skew angle is established and imposing a considerable delay after scanning before skew is compensated. Furthermore, the skew compensation techniques disclosed in these patents severely distort the image unless the skew angle is small.
In many applications, document images are either transmitted immediately or stored for later use. Unless the size/shape of the document can be established, transmission-channel bandwidth and storage capacity must also be spent conveying the portions of the scanned image that lie outside the edges of the scanned document. This bandwidth or storage capacity is essentially wasted because that portion of the image conveys no useful information about the contents of the document. It is, therefore, desirable for a document imaging system to establish the size/shape of the pages in a document so that the bandwidth required to transmit, or the storage capacity required to store, a document image is minimized.
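As a rough illustration of how much of a scan can fall outside the page, the sketch below assumes a binarized scan in which page pixels are 1 and the scanner backing is 0 (a hypothetical convention, not taken from this document). It finds the axis-aligned bounding box of the page pixels and reports the fraction of the scanned area lying outside that box. This is a simple stand-in for illustration, not the size/shape detection method contemplated here; note that an axis-aligned box overstates the page area when the page is skewed.

```python
def bounding_box(image, background=0):
    """Return (top, left, bottom, right), inclusive, of all non-background pixels.

    Assumes (hypothetically) that the page is distinguishable from the scanner
    backing, e.g. page pixels are 1 and backing pixels are 0. Returns None if
    no page pixels are found.
    """
    rows = [r for r, row in enumerate(image) if any(p != background for p in row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0]))
            if any(row[c] != background for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


def wasted_fraction(image, background=0):
    """Fraction of the scanned area outside the page's axis-aligned bounding box."""
    total = len(image) * len(image[0])
    box = bounding_box(image, background)
    if box is None:
        return 1.0  # no page detected: the whole scan carries no document content
    top, left, bottom, right = box
    kept = (bottom - top + 1) * (right - left + 1)
    return 1 - kept / total


# A 4x5 scan in which the page occupies a 2x3 region in the middle.
scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_box(scan))     # (1, 1, 2, 3)
print(wasted_fraction(scan))  # about 0.7
```

In this toy example roughly 70% of the scanned pixels lie outside the page, and that share of the bandwidth or storage would be saved by sending or storing only the bounded region.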
A method and a device are needed for automatic skew compensation and for automatic size and/or shape detection.