The invention relates to systems and methods for removing extraneous image material of a document being digitally scanned to provide an enhanced copy including all of the desired data and image information of the originally scanned document but omitting as much extraneous image material as possible without affecting the desired data and image information.
Documents frequently are scanned from a paper copy, or a “copy-of-a-copy”, or the like into a digital representation by means of a digital scanner. The digital representation thus created usually is initially in the form of a sequence of pages, each represented as a sequence of pixels or “run-lengths”. The digital representation produced by scanning of the document often contains undesirable image elements referred to as “noise” in addition to the “essential data” or “desired data” making up the essential meaning of the images of the document. Furthermore, the essential data may be geometrically distorted from its ideal form in either a linear or non-linear manner. The amount and type of noise and/or distortion depend upon both the particular digital scanning process used and the quality of the scanned document. It is important to remove as much noise and distortion from the scanned representation as possible, both to improve the esthetic appearance of “enhanced” or “cleaned” copies of the document reconstructed from the digital representation, and to make text and other images in the “reconstructed” document more legible. This is especially important if the scanned document is to be processed by optical character recognition (OCR) software. Furthermore, for some kinds of encoding, such as run-length encoding, a digital representation that includes noise requires more storage space than one that does not. Also, more time is required for digital processing of image data that includes noise.
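The storage cost of noise under run-length encoding can be illustrated with a minimal sketch. The scanline values, the helper name `run_lengths`, and the speckle positions below are hypothetical and chosen only for illustration; the encoding shown is a generic (value, length) scheme, not the format of any particular scanner.

```python
from itertools import groupby


def run_lengths(row):
    """Encode a binary scanline as a list of (pixel_value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# Hypothetical 16-pixel scanline: white background (0) with one black stroke (1).
clean = [0] * 6 + [1] * 4 + [0] * 6

# The same scanline with two isolated single-pixel speckles added (assumed noise).
noisy = list(clean)
noisy[2] = 1
noisy[13] = 1

clean_runs = run_lengths(clean)
noisy_runs = run_lengths(noisy)

print(len(clean_runs), "runs without noise")
print(len(noisy_runs), "runs with noise")
```

In this sketch each isolated speckle splits one white run into three runs, so two speckles raise the run count from 3 to 7 even though only two pixels changed.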
The removal of noise and correction of distortion is referred to herein as “enhancement” or “clean-up”. Usually, clean-up of digitally scanned images from a document requires substantial human intervention. For example, a human operator may be required to delineate regions on the document in which speckles are to be removed and/or to specify a maximum speckle size for removal in each region. The operator also may be required to specify the orientation of the document so that it can be rotated to correct for a small amount of skewing introduced during the scanning process. Such manual interventions are time-consuming, and need to be repeated for each document or page to be “cleaned up”.
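To make the operator-specified “maximum speckle size” concrete, the following is a minimal sketch of size-thresholded speckle removal on a binary image. It illustrates the kind of manually parameterized operation described above and is not the method of the invention. The function name `remove_speckles`, the use of 4-connectivity, and the sample page are assumptions made for the example.

```python
def remove_speckles(image, max_speckle_size):
    """Return a copy of a binary image with small black components cleared.

    image: list of rows, each a list of 0 (white) or 1 (black) pixels.
    A 4-connected group of black pixels containing at most
    max_speckle_size pixels is treated as a speckle (assumed definition).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect one connected component of black pixels.
            component = []
            stack = [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Clear the component only if it is small enough to count as a speckle.
            if len(component) <= max_speckle_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


# Hypothetical page fragment: one isolated speckle and one 2x3 block of desired data.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
cleaned = remove_speckles(page, max_speckle_size=2)
```

The result depends entirely on the threshold the operator picks: a value that is too large would also erase small desired marks, which is one reason such manual settings must be revisited for each document or region.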
Thus, there is an unmet need for an automatic system and technique for fast, inexpensive “clean-up” of noise and distortion in images digitally scanned from an original document or the like which ensures that no desired or essential image data will be lost.