Scanners and copiers are well-known office machines that provide valuable office functions both in the workplace and at home. One important component of these scanners and copiers is the image processing that automatically filters noise (unwanted information) from the scanned document.
The primary object of background removal is to remove noise from a scanned document. Two problems are of particular concern. First, there is the problem of background noise. Second, there is the problem of bleed through, which stems from very thin originals.
One example of noise in the document is commonly referred to as “bleed through” and typically occurs when the document to be scanned is a very thin piece of paper (e.g., a page from a magazine). The words or pictures from the backside of the document being scanned bleed through and are rendered as part of the current side's pictures and words. As can be appreciated, this “bleed through” severely degrades the quality of the copy or scanned document.
Another example of noise in the original document is a coffee stain on a white piece of paper. The stained area will appear darker than the remaining background of the document. If this stain is not detected and removed, the resulting scanned image will appear to have a dark spot where the stain is. Another example of what might erroneously be characterized as noise is a color background (e.g., a document printed on a green piece of paper).
With the advent of color scanners and color copiers, this problem is aggravated, since the image processing must be able to distinguish between the color background that should be removed and the color graphics, pictures, etc. that should be left alone.
It would be desirable to have image processing software that automatically removes noise (e.g., background or bleed through) without erroneously deleting important content. Unfortunately, as described in greater detail hereinafter, the prior art approaches offer only tolerable solutions that suffer from various disadvantages.
There are three main prior art approaches to background removal or noise removal. Each of these approaches is briefly described hereinafter and their respective shortcomings are set forth.
U.S. Pat. No. 5,956,468 entitled, “Document Segmentation System”, by H. Ancin, September 1999, uses a window-based algorithm for background removal and text enhancement. Unfortunately, this technique is a two-pass approach that requires a lower resolution version of the entire document (commonly referred to as a “pre-scan”) to determine the enhancement. With the advent of scanners and copiers that employ an automatic document feeder, requiring a user to feed a document through the scanner or copier twice is awkward at best and may not be readily accepted by users. Moreover, such an approach requires additional memory.
U.S. Pat. No. 5,157,740 entitled, “Method for Background Suppression in an Image Data Processing System”, by R. Klein, K. A. Wilds, M. Higgins-Luthman, and D. C. Williams, October 1992, and U.S. Pat. No. 5,282,061 entitled, “Programmable Apparatus for Determining Document Background Level”, by B. L. Farrell, January 1994, also use window-based algorithms, but they explicitly classify pixels as “background” or “signal” pixels based on neighborhood information. A major disadvantage of these approaches is that a misclassification in such a system often leads to objectionable artifacts, such as areas of noise in the scanned document (ones and zeros), where the image processing classifies some pixels as background, which are subsequently set to white, and other pixels in the same region as important information, which is not affected or modified.
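The speckle artifact described above can be illustrated with a minimal sketch. It is not the method of either cited patent. The function name classify_and_clean, the one-dimensional scanline, the neighborhood-mean rule, and all numeric values are assumptions chosen only to show how a hard background/signal decision behaves when a region sits close to the decision threshold.

```python
def classify_and_clean(row, radius, threshold, white=255):
    """Hard-classify each pixel of a grayscale scanline (0 = black, 255 = white).

    Hypothetical rule: a pixel is "background" when the mean of its
    neighborhood is at least `threshold`. Background pixels are forced to
    white; "signal" pixels are left unmodified.
    """
    n = len(row)
    out = []
    for i, pixel in enumerate(row):
        window = row[max(0, i - radius):min(n, i + radius + 1)]
        # Integer comparison of the window mean against the threshold.
        is_background = sum(window) >= threshold * len(window)
        out.append(white if is_background else pixel)
    return out


# A faint stain whose gray level hovers around the threshold of 180.
stain = [182, 178, 183, 177, 184, 176, 183, 177]
print(classify_and_clean(stain, radius=1, threshold=180))
# prints [255, 255, 183, 255, 184, 255, 183, 255]
```

Input values that differ by only a few gray levels end up either pure white or untouched, so the nearly uniform stain is rendered as a scattering of isolated gray pixels on white.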
U.S. Pat. No. 5,761,339 entitled, “Method and Recording Medium for Separating and Composing Background and Character Image Data”, by N. Ikeshoji, T. Yamamoto, T. Kamiuchi, N. Hamada, K. Honda, and H. Yamakawa, June 1998, uses a nonlinear filter to estimate the background value at each pixel and then subtracts this background image from the original image to push the background to white and remove stains in the image. Although this approach provides tolerable results for text and lines, it can produce artifacts in regions of scanned photos, halftones, or solid fills. For example, a bright red object would keep a red outline, but its center portion would be set to the background color (e.g., white), producing a disturbing effect. Similarly, a photograph of a person wearing a bright blue shirt would be rendered as a white shirt with a blue outline.
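The outline artifact can be reproduced with a minimal sketch of background estimation followed by subtraction. The nonlinear filter actually used in the cited patent is not specified here, so a sliding-window maximum is assumed as a stand-in. The function names, the one-dimensional scanline, the window radius, and the gray levels are likewise hypothetical.

```python
def estimate_background(row, radius):
    """Assumed nonlinear filter: sliding-window maximum.

    The brightest pixel in each neighborhood is taken as the local paper level.
    """
    n = len(row)
    return [max(row[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def subtract_background(row, radius, white=255):
    """Shift each pixel so that its estimated background maps to white."""
    background = estimate_background(row, radius)
    return [min(white, p + (white - b)) for p, b in zip(row, background)]


# Off-white paper (200) with a one-pixel text stroke (50) and a wide solid fill (80).
scanline = [200] * 3 + [50] + [200] * 3 + [80] * 9 + [200] * 3
cleaned = subtract_background(scanline, radius=2)
print(cleaned[0], cleaned[3])    # paper and stroke: prints 255 105
print(cleaned[7], cleaned[11])   # fill edge and fill center: prints 135 255
```

The thin stroke survives because the window always reaches nearby paper. Inside the solid fill, which is wider than the window, the filter mistakes the fill itself for background, so the center is pushed to white while only the edges keep their tone.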
Based on the foregoing, there remains a need for a method and system for automatically removing background that employs soft thresholding, that avoids the artifacts due to hard thresholding, that accurately removes noise from scanned photographs, halftones, and solid fills without removing important content, and that overcomes the disadvantages set forth previously.