1. Field of the Invention
The present invention relates to the field of image processing. More particularly, this invention relates to the field of image processing for a digital copier.
2. Related Art
The copying of hardcopy pages using digital scan and print devices typically produces undesirable image artifacts. These artifacts result from limitations in spatial resolution and intensity modulation as well as from scan and print engine fidelity issues. They include moiré patterns, poor edge definition, limited dynamic range and visible color fringing. It is possible to minimize and sometimes eliminate these undesirable artifacts by processing the scanned digital image data before printing it. This processing may include smoothing prescreened image data, sharpening edges in text and line art, sharpening photographic data, and removing unwanted background data.
It is highly desirable to identify the contents of a scanned-in image, because the choice of appropriate image processing algorithms depends on the image contents. For instance, processing algorithms used for image sharpening will enhance edges in text and line art within the image but may emphasize moiré in halftone data within the image. Therefore, it is necessary to identify image contents by segmenting or classifying the image into homogeneous regions (i.e., regions of a single type of data or classification, such as unwanted background data, text, graphics, or photographic image data) so that the appropriate image processing can be applied to each region within the image.
When dividing the image into homogeneous regions, the explicit segmentation or classification process must be accurate, because it determines which image operations are applied to each region. However, such processes are inherently error prone because of slight variations in the image data constituting a given region type. Such errors result in significant image artifacts, because the wrong type of image processing is applied to conflicting or inconsistent data types within a single supposedly homogeneous region.
There have been several approaches to the problem of region identification, also known as classification, discrimination, or segmentation, which distinguishes between textual image data, halftone image data, and photographic image data. The first type of approach is direct pattern matching, or binary pattern comparison. This process compares cells of the data to known pattern cells representing the different classifications and then assigns the cells of data to a classification based upon the comparison results. This type of approach is referred to in U.S. Pat. No. 5,771,107 and U.S. Pat. No. 4,984,283. One problem with this type of approach is that the discrimination or segmentation accuracy depends upon the size of the data cells and pattern cells being compared. The smaller the cell, the more likely errors are to occur. The larger the cell, the more accurate the comparison, but even for a 3×3 cell the computation is expensive.
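As an illustration only, direct pattern matching on 3×3 cells might be sketched as follows. The reference patterns, the class labels, and the use of Hamming distance as the comparison measure are assumptions made for this example; they are not taken from the cited patents.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]  # a 3x3 binary cell flattened row by row (9 values)

# Hypothetical reference patterns; a practical table would be far larger.
PATTERNS: Dict[str, List[Cell]] = {
    "text": [
        (0, 1, 0, 0, 1, 0, 0, 1, 0),  # vertical stroke
        (0, 0, 0, 1, 1, 1, 0, 0, 0),  # horizontal stroke
    ],
    "halftone": [
        (1, 0, 1, 0, 1, 0, 1, 0, 1),  # checkerboard-like dot arrangement
        (0, 1, 0, 1, 0, 1, 0, 1, 0),
    ],
}


def hamming(a: Cell, b: Cell) -> int:
    """Count the positions at which two cells differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_cell(cell: Cell) -> str:
    """Return the label of the reference pattern closest to the cell."""
    best_label, best_dist = "", len(cell) + 1
    for label, patterns in PATTERNS.items():
        for pattern in patterns:
            dist = hamming(cell, pattern)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label
```

Every data cell is compared against every stored pattern, and a 3×3 binary cell already admits 2^9 = 512 distinct configurations, which suggests why the cost of this approach grows quickly with cell size.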
The second type of approach, called fuzzy logic image classification, uses probabilistic assessments and multi-category classifications. It attempts to avoid the limitations of explicit classification, and the resulting possibility of error, by employing non-explicit methods. This kind of method drives image processing algorithms that reflect the uncertainties of the probabilistic analysis. The resulting image artifacts may still be objectionable, however, because any classification, including a non-exclusive probabilistic one, is subject to error.
The third type of approach detects halftone areas by identifying the frequency and angle of the screening at which halftones occur. This type of approach is set forth in U.S. Pat. No. 5,384,648. However, this approach works only on images that were originally printed using a single-angle cluster dot screening method. Since there are many alternative image output screening techniques, such as multiple-angle cluster dot screening, multi-bit screening, and stochastic screening, this approach is not ideal.
The fourth type of approach uses an auto-correlation algorithm to detect halftone image areas. However, one disadvantage of this approach is that it cannot distinguish text embedded within a halftone image area from the halftone data itself.
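A minimal sketch of the auto-correlation idea, applied to a single scan line, is given below. The function names, the lag range, the threshold, and the rule of requiring a negative dip followed by a strong peak are all assumptions chosen for illustration, not a description of any particular prior-art detector.

```python
from typing import Sequence


def autocorrelation(row: Sequence[float], lag: int) -> float:
    """Normalized auto-correlation of a scan line at the given lag."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row)
    if var == 0:
        return 0.0  # a flat line carries no periodic structure
    cov = sum((row[i] - mean) * (row[i + lag] - mean) for i in range(n - lag))
    return cov / var


def looks_periodic(row: Sequence[float], max_lag: int = 8,
                   threshold: float = 0.5) -> bool:
    """Flag a scan line whose auto-correlation dips below zero and then
    rises to a strong peak, as a regularly repeating dot pattern would."""
    seen_negative = False
    for lag in range(1, min(max_lag, len(row) - 1) + 1):
        r = autocorrelation(row, lag)
        if r < 0:
            seen_negative = True
        elif seen_negative and r >= threshold:
            return True
    return False
```

In this sketch a line of dots repeating every four pixels is flagged, while a flat line or a single step edge is not. The measure responds only to periodicity along the line, so it carries no information about what else the same area contains.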
The fifth type of approach is based on edge detection. An edge detector, usually a high-pass filter such as a Sobel edge detector, is used to detect the edges of text and line art. A problem that often occurs with this type of approach is erratic discrimination between small Roman letters or small Japanese Kanji and halftone image data.
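For reference, a Sobel edge detector can be sketched in a few lines. The example assumes a grayscale image stored as a list of rows, leaves border pixels at zero, and approximates the gradient magnitude by |gx| + |gy|; these choices are assumptions of the sketch.

```python
from typing import List


def sobel_magnitude(img: List[List[int]]) -> List[List[int]]:
    """Approximate gradient magnitude |gx| + |gy| using 3x3 Sobel kernels.

    Border pixels are left at zero because the kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            out[y][x] = abs(gx) + abs(gy)
    return out
```

A pixel would then be treated as an edge when its magnitude exceeds a chosen threshold. Because halftone dots and the strokes of small characters both produce strong local gradients, a threshold alone does not separate them.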
In color copy processing, it is necessary to detect and then remove unwanted background data, for example, white background data and bleed-through data combined with a light-colored background. At the same time, highlight color data should be preserved as much as possible. Conventionally, the white background level was detected by sampling data and calculating a histogram based on scanner RGB data. Bleed-through data combined with a light-colored background was still often observed after white background removal.
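The conventional histogram method might look roughly like the sketch below. Using the minimum of the R, G, and B values as the brightness measure, restricting the search to levels of 128 and above, and taking the most populated level as the white level are all assumptions of this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit scanner R, G, B values


def estimate_white_level(pixels: List[Pixel], min_level: int = 128) -> int:
    """Return the most populated bright level in a histogram of min(R, G, B).

    If no sampled pixel falls in the bright range, min_level is returned.
    """
    hist = [0] * 256
    for r, g, b in pixels:
        hist[min(r, g, b)] += 1
    return max(range(min_level, 256), key=lambda level: hist[level])


def remove_background(pixels: List[Pixel], white_level: int) -> List[Pixel]:
    """Map every pixel at or above the white level to pure white."""
    return [(255, 255, 255) if min(p) >= white_level else p for p in pixels]
```

In this sketch, a pixel that is only slightly darker than the detected white level, such as faint bleed-through on a light background, falls below the threshold and is left unchanged.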
After the contents are identified, the proper image processing methods are applied to them: sharpening and black text enhancement for text, descreening for halftone images, and sharpening for photographic images. Black text enhancement is very important because most text in original images to be copied is black. Conventionally, after text edges were detected, the text data was passed through under color removal to make the text edges neutral. Because many text characters have color fringes extending beyond the edge after they are scanned, those fringes remain and degrade text quality.
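Under color removal can be illustrated with the simple sketch below, which assumes 8-bit RGB input, a naive CMY conversion (255 minus each channel), and full replacement of the common gray component by black. Actual copier pipelines use device-specific color conversions that are not described here.

```python
from typing import Tuple


def under_color_removal(r: int, g: int, b: int) -> Tuple[int, int, int, int]:
    """Convert 8-bit RGB to CMYK, moving the common gray component into K."""
    c, m, y = 255 - r, 255 - g, 255 - b
    k = min(c, m, y)  # gray component shared by all three colorants
    return c - k, m - k, y - k, k
```

A nearly neutral dark pixel such as (30, 30, 30) becomes pure black, (0, 0, 0, 225), whereas a reddish fringe pixel such as (200, 120, 120) becomes (0, 80, 80, 55) and retains its colored component.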