The present invention relates generally to the field of automatic document scanners and readers. More particularly, the present invention is directed to a method and a system for extracting relevant character information from fields of scanned data that contain unwanted background information and noise picked up during the scanning process.
When a document such as a check is scanned for relevant information, the scanner cannot by itself determine whether a given change in reflectivity is caused by character information, noise, or scenic background. Various systems use noise filters and extraction algorithms in an attempt to separate the character information from such unwanted signals.
Prior art systems use features based on the gray level differences between small square areas of a document. Each square area is called a "picture element" or "pixel". Gray level differences are examined in various combinations, but generally the gray level of a central pixel is compared against the gray levels of its local neighbors. U.S. Pat. No. 4,510,618, entitled "Noise Cleaner for Binary Images," by E. Ataman et al., describes such a system. Systems of that type are provided with multiple-level threshold detectors that determine the gray value of an incoming signal. The threshold levels are set against an industry standard. This standard, sometimes referred to as the global dynamic thresholding curves, was derived by sampling a group of training checks and statistically arriving at an average set of curves that can be used for threshold setting.
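The basic comparison of a central pixel against its local neighbors can be illustrated with a short Python sketch. It is not the method of U.S. Pat. No. 4,510,618; it assumes an 8-bit gray scale (0 = black, 255 = white) and a 3 x 3 neighborhood, and the function name `local_contrast` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed convention: 0 = black, 255 = white


def local_contrast(img: Image, r: int, c: int) -> float:
    """Mean gray level of the 8-connected neighbours minus the centre pixel.

    Positive values mean the centre pixel is darker than its surroundings.
    Neighbours outside the image are skipped; a lone pixel returns 0.0.
    """
    rows, cols = len(img), len(img[0])
    total, count = 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre pixel itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += img[rr][cc]
                count += 1
    if count == 0:
        return 0.0
    return total / count - img[r][c]


if __name__ == "__main__":
    img = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
    print(local_contrast(img, 1, 1))  # 150.0: dark centre on a light field
    print(local_contrast(img, 0, 0))  # -50.0: corner is lighter than its neighbours
```

Skipping out-of-bounds neighbors, rather than padding the image, keeps border pixels from being biased toward an assumed background value.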
When a check having both character images and background images is scanned, the contrast between the gray level of a character pixel and the average gray level of its local pixels (the locally scattered average gray level) is significantly larger than the contrast between a background pixel and the average gray level of its neighboring background pixels. Moreover, as a general rule, the darker the pixel, the higher the probability that it belongs to the character pixel population. Consequently, a dynamic thresholding scheme that uses a lower threshold value for darker pixels and a higher threshold value for lighter pixels will provide an improved character extraction system.
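A minimal sketch of such a darkness-dependent threshold follows, again assuming a 0-to-255 gray scale with 0 as black. The linear threshold curve and its end points (`t_dark`, `t_light`) are hypothetical illustration values, not the industry curves or the thresholds of the present invention. A pixel is marked as character when its contrast against the mean of its 3 x 3 neighbors exceeds the threshold for its own gray level.

```python
from typing import List

Image = List[List[int]]  # assumed convention: 0 = black, 255 = white


def threshold_for(gray: int, t_dark: float = 20.0, t_light: float = 80.0) -> float:
    """Hypothetical linear threshold curve: low for dark pixels, high for light ones."""
    return t_dark + (t_light - t_dark) * gray / 255.0


def neighbour_mean(img: Image, r: int, c: int) -> float:
    """Mean gray level of the in-bounds 8-connected neighbours of (r, c)."""
    rows, cols = len(img), len(img[0])
    vals = [
        img[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    ]
    # A lone pixel has no neighbours; treat it as having zero contrast.
    return sum(vals) / len(vals) if vals else float(img[r][c])


def extract_characters(img: Image) -> List[List[bool]]:
    """Mark a pixel as character when its contrast exceeds a darkness-dependent threshold."""
    mask = []
    for r, row in enumerate(img):
        out_row = []
        for c, gray in enumerate(row):
            contrast = neighbour_mean(img, r, c) - gray  # > 0 when darker than surroundings
            out_row.append(contrast > threshold_for(gray))
        mask.append(out_row)
    return mask


if __name__ == "__main__":
    dark = [[100, 100, 100], [100, 60, 100], [100, 100, 100]]
    light = [[240, 240, 240], [240, 200, 240], [240, 240, 240]]
    print(extract_characters(dark)[1][1])   # True: contrast 40 > threshold ~34.1
    print(extract_characters(light)[1][1])  # False: contrast 40 < threshold ~67.1
```

With these illustrative defaults, a dark pixel (gray level 60) that is 40 levels darker than its neighbors is kept, while a light pixel (gray level 200) with the same contrast is rejected, which is the behavior the paragraph above calls for.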
The industry's dynamic thresholding curves, being based on the global statistics of a training set of check samples, are not optimal for any individual check; rather, they are statistically optimal for an ensemble of checks. Another problem with global dynamic thresholding schemes is that they assume the characteristics of future checks will be the same as those of the present training set. There is no assurance that this will hold. If it does not, the dynamic thresholding curves will require constant updating (fine tuning), and systems based on those curves will have to be adjusted accordingly.
With existing systems, a relatively large amount of unwanted background information remains in the processed images and tends to hamper further processing of the data. Eliminating more of this background information, without degrading the desired character information, would provide an improved system.