The invention relates generally to the electronic processing of images and is more particularly directed to the processing of document images that may include handwritten or printed text overlying background graphics, such as is commonly present in bank checks presented for payment.
In many document-processing applications the images of the documents to be processed are electronically captured and presented to operators at workstations for data entry or other processing, such as optical character recognition, directly from the electronic images. The images may be archived on magnetic or optical media and subsequently retrieved and displayed or printed when needed. This is the case, for example, with bank checks presented for payment. Checks are processed in high volumes by capturing the images of the front and back sides of the checks on high-speed document transports. The images are then displayed at workstations where operators may enter the dollar amounts, verify signatures, reconcile inconsistencies and undertake other processing steps. Many financial institutions will then provide their account holders with printouts showing small-scale black-and-white printed images of recently processed checks along with the monthly account statements.
A problem arises in working with such digital images of checks or other documents. Bank checks frequently include background pictures printed on the checks for decorative purposes or background patterns printed for security purposes. The various data fields to be filled in with substantive information, such as payee, date, dollar amount, and authorizing signature of the payor, generally overlie the background. In the digitally captured image of such checks the substantive data fields are sometimes difficult to read because of interference from the digitally captured background image or pattern. Printouts of such check images may be even harder to read because of their reduced size.
Early systems for image processing of bank checks tried to eliminate the background picture or pattern altogether from the captured image of the check. Such early systems typically employed a thresholding technique to eliminate the background. Such techniques have not been entirely successful. They tend to leave behind residual black marks from the background image that interfere with the substantive information on the check, and in some instances they may even degrade the handwritten or printed textual matter on the check, making it more difficult to read. The problem is that an insensitive threshold may avoid most, although generally not all, of the background but may miss some of the low-contrast text, whereas a more sensitive threshold may pick up most of the low-contrast text but more of the background, too. In addition, it is sometimes desirable to retain some or all of the background picture, for example, to print images of the checks along with a bank statement.
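The threshold trade-off described above may be illustrated by a minimal sketch in Python. The intensity values and the two threshold settings are hypothetical, chosen only for illustration, and are not taken from any actual check-processing system.

```python
def binarize(gray, threshold):
    """Return 1 (black) where a pixel is darker than the threshold, else 0 (white)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 8-bit intensities (0 = black, 255 = white).
PAPER, BACKGROUND, FAINT_TEXT, DARK_TEXT = 240, 150, 170, 40

# One scan line: paper, background pattern, faint text, dark text, pattern, paper.
image = [[PAPER, BACKGROUND, FAINT_TEXT, DARK_TEXT, BACKGROUND, PAPER]]

# Insensitive threshold: only very dark pixels turn black, so the faint text is lost.
insensitive = binarize(image, 100)

# Sensitive threshold: the faint text is kept, but so is the background pattern.
sensitive = binarize(image, 180)

print(insensitive)
print(sensitive)
```

In this sketch no single threshold both keeps the faint text and drops the background, because the faint text is lighter than the background pattern.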
Over the years various other approaches have been developed for handling background graphics in document images and either eliminating the background or reproducing it in a more readable fashion. Such other approaches may be seen for example in U.S. Pat. Nos. 4,853,970 and 5,600,732. See also the recent publication by S. Djeziri et al., entitled “Extraction of Signatures from Check Background Based on a Filiformity Criterion,” IEEE Transactions on Image Processing, Vol. 7, No. 10, October 1998, pp. 1425-1438, and references cited therein for general discussions of the field.
In particular, U.S. Pat. No. 4,853,970 discloses an approach in which the captured image of a document is first analyzed to find the edges of pictorial or text features present in the image. The edges separate other areas of light and dark over which the intensity varies more gradually, if at all. The image is then reconstructed in two parts. The edges are reconstructed with an algorithm, referred to in U.S. Pat. No. 4,853,970 as a point algorithm or point operator, that is adapted to give a good representation of the image where edges are located. The expanses of gradual intensity variation are reconstructed with an algorithm, referred to in U.S. Pat. No. 4,853,970 as a level algorithm or level operator, that is appropriate for such gradual variations. For example, a thresholding algorithm with a very insensitive threshold could be used for the second algorithm if it is desired to minimize the background, or a digital half-toning algorithm could be used to give a good representation of pictorial graphics without compromising the textual matter, which is composed primarily of characters that have strong edges.
Notwithstanding the benefits of this method, it nevertheless represents a compromise in the clarity and readability of the original document.
The present invention provides a method for improved readability of digitally captured document images that include textual material (printed or handwritten) and background graphics. The method is especially suited for image processing of bank checks, remittance documents or other such financial documents that tend to have textual characters overlaid on a wide variety of decorative pictures or security patterns.
The method takes advantage of the fact that handwritten or printed characters are generally composed of comparatively thin features referred to as strokes, in which two more or less parallel edges are in close proximity to one another. An improved representation of such textual handwriting and printing is provided by first determining those pixels in the captured image that are part of a stroke. Pixels in the neighborhood of a stroke edge are assigned a black or white value according to a so-called point algorithm adapted to give a good representation of edges. Pixels not in the neighborhood of a stroke are assigned a black or white value according to a level algorithm, that is, according to an algorithm adapted to give a good representation of slowly varying or constant intensities. The level algorithm is used to assign pixel values even for pixels in the neighborhood of an isolated edge, that is, a non-stroke edge, forming a part of a background graphic. In this way, stroke images are preserved while isolated edges in the background are de-emphasized. The background graphic itself may be substantially eliminated if desired by, for example, applying a level thresholding algorithm with an insensitive threshold. The strokes making up the printed or handwritten text will be preserved because they are not subjected to the level algorithm, and the edges of shaded areas forming a part of the background graphic will not be singled out for special treatment and will generally be eliminated by the level algorithm. Of course, where the background graphic is itself in the form of line art composed primarily of strokes, the background graphic will not be eliminated by the level algorithm, but on the contrary the strokes of the line art will be preserved the same as the strokes of any overlaid textual matter. 
Even here, however, the method can lead to improved readability because preserving the individual strokes comprising the line art in turn preserves the integrity of the line art as a whole. The reader is then better able to differentiate the underlying line art from the textual matter by the substantive context of the line art.
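The per-pixel selection between a point algorithm and a level algorithm may be sketched in Python as follows. The two rules shown are hypothetical stand-ins: the point rule compares a pixel with the mean of its neighborhood, and the level rule applies a fixed insensitive threshold. Neither is asserted to be the point or level operator of U.S. Pat. No. 4,853,970. The stroke-neighborhood mask, the intensities, and the parameter values are likewise assumptions made for illustration.

```python
def local_mean(gray, y, x, radius=2):
    """Mean intensity of the in-bounds pixels around (y, x)."""
    h, w = len(gray), len(gray[0])
    values = [gray[j][i]
              for j in range(max(0, y - radius), min(h, y + radius + 1))
              for i in range(max(0, x - radius), min(w, x + radius + 1))]
    return sum(values) / len(values)


def point_rule(gray, y, x):
    """Assumed point-type rule: black if darker than the local surroundings."""
    return 1 if gray[y][x] < local_mean(gray, y, x) else 0


def level_rule(gray, y, x, threshold=60):
    """Assumed level-type rule: fixed, insensitive global threshold."""
    return 1 if gray[y][x] < threshold else 0


def binarize(gray, near_stroke):
    """Use the point rule near strokes and the level rule everywhere else."""
    return [[point_rule(gray, y, x) if near_stroke[y][x] else level_rule(gray, y, x)
             for x in range(len(gray[0]))]
            for y in range(len(gray))]


# Hypothetical scan line: a faint two-pixel stroke (110) on paper (230),
# followed by a shaded background area (140).
gray = [[230, 230, 110, 110, 230, 230, 140, 140, 140, 140]]
# Hypothetical mask marking the pixels in the neighborhood of the stroke.
near_stroke = [[0, 1, 1, 1, 1, 0, 0, 0, 0, 0]]

result = binarize(gray, near_stroke)
print(result)
```

In this sketch the faint stroke is kept even though it is lighter than the insensitive threshold, while the shaded area, which lies outside the stroke neighborhood, is dropped by the level rule.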
To determine those edges that are paired edges forming part of a stroke, an edge operator is applied to construct an idealized edge in each neighborhood of rapid intensity variation. Each side of a stroke that is part of a textual character will generate its own edge image so that an edge operator applied to the pixels in the vicinity of a stroke will generate two closely spaced lines with only a thin area between them. Then a grow operation is applied to all edge images constructed by the edge operator so as to enlarge the edges into fatter, intermediate lines. For edge images deriving from the two neighboring edges of a stroke, the two intermediate lines so generated will generally merge with one another forming one fat line. Then a shrink operation is applied to the intermediate lines resulting from the grow operation. The shrink operation is of an effective magnitude that an isolated edge will not survive the shrink operation and will disappear whereas a fat intermediate line resulting from a stroke will remain, although it may now be reduced to a thin line. Then a grow operation is applied to bring the preserved intermediate line originated by the stroke back to a width approximating its original width. At this stage the edges that remain are to a great extent only those that derive from strokes. These edge pixels are then used to select either a point algorithm or level algorithm for binarizing the image data.
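The grow, shrink, and grow sequence described above may be sketched in Python using a 3-by-3 neighborhood. The neighborhood shape, the step counts, the function names, and the sample edge image are assumptions chosen for illustration. Suitable step counts would depend on the stroke widths and the resolution of the captured image.

```python
def _morph(mask, combine):
    """Apply `combine` (any or all) over each pixel's in-bounds 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out_row.append(1 if combine(window) else 0)
        out.append(out_row)
    return out


def grow(mask, steps=1):
    """Grow operation: a pixel is set if any neighbor is set."""
    for _ in range(steps):
        mask = _morph(mask, any)
    return mask


def shrink(mask, steps=1):
    """Shrink operation: a pixel stays set only if all neighbors are set."""
    for _ in range(steps):
        mask = _morph(mask, all)
    return mask


def stroke_mask(edges, merge=1, prune=2):
    """Keep only edges that occur in closely spaced pairs, i.e. strokes."""
    fat = grow(edges, merge)           # paired edges merge into one fat line
    thin = shrink(fat, prune)          # isolated edges disappear entirely
    return grow(thin, prune - merge)   # restore roughly the original stroke width


# Hypothetical edge image: a stroke with edges at columns 3 and 6,
# and an isolated background edge at column 12.
edges = [[1 if x in (3, 6, 12) else 0 for x in range(16)] for _ in range(5)]
mask = stroke_mask(edges)
print(mask[0])
```

Here the two stroke edges survive as a single band spanning columns 3 through 6, and the isolated edge at column 12 is removed.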
In some cases it is desirable not to completely suppress the background graphics, but rather to include the background graphics in the image of the document that is to be preserved. Here it has been found advantageous to add back into the document image a representation of the background graphics in which the overall contrast of the background graphics has been reduced. This may be achieved, for example, by scaling back the contrast of the captured digital image by a fixed percentage or by a variable amount prescribed by a formula, and then subjecting the scaled-back image data to a process such as a digital half-tone process providing a good binarized representation of the reduced-contrast image. This binarized representation is then mixed with the stroke-preserved image described above. The benefit of intermixing a reduced-contrast image with an image separately processed to enhance textual matter is not limited to the stroke-preservation method referenced above. An improvement in image quality and overall document readability may also be realized when other methods are used to reconstruct the textual matter.
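The add-back of a reduced-contrast background may be sketched in Python as follows. This is one possible reading only: contrast is scaled toward white by a fixed fraction, the half-tone step uses ordered dithering with a 2-by-2 Bayer matrix, and the mixing step is a pixel-wise OR. All three choices, together with the sample values, are assumptions made for illustration.

```python
# 2x2 Bayer index matrix for ordered dithering (an assumed half-tone method).
BAYER_2X2 = [[0, 2], [3, 1]]


def reduce_contrast(gray, keep=0.5):
    """Scale each pixel's darkness toward white; keep=1.0 leaves the image unchanged."""
    return [[255 - keep * (255 - px) for px in row] for row in gray]


def halftone(gray):
    """Binarize by ordered dithering: 1 = black dot, 0 = white."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, px in enumerate(row):
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
            out_row.append(1 if px < threshold else 0)
        out.append(out_row)
    return out


def mix(text_bits, background_bits):
    """Combine two binary images: a pixel is black if it is black in either one."""
    return [[t | b for t, b in zip(t_row, b_row)]
            for t_row, b_row in zip(text_bits, background_bits)]


# Hypothetical uniform mid-gray background and a separately binarized text image.
background = [[128] * 4 for _ in range(2)]
text_bits = [[0, 0, 0, 0],
             [0, 1, 1, 0]]

full = halftone(background)                         # background at full contrast
faded = halftone(reduce_contrast(background, 0.5))  # background at half contrast
mixed = mix(text_bits, faded)
print(full, faded, mixed, sep="\n")
```

In this sketch the faded background prints with half as many black dots as the full-contrast version, so the text pixels mixed into it stand out more clearly.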