The present invention relates generally to document image processing, and specifically to correction of distortions in document images.
In many document imaging systems, large numbers of forms are scanned into a computer, which then processes the resultant document images to extract pertinent information. Typically the forms comprise preprinted templates, containing predefined fields that have been filled in by hand or with machine-printed characters. To extract the information that has been filled in, the computer must first identify the fields of the template and then decipher the characters appearing in the fields. Various methods of image analysis and optical character recognition (OCR) are known in the art for these purposes.
In order to identify the fields of the template and assign the characters to the correct fields, a common technique is for the computer to register each document image with a reference image of the template. Once the template is registered, it can be dropped from the document image, leaving only the handwritten characters in their appropriate locations on the page. Such registration may be difficult, however, because of distortions introduced in scanning the original form, including skew, rotation, warp and scale changes.
One method for dealing with these distortions is described in U.S. Pat. No. 5,182,656, whose disclosure is incorporated herein by reference. The original image is partitioned into a number of relatively small overlapping segments. Each of the segments is then shifted in order to bring it into alignment with an appropriate, corresponding segment of the reference template image. More complex transformations, such as rotations or scale changes, are not performed on these segments. The transformation of the entire image is thus represented as a combination of the shifts of the small segments, which can approximate rotations and scale changes if the segments are made small enough.
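By way of illustration only, the following Python sketch shows how a set of whole-pixel shifts, one per segment, can approximate a small rotation. It is not taken from U.S. Pat. No. 5,182,656. The function name `segment_shifts`, the choice of non-overlapping square segments, the rotation about the image center and the sampling of the displacement at each segment's center are all assumptions made for the example.

```python
import math


def segment_shifts(width, height, seg, theta):
    """Return {(x0, y0): (dx, dy)}: one integer shift per seg x seg segment.

    Each shift is the displacement that a rotation by `theta` radians about
    the image center would give to the segment's center, rounded to whole
    pixels. The segments themselves are never rotated.
    """
    cx, cy = width / 2.0, height / 2.0
    shifts = {}
    for y0 in range(0, height, seg):
        for x0 in range(0, width, seg):
            # Center of this segment, relative to the image center.
            px = x0 + seg / 2.0 - cx
            py = y0 + seg / 2.0 - cy
            # Where the rotation would send that center.
            rx = px * math.cos(theta) - py * math.sin(theta)
            ry = px * math.sin(theta) + py * math.cos(theta)
            shifts[(x0, y0)] = (round(rx - px), round(ry - py))
    return shifts
```

For a non-zero angle, segments far from the center receive larger shifts than segments near it, so neighboring segments generally move by different amounts. The smaller the segments, the smaller the residual error inside each one.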
FIG. 1 is a schematic illustration showing a detail of an image of a filled-in form document 20, useful in understanding the method of U.S. Pat. No. 5,182,656 and of similar methods known in the art. Here a name, comprising characters 24, has been filled into a box 22 provided by a template on document 20. The box is slightly rotated relative to its proper, horizontal position on the reference template. In order to correct for this rotation and for other distortions in the scanned image, the image of the document is divided into segments 26, 28, 30, 32, etc. The image is analyzed to determine the appropriate shift transformation to be applied to each of the segments, as specified in the patent.
FIG. 2 is a schematic illustration of segments 26, 28, 30 and 32 in their respective, transformed positions. To compensate for the rotation, a different shift is applied to each of the segments. The relative shifts are exaggerated in the figure for clarity of illustration. Furthermore, an overlap has been introduced between the transformed segments, such as may result from a scale distortion in the scanned image, for example. In these areas, the above-mentioned patent suggests performing an OR operation for each pixel in order to avoid having one segment overwrite another.
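The per-pixel OR operation in the overlap areas can be sketched as follows. This is a minimal illustration, assuming binary images stored as lists of rows in which 1 denotes ink and 0 denotes background. The function name `paste_or` and this image representation are hypothetical and are not drawn from the above-mentioned patent.

```python
def paste_or(canvas, segment, x0, y0):
    """OR a binary segment into a binary canvas at offset (x0, y0), in place.

    Images are lists of rows holding 0 (background) or 1 (ink). Pixels that
    fall outside the canvas are dropped.
    """
    height, width = len(canvas), len(canvas[0])
    for r, row in enumerate(segment):
        for c, value in enumerate(row):
            y, x = y0 + r, x0 + c
            if 0 <= y < height and 0 <= x < width:
                # OR keeps ink already written by an earlier segment.
                canvas[y][x] |= value
    return canvas
```

Because the pixels are combined rather than copied, ink written by one segment survives even where a later, overlapping segment is blank at the same position.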
FIG. 3 schematically illustrates the results of the shifts shown in FIG. 2. Characters 24 are broken into respective upper portions 34 and lower portions 36, because the characters happened to cross the boundaries between segments 26 and 30 and between segments 28 and 32. In consequence, even though box 22 may in the end be successfully registered with the corresponding box in the reference template image, the characters in the box are difficult or impossible for the computer to decipher. This document will probably have to be passed to a human operator for data entry, adding substantially to the cost of processing the document. What is worse, the computer may misinterpret the distorted characters, leading to an error in the data extracted from the form.
U.S. Pat. No. 5,793,887, whose disclosure is incorporated herein by reference, describes another method for alignment of images for template elimination. In this case, a filled-in document image and a reference template image are divided into vertical bands. The bands are shifted relative to one another in order to correlate the lines in the document image with the lines in the corresponding bands of the reference template image. If necessary, the procedure is then repeated using horizontal bands. Thus, this method can also lead to break-up of characters, as illustrated in FIG. 3.
It is an object of the present invention to provide improved methods and apparatus for processing images, and particularly for processing images of filled-in form documents.
It is a further object of some aspects of the present invention to provide methods and apparatus for document image processing that improve the readability of characters in such images in the presence of image distortion and rotation. It is a particular object of these aspects of the present invention to alleviate problems of readability that may arise due to misalignment between segments of such images that are shifted for the purpose of template registration.
In preferred embodiments of the present invention, an input document image, containing characters filled into a form template, is processed in order to register the template in the image with a reference template image. Any suitable method known in the art may be used for this purpose. Regions of interest, defined as regions containing filled-in characters, are identified in the processed image. Each of these regions is preferably checked in order to determine whether the readability of the characters in the region has been adversely affected by transformations applied in processing the image to register it with the reference template. Typically, although not exclusively, such adverse effects result when different parts of the region fall into different segments of the image to which different transformations are applied. The contents of each of the affected regions (or of all of the regions) are then erased from the processed image and are replaced by the contents of the corresponding region of the input image. In this manner, all of the characters in the input image are placed in the correct locations relative to the reference template, substantially without adverse effect on the readability of the characters.
Although preferred embodiments are described herein with reference to document form images, it will be appreciated that the principles of the present invention may likewise be applied in other contexts, as well. It frequently occurs in image processing that a digital transformation applied to an image is not exact, in the sense that the resulting offset between two neighboring pixels is different from the theoretical offset. This inexactness may be due to local segment transformations, as described above, or to other errors, such as rounding or decimation. If there are areas of particular interest in the image, such as a text block or other significant image features, the method of the present invention may be used advantageously to enhance the readability and/or clarity of details in these areas.
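The inexactness due to rounding can be seen in a one-dimensional example. The sketch below assumes a simple scale change in which pixel x is mapped to the nearest whole pixel to x times the scale factor. The function name `neighbor_offsets` is hypothetical and serves only for illustration.

```python
def neighbor_offsets(scale, length):
    """Offsets between the mapped positions of neighboring pixels in a row.

    Pixel x is mapped to round(x * scale). In exact arithmetic every pair of
    neighbors would end up `scale` apart; after rounding to the pixel grid,
    each offset is a whole number instead.
    """
    mapped = [round(x * scale) for x in range(length)]
    return [b - a for a, b in zip(mapped, mapped[1:])]
```

With a scale factor of 0.8, for example, the theoretical offset between neighbors is 0.8 pixel, whereas the offsets actually obtained are either 1 or 0. An offset of 0 means that two neighboring pixels have been merged into one.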
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for processing an input image, including:
applying one or more transformations to the input image, whereby different shifts are applied to different pixels in the input image, so as to generate an output image;
selecting in the output image a region containing content of interest;
locating in the input image the region corresponding to the selected region in the output image; and
substituting the content of the located region in the input image for the content of the selected region in the output image.
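The four steps recited above may be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: images are binary lists of rows, the transformations are whole-pixel shifts of square segments, and a single shift is chosen for the whole selected region. The names `shift_segments` and `substitute_region` are hypothetical.

```python
def shift_segments(image, seg, shifts):
    """Build an output image by translating each seg x seg segment.

    `shifts` maps a segment's (column, row) index to its (dx, dy); missing
    entries mean no shift. Overlapping ink is combined with OR.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                dx, dy = shifts.get((x // seg, y // seg), (0, 0))
                ty, tx = y + dy, x + dx
                if 0 <= ty < h and 0 <= tx < w:
                    out[ty][tx] = 1
    return out


def substitute_region(image, out, region, shift):
    """Replace `region` of `out` with the matching input region, moved rigidly.

    `region` is (x0, y0, x1, y1) in output coordinates, end-exclusive.
    `shift` is the single (dx, dy) chosen for the whole region, so the
    located input region is `region` translated by (-dx, -dy).
    """
    x0, y0, x1, y1 = region
    dx, dy = shift
    h, w = len(image), len(image[0])
    for y in range(y0, y1):
        for x in range(x0, x1):
            sy, sx = y - dy, x - dx
            inside = 0 <= sy < h and 0 <= sx < w
            # Overwriting erases the old content of the selected region.
            out[y][x] = image[sy][sx] if inside else 0
    return out
```

In this sketch, a vertical stroke that crosses a segment boundary is broken by `shift_segments` when the two segments receive different shifts, and is restored in one piece by `substitute_region`, since every pixel of the region is then moved by the same amount.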
Preferably, applying the one or more transformations includes dividing the input image into segments, and determining a transformation to be applied to each segment. In a preferred embodiment, the input image includes a template delineating the region, which is filled in with the content of interest, and determining the transformation to be applied to each segment includes finding one or more translations of the segment that approximately compensate for a distortion of the input image relative to a reference template, whereby the output image is registered with the reference template. Typically, the one or more translations compensate for a rotation of the input image relative to the reference template. Further typically, applying the one or more transformations includes applying different transformations to two or more mutually-adjoining segments, and selecting the region includes selecting a region that was divided between the two or more segments to which different transformations were applied.
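One simple way of finding a translation for a segment is an exhaustive search over a small window of candidate shifts, scoring each candidate by how much segment ink lands on reference ink. The sketch below uses this brute-force overlap search purely for illustration; it is not asserted to be the method of any of the cited patents. The function name `best_translation`, the representation of ink as sets of (x, y) coordinates and the tie-breaking rule are assumptions.

```python
def best_translation(segment_pixels, reference_pixels, max_shift):
    """Find the (dx, dy) within +/- max_shift that best aligns a segment.

    Both arguments are sets of (x, y) ink coordinates. The score is simply
    the number of shifted segment pixels that land on reference ink; ties
    go to the smaller shift.
    """
    best, best_key = (0, 0), None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            hits = sum((x + dx, y + dy) in reference_pixels
                       for x, y in segment_pixels)
            # Prefer more hits, then the smaller total displacement.
            key = (hits, -(abs(dx) + abs(dy)))
            if best_key is None or key > best_key:
                best, best_key = (dx, dy), key
    return best
```

Applying such a search to each segment separately yields a different translation for each segment, which is how adjoining segments may come to be shifted by different amounts.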
Preferably, the input image includes a template delineating the region, which is filled in with the content of interest, and selecting the region includes identifying a field of the template that is intended to receive the content of interest. Alternatively or additionally, selecting the region includes removing the template from the output image and selecting a portion of the image remaining after the template is removed.
Further preferably, selecting the region includes selecting a region responsive to the one or more transformations applied to the input image. In a preferred embodiment, the content of interest includes alphanumeric characters, and selecting the region includes selecting a region in which it is likely that the one or more transformations have adversely affected the readability of the characters in the region.
Preferably, locating the region includes finding the region of the input image that was transformed into the selected region of the output image by the one or more transformations.
Further preferably, substituting the content of the located region includes finding connected components in the located region and copying the connected components to the selected region in the output image. Most preferably, copying the connected components includes finding, for each of the connected components in the located region, a translation operation to be applied to all of the points in the connected component. Preferably, finding the translation operation includes, for each of the connected components, choosing a point on or in a vicinity of the connected component and determining a translation that was applied to that point by the one or more transformations applied to the input image. In a preferred embodiment, finding the connected components includes finding characters in the image.
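The copying of connected components, each with its own single translation, may be sketched as follows. The sketch assumes binary images stored as lists of rows, 8-connectivity, and a selected region of the output image that has already been erased. The callback `shift_at`, which returns the shift that the registration applied at a given input point, is hypothetical, as is the choice of the component pixel nearest the centroid as the sampling point.

```python
from collections import deque


def connected_components(image):
    """Return a list of 8-connected ink components, each a list of (x, y)."""
    h, w = len(image), len(image[0])
    seen = set()
    components = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and (x, y) not in seen:
                seen.add((x, y))
                queue, comp = deque([(x, y)]), []
                while queue:
                    px, py = queue.popleft()
                    comp.append((px, py))
                    for ny in (py - 1, py, py + 1):
                        for nx in (px - 1, px, px + 1):
                            if (0 <= ny < h and 0 <= nx < w
                                    and image[ny][nx]
                                    and (nx, ny) not in seen):
                                seen.add((nx, ny))
                                queue.append((nx, ny))
                components.append(comp)
    return components


def copy_components(image, out, shift_at):
    """Copy each component of `image` into `out` with one shift per component.

    `shift_at(x, y)` is a hypothetical callback returning the (dx, dy) that
    the registration applied at input point (x, y). It is sampled once per
    component, at the pixel nearest the component's centroid.
    """
    h, w = len(out), len(out[0])
    for comp in connected_components(image):
        mx = sum(x for x, _ in comp) / len(comp)
        my = sum(y for _, y in comp) / len(comp)
        ax, ay = min(comp, key=lambda p: (p[0] - mx) ** 2 + (p[1] - my) ** 2)
        dx, dy = shift_at(ax, ay)
        # Every pixel of the component moves by the same (dx, dy).
        for x, y in comp:
            if 0 <= y + dy < h and 0 <= x + dx < w:
                out[y + dy][x + dx] = 1
    return out
```

Since all of the points in a component share one translation, a character that straddles a segment boundary is moved as a whole and is not broken into upper and lower portions.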
In a further preferred embodiment, the content of interest includes alphanumeric characters, and the method includes applying optical character recognition to the substituted content in the selected region.
There is also provided, in accordance with a preferred embodiment of the present invention, apparatus for processing an input image of a document including a template having one or more regions that are filled in with content, the apparatus including a form processor, which is adapted to apply one or more transformations to the input image so as to generate an output image in which the template is registered with a reference template, to select at least one of the filled-in regions in the output image, to locate in the input image at least one filled-in region corresponding to the at least one selected region in the output image, and to substitute the content of the at least one located region in the input image for the content of the at least one selected region in the output image.
Preferably, the content filled into the one or more regions includes alphanumeric characters, and the processor applies optical character recognition to the substituted content in the at least one selected region so as to extract the content from the document. Further preferably, the apparatus includes an imaging device, which is adapted to scan the document so as to generate the input image.
There is additionally provided, in accordance with a preferred embodiment of the present invention, a computer software product for processing an input image, the product including a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to apply one or more transformations to the input image, whereby different shifts are applied to different pixels in the input image, so as to generate an output image, to select in the output image a region containing content of interest, to locate in the input image the region corresponding to the selected region in the output image, and to substitute the content of the located region in the input image for the content of the selected region in the output image.
In a preferred embodiment, the input image includes a template delineating the region, which is filled in with the content of interest, and the instructions cause the computer to apply the one or more transformations so as to register the output image with a reference template.