The present invention relates generally to document image processing, and specifically to methods for extraction of information that is filled into preprinted forms.
In many document imaging systems, large numbers of forms are scanned into a computer, which then processes the resultant document images to extract pertinent information. Typically the forms comprise preprinted templates, containing predefined fields that have been filled in by hand or with machine-printed characters. In some applications, such as population censuses and tax processing systems, a wide variety of different forms are used, and may typically be input to the computer in any order. Before extracting the information that has been filled into any given form, the computer must first recognize which form it has received. Only then can the computer align the form with the proper template, in order to identify the fields of the template and decipher the characters in the fields.
There are a number of methods of form recognition known in the art. Fiducial marks and machine-readable identifying information, such as a bar code, may be printed on the form itself. This approach uses up valuable form “real estate,” however, and makes for an inflexible system, capable of processing only forms that have been designed in advance with the appropriate marks. Therefore, a preferred approach is to identify features in the scanned form image. For example, horizontal and vertical lines and text base lines (imaginary lines running through the bottom of the characters in a row), as well as the black background of the scanned paper, may be identified and located. Still, none of these features is guaranteed to exist in all forms of interest, and the text baseline features are generally expensive to compute. Because of the large number of different forms that must be processed in applications such as those mentioned above, there is a need for a form recognition method that is both accurate and fast.
U.S. Pat. No. 5,191,525, whose disclosure is incorporated herein by reference, describes a system and method for extraction of data from form documents for subsequent processing. In order to identify forms passing through the system, a number of identification areas are chosen in advance by a designer from existing printed forms, using an interactive computer display. The designer pre-selects a word or an area already present on the form. The coordinates of the identification area are stored in the computer and are accessed in order to “carve” the appropriate areas from the documents as they are processed. The spelling of the selected word, or an electronic signature of the pixel pattern in the area, is used to match the features of the processed document to the form. The electronic signature may include a histogram or intersection count. Once the form is identified, the relative location of the selected word or the electronic signature is used to adjust for misregistration and skew that may have arisen in printing or scanning the form.
Another method known in the art for matching pairs of images, such as the form and template images, uses multi-resolution pyramids. This method is most commonly used in motion estimation. Reduced, low-resolution versions of the images are first matched, and the displacement of one image relative to the other is estimated. The matching and displacement estimation are repeated and refined at a succession of higher resolutions (hence the name “pyramid”), until the exact, full-resolution result is found. This type of matching is problematic in relation to form images, however, since it is liable to be confused by information filled into the form.
It is an object of the present invention to provide improved methods and apparatus for processing of images of form documents.
It is a further object of some aspects of the present invention to provide improved methods for automatically identifying which of a plurality of different form templates corresponds to a given form document.
It is yet a further object of some aspects of the present invention to provide methods for automatically selecting reference areas in form templates and for automatically matching areas of a form document to the reference areas.
In preferred embodiments of the present invention, a document processing system receives images of filled-in form documents to process. The documents are based on a plurality of different form templates whose order of input to the system is not known in advance. In order to determine on which template each document is based, reference areas are chosen on each template. Preferably, one reference area is chosen in each of a number of predefined sectors of the template, typically one reference area per quadrant. Most preferably, the reference areas are chosen automatically, based on a reference metric that embodies criteria designed to ensure that the features of the reference areas are clearly distinguishable from their immediate surroundings and that they are not in parts of the form that are going to be filled in. Thus, there is no need for fiducial marks on the forms or for other constraints on form design. There is also generally no need for a human operator to select the reference areas, as required by the method of the above-mentioned U.S. Pat. No. 5,191,525.
For each template that is a candidate to match the current document, reference areas on the document image are located and registered with the corresponding reference areas of the template. The reference areas on the template and the document are then compared to find a matching score. Generally, the template having the best overall matching score is recognized as the one belonging to this document. The correct template is thus chosen rapidly and, in most cases, completely automatically, even in the absence of lines on the form or other specific features that are commonly used for template recognition and registration. The entire document image is adjusted for fine registration with the chosen template, and the information filled into the document is extracted from the fields of the form, using any suitable method known in the art.
In some preferred embodiments of the present invention, prior to matching the reference areas, a reduced-scale image of the form document is compared to reduced-scale images of the templates. These reduced-scale images are referred to herein as icons. Matching of the icons is used to make a preliminary assessment of which templates are the best candidates to match the whole document. To generate such an icon, the full-scale image is binarized (if it is not already in binary form) and is divided into a matrix of blocks. A gray-scale icon is produced having one pixel for each block in the full-scale image. The gray-scale value of each pixel in the icon is equal to the sum of the binary values of the pixels in the corresponding block. Preferably, the gray-scale icon is then binarized, using any suitable binarization algorithm known in the art.
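The icon construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a binary image stored as a list of rows with 1 for black and 0 for white, so that each block sum is simply the count of black pixels. It also substitutes a fixed threshold for the "suitable binarization algorithm," which the text leaves open. The names `make_icon` and `binarize_icon` are hypothetical.

```python
def make_icon(binary, block):
    """Reduce a binary image (rows of 0/1, 1 = black) to a gray-scale icon.

    Each icon pixel holds the sum of the binary pixels in one block x block
    tile of the source; partial tiles at the right/bottom edges are kept.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    rows = (h + block - 1) // block
    cols = (w + block - 1) // block
    gray = [[0] * cols for _ in range(rows)]
    for y in range(h):
        for x in range(w):
            gray[y // block][x // block] += binary[y][x]
    return gray


def binarize_icon(gray, threshold):
    """Binarize the gray-scale icon with a fixed threshold (an assumption;
    any binarization method could be used here)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


# Usage: a 4x4 image reduced with 2x2 blocks.
img = [[1, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 1]]
gray = make_icon(img, 2)          # [[3, 0], [0, 3]]
icon = binarize_icon(gray, 2)     # [[1, 0], [0, 1]]
```

Unlike purely binary subsampling, which keeps or drops individual source pixels, the block sum keeps a count of how much ink fell in each block before thresholding.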
It has been found that icons produced in this manner represent the full-scale source images more faithfully than do icons generated by purely binary scaling algorithms known in the art. Furthermore, using icon matching as a preliminary step to matching the reference areas saves processing time by narrowing the field of candidate templates. At the same time, combining a single icon matching stage with the subsequent reference area matching stage is considerably faster and less computation-intensive than full, multi-resolution pyramid-based fitting algorithms known in the art.
In some preferred embodiments of the present invention, the reference areas are used to register the form document with the candidate template or templates in a number of successive stages. In a first, coarse stage, a tentative identification is made of the reference areas in the document image that most closely correspond to the defined reference areas in the candidate template. Preferably, the tentative identification is made by histogram matching, or by another computationally-efficient technique known in the art. The identified reference areas on the form document are then registered with the corresponding areas of the template, typically by rotation, scaling and/or warping of the areas. Preferably, the image warping is carried out using methods such as those described in U.S. Pat. No. 5,793,887, which is incorporated herein by reference, although other methods known in the art may also be used.
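The coarse, histogram-based tentative identification might look roughly like the sketch below. The text does not specify which histograms are matched. This sketch assumes row and column projection histograms (black-pixel counts per row and per column), compared by summed absolute difference over a small search window around the expected position. The function names and the `radius` parameter are hypothetical.

```python
def projections(img, top, left, h, w):
    """Row and column black-pixel counts for the h x w window at (top, left)."""
    rows = [sum(img[top + r][left:left + w]) for r in range(h)]
    cols = [sum(img[top + r][left + c] for r in range(h)) for c in range(w)]
    return rows, cols


def locate_by_histogram(doc, ref, expected_top, expected_left, radius):
    """Search +/- radius pixels around the expected position for the window
    whose projection histograms best match those of the reference area.

    Returns (cost, top, left); a lower cost means a closer match.
    """
    h, w = len(ref), len(ref[0])
    ref_rows, ref_cols = projections(ref, 0, 0, h, w)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            top, left = expected_top + dy, expected_left + dx
            # Skip windows that fall outside the document image.
            if top < 0 or left < 0 or top + h > len(doc) or left + w > len(doc[0]):
                continue
            rows, cols = projections(doc, top, left, h, w)
            cost = (sum(abs(a - b) for a, b in zip(rows, ref_rows))
                    + sum(abs(a - b) for a, b in zip(cols, ref_cols)))
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best
```

Projection histograms are cheap to compute and tolerate small local noise, which suits a first, coarse pass. The located area would then be refined by the rotation, scaling, or warping step described above.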
At this point, the matching scores are computed and, typically, the template with the best matching score is chosen. A final transformation to be applied to the entire document image is preferably determined by finding a cluster of the transformations necessary to carry each of the document reference areas into the corresponding reference area of the template. These transformations typically include rotation, scaling and offset. The term “cluster” as used in the present patent application and in the claims refers to a group of transformations of a given type whose magnitudes are within predefined bounds of one another. The final transformation that is applied is a composite of the cluster, preferably a weighted mean of the transformations in the cluster. Using clusters in this manner avoids distortions due to outlying points, i.e., reference areas that have been incorrectly or inaccurately identified. It is also possible to use clusters of one or two of the transformation types (rotation, scaling, offset) while taking a non-clustered mean or median of the other transformations.
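A minimal sketch of the clustering and weighted-mean step for one transformation type (for example, rotation angles) is shown below. It assumes the cluster is the largest group of values whose total spread is at most `tol`, which guarantees that all members lie within `tol` of one another. It further assumes the weights are per-area confidences such as the area matching scores. Neither choice is fixed by the text, and the function names are hypothetical.

```python
def find_cluster(values, tol):
    """Indices of the largest group of values spanning at most tol.

    Sorting and sliding a window keeps every pair in the group within
    tol of each other, so isolated outliers are left out.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = []
    start = 0
    for end in range(len(order)):
        while values[order[end]] - values[order[start]] > tol:
            start += 1
        if end - start + 1 > len(best):
            best = order[start:end + 1]
    return best


def clustered_mean(values, weights, tol):
    """Weighted mean over the cluster only (assumes a non-empty input)."""
    idx = find_cluster(values, tol)
    total = sum(weights[i] for i in idx)
    return sum(values[i] * weights[i] for i in idx) / total


# Usage: rotations (degrees) estimated from four reference areas; the
# fourth area was misidentified and is excluded by the clustering.
rotations = [0.5, 0.6, 0.55, 3.0]
weights = [1.0, 1.0, 2.0, 1.0]
angle = clustered_mean(rotations, weights, 0.2)   # about 0.55
```

A plain mean of the same four values would be pulled to about 1.08 degrees by the outlier. This is the distortion the clustering is meant to avoid.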
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for processing an input image, including:
for each of a plurality of different templates, computing a reference metric for each of a multiplicity of candidate areas;
selecting reference areas from among the candidate areas on each of the plurality of different templates responsive to the reference metrics thereof;
comparing reference areas on the input image to the selected reference areas on at least some of the templates, so as to compute a matching score for each of the templates indicating a degree of similarity between the template and the input image;
identifying the template whose matching score indicates the greatest degree of similarity; and
extracting information from the input image based on the identified template.
Preferably, the input image includes an image of a form document, having fields defined by one of the templates, which fields are filled in with the information, and extracting the information includes registering the input image with the identified template. Preferably, computing the reference metric for each of the candidate areas includes estimating a likelihood that the candidate area will be filled in. Most preferably, selecting the areas includes rejecting areas that include runs of white pixels of a length exceeding a predetermined criterion.
Additionally or alternatively, computing the reference metric for each of the candidate areas includes estimating a likelihood of a change in appearance of the area due to contrast variations in scanning an object that appears in the input image. Preferably, selecting the areas includes rejecting areas that include predetermined concentrations of black pixels.
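The two rejection criteria above can be sketched as a simple filter over candidate areas. Several assumptions apply here. Areas are lists of rows with 1 for black. Only horizontal white runs are measured. The "predetermined concentration" is modeled as a maximum black-pixel fraction. The names `max_white_run` and `max_black` are hypothetical parameters, not values from the text.

```python
def longest_white_run(area):
    """Length of the longest horizontal run of white (0) pixels."""
    longest = 0
    for row in area:
        run = 0
        for px in row:
            run = run + 1 if px == 0 else 0
            longest = max(longest, run)
    return longest


def black_fraction(area):
    """Fraction of pixels in the area that are black (1)."""
    total = sum(len(row) for row in area)
    return sum(sum(row) for row in area) / total


def acceptable_candidate(area, max_white_run, max_black):
    """Reject areas likely to be filled in (long white runs) or likely to
    change appearance with scanning contrast (dense black regions)."""
    return (longest_white_run(area) <= max_white_run
            and black_fraction(area) <= max_black)
```

Long white runs suggest blank space intended for data entry. Large solid black regions may thicken or break up as scanner contrast varies. Either condition makes an area a poor reference.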
Preferably, computing the reference metric includes determining a risk score indicative of a likelihood of confusion in identifying the template based on a given reference area, and selecting the reference areas includes selecting one or more reference areas having respective risk scores within a predetermined limit. Most preferably, selecting the reference areas includes selecting at least one of the candidate areas in each of a number of predetermined sectors of the template.
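One possible form of the risk score, together with per-sector selection, is sketched below. This is an assumption-laden illustration. The risk is taken as the highest pixel agreement between the candidate area and copies of itself shifted by up to `radius` pixels, so an area that looks like its own surroundings scores near 1.0 and is easily confused. The four quadrants stand in for the "predetermined sectors." All function and parameter names are hypothetical.

```python
def agreement(img, t1, l1, t2, l2, h, w):
    """Fraction of equal pixels between two h x w windows of img."""
    same = 0
    for r in range(h):
        for c in range(w):
            same += img[t1 + r][l1 + c] == img[t2 + r][l2 + c]
    return same / (h * w)


def risk_score(img, top, left, h, w, radius):
    """Worst-case similarity of the area to nearby shifted windows."""
    worst = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if (dy, dx) == (0, 0):
                continue
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + h > len(img) or l + w > len(img[0]):
                continue
            worst = max(worst, agreement(img, top, left, t, l, h, w))
    return worst


def select_per_quadrant(img, candidates, h, w, radius, max_risk):
    """Keep the lowest-risk candidate in each quadrant, subject to max_risk.

    Returns {(is_bottom, is_right): (risk, top, left)}.
    """
    height, width = len(img), len(img[0])
    chosen = {}
    for top, left in candidates:
        q = (top + h // 2 >= height // 2, left + w // 2 >= width // 2)
        r = risk_score(img, top, left, h, w, radius)
        if r <= max_risk and (q not in chosen or r < chosen[q][0]):
            chosen[q] = (r, top, left)
    return chosen
```

A blank candidate has a risk of 1.0, because every shifted copy is identical to it, and is always rejected. A small distinctive mark scores lower and survives.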
There is also provided, in accordance with a preferred embodiment of the present invention, a method for processing an input image with reference to a plurality of image templates, the method including:
generating an input icon representing the input image at a reduced scale;
comparing the input icon to a plurality of template icons, each such template icon representing a respective one of the templates at the reduced scale, so as to compute an icon matching score for each of the templates indicating a degree of similarity between the template icon and the input image icon;
selecting one or more of the templates whose respective icon matching scores indicate a degree of similarity greater than that of the other templates;
comparing reference areas on the input image to chosen reference areas on the one or more selected templates, so as to compute a template matching score for each of the templates indicating a degree of similarity between the template and the input image;
identifying the template whose template matching score indicates the greatest degree of similarity; and
extracting information from the input image based on the identified template.
Preferably, comparing the icons includes finding respective initial transformations based on aligning the input icon with the icons of the selected templates, and comparing the reference areas includes applying the respective initial transformations to locate the reference areas in the input image. Further preferably, comparing the reference areas includes applying one or more additional transformations to align the input image with the identified template, responsive to a discrepancy found between the positions of the reference areas on the input image and on the identified template. Most preferably, applying the one or more additional transformations includes determining respective transformations to be applied to the reference areas on the input image in order to register the reference areas on the input image with corresponding reference areas on the identified template, and applying a weighted composite of the determined transformations to the entire input image, so as to bring the input image into alignment with the template.
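Icon matching with an initial offset can be sketched as an exhaustive search over small shifts. Each template icon is scored by the fraction of its pixels that agree with the shifted input icon, with pixels falling outside the input icon treated as white. Normalizing by the full template size, rather than by the overlap, keeps large shifts from scoring well on a tiny overlap. This scoring rule, the `radius` search range, and the function names are assumptions for illustration only.

```python
def pixel(img, y, x):
    """Pixel value, treating anything outside the icon as white (0)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0


def icon_match(inp, tmpl, radius):
    """Best (score, dy, dx) aligning the template icon with the input icon."""
    h, w = len(tmpl), len(tmpl[0])
    best = (-1.0, 0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            same = sum(pixel(inp, y + dy, x + dx) == tmpl[y][x]
                       for y in range(h) for x in range(w))
            score = same / (h * w)
            if score > best[0]:
                best = (score, dy, dx)
    return best


def shortlist(inp, template_icons, radius, k):
    """Rank templates by icon score and keep the k best candidates.

    Each entry is (score, name, dy, dx); the offset is in icon pixels.
    """
    scored = []
    for name, icon in template_icons.items():
        score, dy, dx = icon_match(inp, icon, radius)
        scored.append((score, name, dy, dx))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]
```

Under the block-sum reduction, an offset of (dy, dx) icon pixels corresponds to roughly (dy, dx) times the block size in the full-scale image. That scaled offset could seed the search for the reference areas.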
Preferably, comparing the reference areas includes computing an area matching score indicative of a degree of similarity between each of the reference areas on the input image and a corresponding one of the chosen reference areas on each of the selected templates, and computing the template matching score based on the area matching scores.
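The area-to-template scoring could be sketched as follows. The text does not define the area matching score. This sketch assumes the overlap of black pixels divided by their union (a Jaccard index), with the template matching score taken as the plain mean of the area scores. The dictionaries keyed by template name, and the function names, are hypothetical.

```python
def area_score(doc_area, ref_area):
    """Black-pixel intersection over union for two aligned binary areas."""
    inter = union = 0
    for drow, rrow in zip(doc_area, ref_area):
        for d, r in zip(drow, rrow):
            inter += d & r
            union += d | r
    return inter / union if union else 1.0


def template_score(doc_areas, ref_areas):
    """Mean of the area matching scores for one template."""
    scores = [area_score(d, r) for d, r in zip(doc_areas, ref_areas)]
    return sum(scores) / len(scores)


def best_template(doc_areas_by_template, ref_areas_by_template):
    """Name of the template with the highest template matching score.

    doc_areas_by_template holds, per template, the document areas located
    at that template's reference positions.
    """
    return max(ref_areas_by_template,
               key=lambda name: template_score(doc_areas_by_template[name],
                                               ref_areas_by_template[name]))
```

Ignoring pixels that are white in both areas keeps large blank regions from inflating the score. A median instead of a mean would be one way to reduce the influence of a single misidentified area.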
There is additionally provided, in accordance with a preferred embodiment of the present invention, a method for reducing the scale of a binary image, in which each binary pixel has a value of zero or one, including:
dividing the binary image into a matrix of blocks of a predetermined size; and
generating a gray-scale image including a plurality of gray-scale pixels, each corresponding to one of the blocks in the matrix and having a gray-scale pixel value equal to the summed value of the binary pixels in the block.
Preferably, the method includes binarizing the gray-scale image to generate a reduced-scale binary image.
There is further provided, in accordance with a preferred embodiment of the present invention, a method for processing an input image, including:
determining respective reference area transformations to be applied to a plurality of reference areas on the input image in order to register the reference areas on the input image with corresponding reference areas on a template;
deriving from selected subsets of the reference area transformations candidate global transformations of one or more types to be applied to the input image;
finding a cluster of the candidate global transformations of a given one of the types having respective magnitudes within predetermined limits of one another;
applying a composite result of the candidate global transformations in the cluster to the entire input image, so as to bring the input image into registration with the template; and
extracting information from the input image based on the template with which the image is registered.
Preferably, the types of the transformations are selected from a group of transformations consisting of skew, scale and offset transformations. Further preferably, applying the composite result includes computing a weighted mean of the magnitudes of the transformations in the cluster, and applying the given type of transformation to the entire input image with a transformation magnitude given by the weighted mean.
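Deriving candidate global transformations from subsets of the reference areas can be illustrated with pairs. In this sketch, every pair of reference-area centers yields one candidate rotation and one candidate scale that carry the document pair onto the template pair. The resulting lists are what a clustering step would then operate on. Using pairs of centers as the "selected subsets" is an assumption, and the function name is hypothetical. Angles follow `math.atan2` on (x, y) coordinates as given, so the sign convention depends on whether y points up or down in the caller's image coordinates.

```python
import math
from itertools import combinations


def pairwise_candidates(doc_pts, tmpl_pts):
    """Candidate rotations (radians) and scales from pairs of area centers.

    doc_pts[i] and tmpl_pts[i] are (x, y) centers of the i-th reference
    area on the input image and on the template, respectively.
    """
    rotations, scales = [], []
    for i, j in combinations(range(len(doc_pts)), 2):
        dx_d = doc_pts[j][0] - doc_pts[i][0]
        dy_d = doc_pts[j][1] - doc_pts[i][1]
        dx_t = tmpl_pts[j][0] - tmpl_pts[i][0]
        dy_t = tmpl_pts[j][1] - tmpl_pts[i][1]
        len_d = math.hypot(dx_d, dy_d)
        if len_d == 0:
            continue  # coincident centers give no direction information
        a = math.atan2(dy_t, dx_t) - math.atan2(dy_d, dx_d)
        rotations.append(math.atan2(math.sin(a), math.cos(a)))  # wrap to (-pi, pi]
        scales.append(math.hypot(dx_t, dy_t) / len_d)
    return rotations, scales
```

With n reference areas this gives n(n-1)/2 candidates of each type. A single misplaced area corrupts only the candidates from pairs that include it, which leaves the remaining candidates free to form a tight cluster.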
There is moreover provided, in accordance with a preferred embodiment of the present invention, apparatus for processing an input image, including:
a memory, which is adapted to store images of a plurality of different templates; and
an image processor, which is adapted to compute a reference metric for each of a multiplicity of candidate areas on each of the templates and to select reference areas from among the candidate areas on each of the plurality of different templates responsive to the reference metrics thereof, and further to receive the input image and to compare reference areas on the input image to the selected reference areas on at least some of the templates, so as to compute a matching score for each of the templates indicating a degree of similarity between the template and the input image, to identify the template whose matching score indicates the greatest degree of similarity, and to extract information from the input image based on the identified template.
Preferably, the input image includes an image of a form document, having fields defined by one of the templates, which fields are filled in with the information, and the processor is adapted to register the input image with the identified template. In a preferred embodiment, the apparatus includes an image input device, which is adapted to capture the image of the form document and to convey the image to the processor.
There is furthermore provided, in accordance with a preferred embodiment of the present invention, apparatus for processing an input image, including:
a memory, which is adapted to store image data with respect to a plurality of different templates, including template icons, each such template icon representing a respective one of the templates at a reduced scale, and reference areas chosen on each of the templates; and
an image processor, adapted to receive the input image and to generate an input icon representing the input image at a reduced scale, to compare the input icon to the template icons so as to compute an icon matching score for each of the templates indicating a degree of similarity between the template icon and the input image icon and to select one or more of the templates whose respective icon matching scores indicate a degree of similarity greater than that of the other templates, and further to compare reference areas on the input image to the chosen reference areas on the one or more selected templates, so as to compute a template matching score for each of the templates indicating a degree of similarity between the template and the input image, and to identify the template whose template matching score indicates the greatest degree of similarity, so as to extract information from the input image based on the identified template.
There is additionally provided, in accordance with a preferred embodiment of the present invention, apparatus for reducing the scale of a binary image, in which each binary pixel has a value of zero or one, including an image processor, which is adapted to divide the binary image into a matrix of blocks of a predetermined size, and to generate a gray-scale image including a plurality of gray-scale pixels, each corresponding to one of the blocks in the matrix and having a gray-scale pixel value equal to the summed value of the binary pixels in the block.
There is also provided, in accordance with a preferred embodiment of the present invention, apparatus for processing an input image, including:
a memory, which is adapted to store image data with respect to a plurality of different templates, including reference areas chosen on the templates; and
an image processor, adapted to receive the input image and to determine respective reference area transformations to be applied to a plurality of reference areas on the input image in order to register the reference areas on the input image with corresponding reference areas on a template, and to derive from selected subsets of the reference area transformations candidate global transformations of one or more types to be applied to the input image, to find a cluster of the candidate global transformations of a given one of the types having respective magnitudes within predetermined limits of one another, and to apply a composite result of the candidate global transformations in the cluster to the entire input image, so as to bring the input image into registration with the template, thus to extract information from the input image based on the template with which the image is registered.
There is further provided, in accordance with a preferred embodiment of the present invention, a computer software product for processing an input image, including a computer-readable medium having program instructions stored therein, which instructions, when read by a computer, cause the computer to compute a reference metric for each of a multiplicity of candidate areas on each of a plurality of different templates and to select reference areas from among the candidate areas on each of the plurality of different templates responsive to the reference metrics thereof, and further to compare reference areas on the input image to the selected reference areas on at least some of the templates, so as to compute a matching score for each of the templates indicating a degree of similarity between the template and the input image and to identify the template whose matching score indicates the greatest degree of similarity, and to extract information from the input image based on the identified template.
There is moreover provided, in accordance with a preferred embodiment of the present invention, a computer software product for processing an input image, including a computer-readable medium having program instructions stored therein, which instructions, when read by a computer, cause the computer to generate an input icon representing the input image at a reduced scale, to compare the input icon to a plurality of template icons, each such template icon representing a respective one of the templates at the reduced scale, so as to compute an icon matching score for each of the templates indicating a degree of similarity between the template icon and the input image icon and to select one or more of the templates whose respective icon matching scores indicate a degree of similarity greater than that of the other templates, and further to compare reference areas on the input image to chosen reference areas on the one or more selected templates, so as to compute a template matching score for each of the templates indicating a degree of similarity between the template and the input image, and to identify the template whose template matching score indicates the greatest degree of similarity, and to extract information from the input image based on the identified template.
There is still further provided, in accordance with a preferred embodiment of the present invention, a computer software product for reducing the scale of a binary image, in which each binary pixel has a value of zero or one, including a computer-readable medium having program instructions stored therein, which instructions, when read by a computer, cause the computer to divide the binary image into a matrix of blocks of a predetermined size, and to generate a gray-scale image including a plurality of gray-scale pixels, each corresponding to one of the blocks in the matrix and having a gray-scale pixel value equal to the summed value of the binary pixels in the block, whereby a reduced-scale binary image is generated by binarizing the gray-scale image.
There is also provided, in accordance with a preferred embodiment of the present invention, a computer software product for processing an input image, including a computer-readable medium having program instructions stored therein, which instructions, when read by a computer, cause the computer to determine respective reference area transformations to be applied to a plurality of reference areas on the input image in order to register the reference areas on the input image with corresponding reference areas on a template, and to derive from selected subsets of the reference area transformations candidate global transformations of one or more types to be applied to the input image, to find a cluster of the candidate global transformations of a given one of the types having respective magnitudes within predetermined limits of one another, and to apply a composite result of the candidate global transformations in the cluster to the entire input image, so as to bring the input image into registration with the template, and to extract information from the input image based on the template with which the image is registered.
The present invention will be more fully understood from the following detailed description of the preferred embodiments thereof, taken together with the drawings in which: