Methods for extraction of information filled into form documents are well known in the art. Typically, a document is printed with a form template. The template contains predefined fields that are filled in by a user with appropriate characters. The document is scanned into a computer, which typically uses an optical character recognition (OCR) program to identify and code the characters in each field.
OCR identification of handwritten, or even typed, characters can be uncertain, due to a range of problems including uneven scan quality, variable character shapes, and interference between the filled-in characters and features of the printed template. A variety of methods and systems have been developed to deal with these problems. For example, U.S. Pat. Nos. 5,182,656, 5,191,525 and 5,793,887, whose disclosures are incorporated herein by reference, describe methods for registering a document image with a form template so as to remove the template and extract the filled-in information from the form. Once the form is accurately registered with the known template, it is a simple matter for the computer to assign the fill-in characters to the appropriate fields. Removing the template from the document image also substantially reduces the amount of data that must be transmitted or stored for the image.
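By way of illustration only, the assignment of fill-in characters to fields after registration can be sketched as follows. The sketch assumes that registration has already been performed, that each template field is known as a rectangle in template coordinates, and that each recognized character comes with a bounding box in the same coordinates. The names `Box` and `assign_to_fields` are hypothetical and are not taken from the cited patents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in template coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def assign_to_fields(fields, chars):
    """Assign each recognized character to the field containing its center.

    fields: dict mapping field name -> Box
    chars:  list of (character, Box) pairs from the registered image
    Returns a dict mapping field name -> string, read left to right.
    Characters whose center lies in no field are ignored.
    """
    assigned = {name: [] for name in fields}
    for ch, box in chars:
        cx, cy = box.center()
        for name, field_box in fields.items():
            if field_box.contains(cx, cy):
                assigned[name].append((box.left, ch))
                break
    # Sort by horizontal position so each field reads left to right.
    return {
        name: "".join(ch for _, ch in sorted(items))
        for name, items in assigned.items()
    }
```

For example, with a "zip" field and a "name" field defined on the template, characters found inside each rectangle are collected in left-to-right order, and a stray mark outside both rectangles is dropped. Testing the character center rather than the full bounding box is an assumption made here so that characters slightly overlapping a field border are still assigned.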
Because of the uncertainty of machine identification of characters by OCR, methods have been developed for selectively verifying the correctness of coded results. For example, U.S. Pat. No. 5,455,875, whose disclosure is incorporated herein by reference, describes a system and method for correction of optical character recognition, based on an interactive display of OCR results that is designed to enable an operator to correct erroneous character data reliably and efficiently.
Errors and inconsistencies are also common in data that are not generated by OCR, such as address information that is out of date or misspelled. To deal with problems of this sort, a number of companies offer address verification services, in which a mailing list is checked against an up-to-date master list. One example of such a service is “InfoBase BestAddress,” offered by Acxiom Corporation, as described at www.acxiom.com. This service both identifies incorrect addresses and, where possible, provides corrections. The U.S. Postal Service offers master address databases that can be used for this sort of verification.
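The general idea of checking a mailing list against a master list can be sketched, purely as an illustration, using approximate string matching from the Python standard library. This is not a description of how any commercial service or the U.S. Postal Service databases operate. The function names, the normalization rule, and the similarity cutoff of 0.85 are all assumptions chosen for the example.

```python
import difflib


def normalize(address):
    """Upper-case the address and collapse punctuation and whitespace."""
    cleaned = address.upper().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def verify_addresses(mailing_list, master_list, cutoff=0.85):
    """Check each mailing-list address against the master list.

    Returns a list of (original, status, suggestion) tuples, where status is
    "ok" (exact match after normalization), "corrected" (a close match was
    found in the master list), or "unknown" (no sufficiently close match).
    """
    master = {normalize(a): a for a in master_list}
    master_keys = list(master)
    results = []
    for address in mailing_list:
        key = normalize(address)
        if key in master:
            results.append((address, "ok", master[key]))
            continue
        # Assumed cutoff: candidates below this similarity ratio are rejected.
        close = difflib.get_close_matches(key, master_keys, n=1, cutoff=cutoff)
        if close:
            results.append((address, "corrected", master[close[0]]))
        else:
            results.append((address, "unknown", None))
    return results
```

In this sketch, a misspelled entry such as "12 Main Stret, Springfeild" would be reported as "corrected" when the master list contains "12 Main Street, Springfield", while an address with no close counterpart is reported as "unknown" rather than being changed.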