Standardized forms are used in many industries (insurance, health care, government, etc.). Document receiving/processing organizations often receive vast quantities of completed, i.e. filled-in, printed forms from users, such as health care insurance forms, investment account forms, magazine subscription forms, change of address forms, and, more generally, forms used to collect, provide and/or submit customer or client information. A received form may be thought of as including two distinct types of information: (1) predefined/underlying typographic and directional information of the form itself, e.g. instructions for filling in the form, field designators such as boxes or lines, and field names such as “name” and “address”, which specify the purpose of the form and/or direct what type of information is to be provided and where; and (2) any handwritten, typewritten, or printed information subsequently added to the form, such as by a user or someone other than the form's creator/designer. If the form is not received as an electronic image, the document receiving organization typically scans it to generate one; in either case, the electronic image contains both types of information. From these electronic images, the user-added information is typically extracted, such as via optical character recognition, for subsequent processing, such as storage in a database.
The extraction process, which may need to distinguish between the form and the user information added thereto, may be complicated by artifacts on the form such as stray marks, tears, coffee stains, etc. Furthermore, while forms may be filled out by hand with a pen or pencil, it is common to feed pre-printed forms through a typewriter or computer printer so as to fill in the requisite information. In some instances, the form, along with the user-added information, is entirely computer generated and printed onto a blank sheet of paper in a single operation. Any time one of these forms is fed to a typewriter, printer or scanner in order to electronically fill in the requisite information, or where the form and information are printed together onto a blank sheet, there is a chance that the form will not be aligned correctly with the device or with the entered data. As a result, the entered data appears misaligned with the locations designated to receive it, a condition referred to as “print shift,” which may impair the ability of the document processing organization to electronically process the document image of that form to extract the user-added information. For example, the shifted data may overlay elements of the form, making it difficult to electronically distinguish the user-added data from the form elements; the shifted data may be ambiguously positioned relative to multiple form elements, making it difficult to electronically identify the nature of the information, e.g. to which field it belongs; or the user data may be out of position with respect to the locations on the form where the electronic extraction process expects to find it, impeding the extraction process or rendering it inoperable. All of these situations may result in reduced accuracy and/or the need for operator intervention in an otherwise automated process, which in turn may decrease efficiency and increase cost.
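By way of illustration only, the effect of print shift on a simple zone-based extraction may be sketched as follows. The sketch assumes, hypothetically, that each form field and each block of user-added data is represented as an axis-aligned rectangle in pixel coordinates, and that data is attributed to the field whose zone it overlaps most. The names `Box`, `candidate_fields`, `FIELDS` and `NAME_DATA`, and all coordinates, are invented for this example and do not describe any particular extraction system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixel coordinates (hypothetical units)."""
    left: int
    top: int
    right: int
    bottom: int

    def shifted(self, dx: int, dy: int) -> "Box":
        """Return a copy of this box displaced by (dx, dy), i.e. a print shift."""
        return Box(self.left + dx, self.top + dy, self.right + dx, self.bottom + dy)

    def overlap_area(self, other: "Box") -> int:
        """Area of the intersection with another box (0 if they do not intersect)."""
        width = min(self.right, other.right) - max(self.left, other.left)
        height = min(self.bottom, other.bottom) - max(self.top, other.top)
        return max(width, 0) * max(height, 0)


def candidate_fields(data: Box, fields: dict[str, Box]) -> list[str]:
    """Return the field name(s) whose zone overlaps the data the most.

    One name  -> the data is attributed to that field (possibly the wrong one).
    Many names -> the data is ambiguously positioned between fields.
    No names  -> the data lies outside every expected zone.
    """
    overlaps = {name: data.overlap_area(zone) for name, zone in fields.items()}
    best = max(overlaps.values(), default=0)
    if best == 0:
        return []
    return [name for name, area in overlaps.items() if area == best]


# Assumed layout: two stacked fields, and data intended for the "name" field.
FIELDS = {
    "name": Box(100, 100, 400, 130),
    "address": Box(100, 140, 400, 170),
}
NAME_DATA = Box(110, 105, 300, 125)

if __name__ == "__main__":
    for dy in (0, 20, 25, 200):
        print(dy, candidate_fields(NAME_DATA.shifted(0, dy), FIELDS))
```

With these assumed coordinates, an unshifted entry is attributed to “name”; a downward shift of 20 pixels overlaps both zones equally and is ambiguous; a shift of 25 pixels is attributed to “address”, the wrong field; and a shift of 200 pixels falls outside every expected zone. These outcomes correspond to the ambiguous-position and out-of-position situations described above.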