The present invention relates to an image processing system for processing printed forms containing information recorded thereon by a person, and more particularly to an improved image processing system for scanning forms and processing data from predetermined fields of each form.
In conventional form scanning, a form that contains marked information, e.g., areas that have been filled in, left blank, or checked, is fed into a scanning device which generates digital data representative of a video image of the form. The derived electronic information, in the form of digital bits of data, is transferred to a computer or other suitable processing means, where the data corresponding to the predetermined fields is processed to derive the information content the user has marked down in those fields. The form is presumed to be properly positioned in the scanner so that the locations of areas or fields, where marks are to be located, are known to the computer. Thus, the computer is informed in advance of the coordinates of predetermined fields of data on a form, and applies that information directly to the video image data to locate those predetermined fields where information is to be found.
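The conventional approach described above can be illustrated with a minimal sketch. It assumes a bilevel image held as rows of 0/1 pixels and a table of field coordinates known in advance; the field names, the coordinates, the helper functions, and the 0.5 darkness threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # rows of pixels; 1 = dark, 0 = light
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right are exclusive

# Hypothetical field layout, fixed in advance for a properly positioned form.
FIELDS: Dict[str, Box] = {
    "q1_yes": (1, 1, 3, 3),
    "q1_no": (1, 5, 3, 7),
}


def dark_fraction(image: Image, box: Box) -> float:
    """Return the fraction of dark pixels inside the given box."""
    top, left, bottom, right = box
    pixels = [px for row in image[top:bottom] for px in row[left:right]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def read_fields(image: Image, fields: Dict[str, Box],
                threshold: float = 0.5) -> Dict[str, bool]:
    """Apply the fixed coordinates directly to the image data.

    A field counts as marked when at least `threshold` of its pixels are dark.
    No correction is made for skew or offset of the form.
    """
    return {name: dark_fraction(image, box) >= threshold
            for name, box in fields.items()}


# Tiny synthetic "scan": the respondent filled in the q1_yes area.
page: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

marks = read_fields(page, FIELDS)
print(marks)  # {'q1_yes': True, 'q1_no': False}
```

Because the coordinates are applied directly to the image data, the result is correct only while the form sits where the coordinate table expects it. Shifting the same mark two pixels to the right would cause this sketch to report the field as unmarked.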
Such prior art systems are limited in their accuracy due to the operating premise that the document is properly oriented in the scanner. This premise poses no problem where the applicable tolerances are fairly large, such that a fair degree of skew or misalignment of the scanned form can be accommodated. However, the greater the amount of data on the form, the greater the precision required in identifying the predetermined fields that contain that data, and the less reliable are systems that make no allowance for imprecise positioning of the form with respect to the scanner.
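The sensitivity to skew can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the form is rotated about its corner at the coordinate origin and that positions are measured in pixels; the function name and the sample values are hypothetical.

```python
import math


def skew_displacement(x: float, y: float, skew_degrees: float) -> float:
    """Distance a point at (x, y) moves when the form is rotated about the origin."""
    theta = math.radians(skew_degrees)
    # Standard 2-D rotation of the nominal field position.
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return math.hypot(x_rot - x, y_rot - y)


# A field 2000 pixels from the pivot, with the form skewed by one degree,
# moves by roughly 35 pixels.
far_field = skew_displacement(2000.0, 0.0, 1.0)

# A field only 200 pixels from the pivot moves one tenth as far.
near_field = skew_displacement(200.0, 0.0, 1.0)

print(round(far_field, 1), round(near_field, 1))
```

The displacement grows in proportion to the distance from the pivot. A densely printed form has small fields, so even a slight skew can move a field farther than its own width, which is why fixed coordinates become less reliable as the amount of data on the form increases.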
Other problems arise in applications where extremely large numbers of forms must be scanned and processed, and where time and operating expense are very significant. For example, where a multi-page form contains data that is distributed throughout, some of which cannot be automatically processed, there is a great need to reduce the operator time required to find the data and manually enter it into the system. Frequently, multi-page forms are designed from the viewpoint of being clear to the persons who will fill them in, and not from the viewpoint of optimizing retrieval of the data. Thus, considerable delay may be caused by the time it takes an operator to locate the desired fields of information, so as to be able to identify the data, encode it, and enter it properly into storage. To the extent that such data can be identified automatically and presented to the operator in a more organized format, operator time can be reduced significantly. Further, such organization of data is helpful in achieving the ultimate objective of completely automated processing of all data derived from the form, which would eliminate any need for operator entry of data prior to processing.
Another need that has arisen in data gathering applications involving scanners is to record images of selected written answers, e.g., by printing them on sheets, storing them on optical discs, or both. Since storage is expensive, whatever the storage medium, there is a need to select and format certain images for display and storage, while avoiding the requirement of storing the entire form.
There is thus a significant need in the art for a system and a method for processing scanned form data so as to identify predetermined fields of data, generate image data corresponding to just those specified fields for processing, and provide improved formatting of such specified data for storage and/or processing.