This invention relates to a forms recognition system, and more particularly to an automated method for assisting creation of templates from blank masters in a forms recognition system.
Often in forms recognition systems, a template is made from the bitmap of a master form, the master being a blank copy of a form. This template has designated areas or fields in which a user will add data that is eventually extracted by the recognition system and stored in a database. Once this template is defined, the recognition system uses it to process an incoming completed form having the same layout as the template.
Typically, the creation of the template is a manual task which requires time and fine motor skills on the part of the creator. A master form is scanned in and the bitmap, or a reduced resolution version of the bitmap, is displayed on a graphics display of a graphical workstation. The creator of the template then designates on the displayed bitmap the positioning of areas which will contain data to be extracted. These areas are often referred to as fields. These fields may be outlined or framed, for example by a rectangular box, and may also contain text asking a question. It can be appreciated that many different ways of indicating fields on a form can be envisioned.
Difficulties arise when using a computer graphics interface to designate certain graphical objects, such as a box around a field, when the object is to be selected from a bitmap. Usually, it is necessary to manually define the object or region using a pointing device such as a mouse. The operator may use the pointing device to select a corner of the object or region and draw along the boundary of the object, being careful not to include extraneous parts of the overall object while still including enough of the bitmap to specify the entire field of the form. Fine motor coordination on the part of the operator is necessary for successful definition of the object. Furthermore, for each template created, the operator would have to perform this operation many times. It is not uncommon for a form to have 100 separate fields, each of which would have to be designated.
The operation of designating fields may be further complicated when the image processing used in the recognition system requires additional regional definitions. For example, the system may need both the area inside a rectangle designating a field and a buffer around that rectangle to account for text which flows out of the boundaries of that rectangle. Furthermore, when working with a high-resolution bitmap, it may be impossible for a human using a pointing device to designate the exact coordinate position of an area depicting a field on a graphical display. Therefore, an automated method which assists the operator in creating a template would be highly useful.
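The relationship between a field rectangle and its surrounding buffer can be illustrated with a minimal sketch. It is not taken from any disclosed system: the `Rect` type, the `with_buffer` function, the pixel margin, and the convention that the right and bottom edges are exclusive are all assumptions made for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in bitmap pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int   # exclusive edge (assumed convention)
    bottom: int  # exclusive edge (assumed convention)


def with_buffer(field: Rect, margin: int, width: int, height: int) -> Rect:
    """Grow `field` by `margin` pixels on every side, clamped to the bitmap.

    `width` and `height` are the bitmap dimensions in pixels. The result is
    the region that would be searched for text overflowing the field.
    """
    return Rect(
        max(field.left - margin, 0),
        max(field.top - margin, 0),
        min(field.right + margin, width),
        min(field.bottom + margin, height),
    )


# A field near the right edge of a 115 x 200 pixel bitmap, with an 8 pixel buffer.
field = Rect(left=10, top=20, right=110, bottom=60)
buffered = with_buffer(field, margin=8, width=115, height=200)
```

Even in this simple form, each field carries two sets of coordinates, both of which an operator working by hand would have to place to pixel accuracy.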
One approach for using a computer system for extracting data from a form is set forth in an article by R. G. Casey and D. R. Ferguson in the IBM Systems Journal, Vol. 29, No. 3, 1990, pages 435-450, titled "Intelligent Forms Processing." Disclosed is a forms recognition process whereby the forms are scanned, creating an electronic bitmap representation of the image. Using various image processing techniques, the bitmap form is analyzed and compared with other bitmap forms, stored in a database as templates, until a match is found. If no match is found, the user must enter the new type of form by interacting with the computer and specifying a new addition to the template database. The template creation process includes displaying the bitmap on a graphical display and using a pointing device to designate the outlines of fields. No automated method to assist the operator in creating the templates is disclosed.
Similar to the article by Casey and Ferguson, U.S. Pat. No. 4,933,979 to Suzuki et al. discloses a form sheet reading apparatus. In this system, a form is scanned, resulting in an electronic bitmap which is displayed for a user on a graphic display terminal. The user, using a graphical input interface, then selects on the displayed bitmap the items which make up the template. For instance, the user may outline a box which has information answering a question posed on the form. The designations of the user are stored in a first file and are later used as a template. However, the process of template creation disclosed in Suzuki et al. is highly manual and prone to error. Also, Suzuki et al. requires that there be at least one line on the form in order to detect a field, thus eliminating the possibility of recognizing forms which do not have lines designating the fields. Again, no automated method of assisting the operator in creating the template is disclosed.
U.S. Pat. No. 5,014,331 to Kurogane et al. discloses a method of determining an internal point within a closed area on a bitmapped image. The operator draws a free-form continuous circumference of a closed area using a pointing device. From this drawing, a rectangular region is determined, and a point within the determined rectangular region is then calculated. Even though Kurogane et al. sets forth a method to find a rectangular region, it does not address the problem of finding the exact boundary of a rectangular region already on the bitmap.
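The limitation noted above can be seen in a generic sketch of deriving a rectangle and a point inside it from a hand-drawn stroke. This is not the procedure of Kurogane et al., whose details are not reproduced here. The function names are hypothetical, and the bounding-box and midpoint calculations are assumed stand-ins chosen only for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]            # (x, y) in bitmap pixels
Box = Tuple[int, int, int, int]    # (left, top, right, bottom), inclusive


def bounding_rect(stroke: Iterable[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every point of the stroke."""
    points = list(stroke)
    if not points:
        raise ValueError("stroke must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def point_within(rect: Box) -> Point:
    """Midpoint of the rectangle, used here as one point inside it."""
    left, top, right, bottom = rect
    return (left + right) // 2, (top + bottom) // 2


# A rough outline traced by hand around a printed box.
stroke = [(12, 30), (40, 28), (43, 55), (10, 57)]
rect = bounding_rect(stroke)
inner = point_within(rect)
```

The rectangle obtained this way follows the operator's stroke rather than the lines printed on the form, so it is only as accurate as the hand that drew it.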
As can be appreciated, designating the exact position of an object or field on a bitmap is difficult, time consuming, and perhaps even impossible using the methods disclosed above. Therefore, an easier, faster and more precise method of creating a template from a bitmap is warranted. Such a method would contain automation which would assist the operator in creating new templates from a bitmap, reducing the problems of template creation discussed above.