The present invention relates generally to document processing, and more specifically to documents having printed thereon indicia of the position, size, type, and the like, of fields which may contain data to be extracted from the document, and systems for producing and utilizing same.
Machine readable forms have been in common use for some time. Such forms provide a mechanism for enabling actions to be taken based on marks on a paper without requiring human intervention such as reading or interpreting the forms. The marks on such forms are extracted under the control of a device commonly referred to as a form interpreter. The forms are "read," most often optically by a scanner or the like, and the form interpreter then locates and characterizes the marks on the forms, and may output control signals as a function of the presence, location, nature, etc., of the marks to peripheral devices.
A "form" of the type discussed above is defined for the purposes of the present invention as either a tangible printed document or the like or a data structure representing such a tangible printed document. The form may contain regions of arbitrary text, arbitrary graphics, and fields. "Fields," as used herein, is taken to mean regions of the form, either physical regions of the printed document or structured regions of the data structure representing the printed document, which are to be modified by a user. As used herein, a "user" may be either human or machine. Further, as used herein "modify" shall be taken to mean enter, add, delete, change, alter, connect, disconnect, highlight, fill-in, erase, strike-out or the like, when referring to a field. Examples of fields include "check box" fields (also called "bubbles"), alpha or alpha-numeric fields, image fields, etc. A form will often also include a reference point indication from which the location of the fields may be measured.
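By way of illustration only, the data-structure sense of a "form" defined above might be sketched as follows. The names Form, Field, FieldKind and absolute_box are hypothetical and do not come from any cited reference; the sketch assumes rectangular fields whose positions are measured as offsets from a single reference point.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class FieldKind(Enum):
    """Example field kinds named in the text (hypothetical identifiers)."""
    CHECK_BOX = "check_box"        # also called a "bubble"
    ALPHANUMERIC = "alphanumeric"
    IMAGE = "image"


@dataclass
class Field:
    name: str
    kind: FieldKind
    x: float       # horizontal offset measured from the reference point
    y: float       # vertical offset measured from the reference point
    width: float
    height: float


@dataclass
class Form:
    reference_point: Tuple[float, float]
    fields: List[Field] = field(default_factory=list)

    def absolute_box(self, f: Field) -> Tuple[float, float, float, float]:
        """Return (left, top, right, bottom) of a field in page coordinates."""
        rx, ry = self.reference_point
        return (rx + f.x, ry + f.y, rx + f.x + f.width, ry + f.y + f.height)
```

Storing offsets rather than absolute coordinates reflects the statement above that field locations may be measured from the reference point indication.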
Information carried by forms can conveniently be divided into three categories: data, machine instructions, and other information. As used above, data is taken to mean information carried by the form to be read or extracted from the form for processing. Examples of data include blank or filled-in bubbles on a standardized examination answer form, payee fields on checks which are parsed for processing, etc. Machine instruction as used above refers to information carried by a form which is interpreted by the form interpreter and which causes action either by the form interpreter or by a remote device. Examples of machine instructions include information located on a form which, when read, causes data to be copied to or from memory locations of a computer, causes a mathematical or logical procedure to be applied to particular data, etc. Other information, as used above, refers generally to information ignored by the form interpreter, such as the arbitrary text and graphics mentioned above, prompts or instructions on the form to aid the user in filling in fields, information for the user's interest, ornamental treatment, etc. The first two categories of information, data and machine instructions, are of interest herein.
Forms carrying data are the most common type of form, and examples may be readily found in the art. For example, U.S. Pat. No. 4,634,148, to Greene, issued Jan. 6, 1987, teaches a form which is a draft check carrying data in the form of payee, amount, and maker. The fields carrying the data are located and the data is extracted for processing according to a preprogrammed scheme. Forms carrying instructions interpreted and used by machines are also known, for example from Rourke et al., U.S. Pat. No. 4,757,348, issued Jul. 12, 1988. Rourke et al. disclose an electronic reprographics/printing system which uses printed control forms, called separators, to segregate groups of documents from one another and to input control or programming instructions for processing the documents associated with each control form. In fact, forms carrying data as well as machine instructions are known, for example as taught by Tanaka in U.S. Pat. No. 4,494,862, issued Jan. 22, 1985. Tanaka describes a system wherein a form is given a bar code which, when interpreted by the form interpreter section of the system, causes the system to read and print only those rows on the form marked with a special pen (see, e.g., col. 8, lines 32 et seq.).
Another reference of interest is the patent to Daniele, U.S. Pat. No. 4,728,984, issued Mar. 1, 1988. This reference relates to a system including an electronic printer for recording digital data on plain paper, together with the use of an input scanner for scanning digital data that has been recorded on such a recording medium to upload data into an appropriate device such as a computer or the printer itself. The applications of the system of this reference, however, are limited to decoding secret documents and inputting program information into a computer.
Forms of the data carrying type may in fact carry several different types of user applied data. For example, the above mentioned bubbles on a standardized examination answer form and the payee fields on checks which are parsed for processing represent two different types of data. In general, the data types are: digitally coded data, for example the filled-in or not filled-in state of a bubble; data for character recognition, such as bar codes, alpha-numerical data for optical character recognition (OCR) and the like; and data for image-wise handling, such as the payee field mentioned above, graphics and the like.
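The three data types just enumerated might be distinguished in software roughly as sketched below. This is an assumption-laden illustration: DataType, handle, the 0/1 pixel representation, the 0.5 fill threshold and the recognizer callback are all hypothetical and are not drawn from the references discussed.

```python
from enum import Enum


class DataType(Enum):
    DIGITAL = "digital"          # e.g. filled-in / not filled-in state of a bubble
    RECOGNITION = "recognition"  # bar codes, text intended for OCR
    IMAGE = "image"              # handled image-wise, e.g. a payee field


def handle(data_type, pixels, recognizer=None):
    """Process one field region according to its data type.

    pixels is assumed to be a 2D list of 0/1 values (1 = dark mark).
    recognizer is a caller-supplied function standing in for an OCR or
    bar-code decoder; none is provided by this sketch.
    """
    if data_type is DataType.DIGITAL:
        dark = sum(sum(row) for row in pixels)
        total = sum(len(row) for row in pixels)
        return dark / total >= 0.5       # threshold chosen arbitrarily
    if data_type is DataType.RECOGNITION:
        if recognizer is None:
            raise ValueError("a recognizer must be supplied for this data type")
        return recognizer(pixels)
    # IMAGE: return an unmodified copy of the region
    return [row[:] for row in pixels]
```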
It is important for a practical form-using system to be able to distinguish between the various data types. One method for doing so is disclosed by Greene in U.S. Pat. No. 4,588,211, issued May 13, 1986. This reference discloses a machine readable document having fields identified by a coating of fluorescent ink. Data is written into the fields by the user on top of the fluorescent ink such that when the fields are illuminated by a proper light source, the written data will be black in sharp contrast to the fields. The fields include a binary coding which is applied by selectively blanking out regions of the fluorescent ink at the border of the fields, or regions of fluorescent ink remote from but logically associated with the field, for example as shown in FIG. 5, and discussed at col. 7, line 14, through col. 8, line 6. Greene distinguishes between the various data types by using the coding to cause different fields to be copied to different locations for printing.
Although machine processing of forms results in high speed and accuracy of processing, the systems disclosed in the previously mentioned references have several important limitations. These limitations have, inter alia, made machine-read forms practical only when large numbers of identical forms are used, constrained the organization and aesthetics of forms, and complicated the form creation process.
First, the form interpreter must be preprogrammed with a description of the form in order to locate the form fields. In most cases, a description of the physical location of the fields or parts of the fields relative to a reference point must be preprogrammed. This preprogramming requires substantial time, effort, and training, and most often is performed by an operator different from the person making up the form itself. Generically, there is a presently unfilled need in the art for a form, electronic or paper, which may be interpreted by a general form parser that has no previous knowledge of the form.
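The need identified above, a parser with no previous knowledge of the form, can be pictured with the following minimal sketch. It assumes, purely for illustration, that a description of the fields travels with the form as a JSON string and that the scanned page is a 2D list; the function name interpret and the keys name, x, y, width and height are hypothetical.

```python
import json


def interpret(coded_description, page):
    """Extract field regions using only the description carried by the form.

    coded_description: JSON text assumed to have been decoded from the form.
    page: 2D list (rows of pixels) representing the scanned form.
    Nothing about the form layout is built into this function.
    """
    spec = json.loads(coded_description)
    results = {}
    for f in spec["fields"]:
        rows = page[f["y"]: f["y"] + f["height"]]
        results[f["name"]] = [row[f["x"]: f["x"] + f["width"]] for row in rows]
    return results
```

Because the layout arrives as data rather than as preprogrammed constants, the same function could in principle serve any form that carries such a description.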
A second limitation is that any encoded instructions relative to specific data must either be physically part of the data field or otherwise physically or logically associated with the data field. It is desirable for form organization and aesthetics to be able to locate instructions (as well as other relevant information about form content and structure) at any arbitrary position a form designer chooses.
A third limitation is that a form designer has had to create the fields and then separately add supplemental information, such as coded field type, if coded information about the fields is to be carried by the form. It is desirable to allow simultaneous creation of a form and creation of a coded description of the form. That is, a form creation system is needed that allows a form designer to create a form such that the system keeps track of the position, type, etc. of the fields of the form, and automatically includes the coded description in the form's data structure and/or automatically prints the coded information on a printed copy of the form together with the alphanumerics, graphics and other elements of the form.
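A form creation system of the kind called for above might, as one hypothetical sketch, record each field as the designer places it and produce the coded description as a by-product. The class FormBuilder, its methods, and the choice of JSON as the encoding are assumptions made for illustration only.

```python
import json


class FormBuilder:
    """Tracks fields as they are created and emits a coded description."""

    def __init__(self):
        self._fields = []

    def add_field(self, name, kind, x, y, width, height):
        # Position, size and type are recorded at the moment of creation,
        # so the designer never enters them a second time.
        self._fields.append({
            "name": name, "kind": kind,
            "x": x, "y": y, "width": width, "height": height,
        })
        return self  # allow chained calls

    def coded_description(self):
        # JSON stands in for whatever machine-readable encoding is
        # stored with, or printed on, the form.
        return json.dumps({"fields": self._fields}, sort_keys=True)
```

A designer-facing tool would call add_field whenever a field is drawn, then store or print the string returned by coded_description along with the rest of the form.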
A fourth limitation is that present form interpreters require manual preprogramming prior to form interpretation. What is desired instead is a form interpreter system which works with coded forms, reading from the form itself the instructions for extracting and handling the data and other information the form carries.
Related to this limitation is another: it has heretofore not been possible to program a form interpreter remotely. That is, presently, to program a form interpreter the programmer must have direct physical access to the workstation, personal computer, or the like, which controls the interpreter. A method and apparatus are desired for programming the interpreter remotely, for example via a paper form transmitted to the interpreter by facsimile, or via communication directly from the workstation or personal computer on which the form resides.
Yet another limitation is that the above described art is not capable of designating a field as more than one data type. That is, it has heretofore not been possible to designate the contents of a field for several different kinds of processing. The ability to so designate the contents of a field for multiple processing would avoid the need for rescanning, saving both processing time and computer memory space.
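The multiple designation described above can be illustrated with a combinable flag, so that one scanned region is handed to every designated process without rescanning. This is a sketch under stated assumptions: Processing, process_once and the handlers mapping are hypothetical names, and the handlers themselves are supplied by the caller.

```python
from enum import Flag, auto


class Processing(Flag):
    DIGITAL = auto()
    RECOGNITION = auto()
    IMAGE = auto()


def process_once(region, designation, handlers):
    """Apply every designated handler to a single scanned region.

    region: the field contents from one scan (any representation).
    designation: a combination such as Processing.DIGITAL | Processing.IMAGE.
    handlers: dict mapping each Processing member to a function of region.
    """
    return {
        p.name: handlers[p](region)
        for p in Processing
        if p in designation
    }
```

The region is read once and reused for each designated kind of processing, which is the saving in time and memory that the paragraph above attributes to multiple designation.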
The realization and overcoming of the above limitations, and others not mentioned herein, form aspects of the present invention, which is summarized and described by way of illustrative embodiments below.