This invention relates to an image-based document processing system and apparatus for converting paper documents into electronic data and electronic images, and for managing the transactions initiated by those documents using both the images and the data extracted from the images. The system manages document entry and flow within a business or other organization by allowing user interaction with the electronically captured document.
In the processing of transaction documents by a large business or governmental agency, there is generally a need to accomplish at least three basic objectives. First, there is a need to capture data so that it can be electronically stored, for example, by transmittal to a host computer system. This data may be pertinent to accounts payable, insurance policyholder records, mail order records, taxpayer records or other business information. Second, there is a need to index and record the images of the documents from which the stored data was extracted, for future retrieval and usage. Third, there is a need to manage the transactions that are initiated by the documents and require human judgment, and to supply the captured data and image for use in the processing of a transaction, such as adjudicating an insurance claim or underwriting a loan application in the usual course of business. Until the present invention, there has not been a satisfactory method or apparatus for automatically capturing, identifying, indexing, and recording data and images from an incoming stream of documents of intermixed sizes and formats for future interactive use.
Many companies employ manual sorting of documents, generally beginning with the receipt of the documents in a mailroom. The disadvantages inherent in such systems of document sorting are numerous. For example, sorting documents in the mailroom is labor intensive and costly. Manual sorting also results in far more errors and document misidentifications than electronic classification accomplished pursuant to the present invention.
Manual sorting of the contents of an envelope is presently accomplished in several ways. Documents may be sorted by size, so that all documents with the same physical dimensions and format, such as 1040 Tax Forms, are manually segregated and grouped. This grouping is necessary because prior automatic document processing devices cannot accommodate documents of varying format. In prior systems, this pre-selection is necessary to enable the software system to identify the data fields by their location on the document page. Pre-selection is also generally required in prior systems to accommodate paper feeding devices which will not tolerate varying sizes and weights of input documents.
With the introduction of optical readers, some flexibility was introduced by first labeling each document with a unique identification which identifies the format of the document and allows the system to accommodate different forms without the forms being separated into individual pre-sized groups. However, this approach requires that the document format be pre-serialized, and many forms and documents exist without a serialized identification. Thus, many documents are not readable by this pre-serialized type of system. Accordingly, the capability of processing many documents of differing sizes and formats did not exist before the present invention.
Thus, even with pre-serialized systems, there has been a need in the industry for a document processing system which accomplishes electronic identification, delivery, storage, and retrieval of documents of various sizes and types, without the need for a special ID code or other serially printed mark to ascertain the identity of the document under observation. Also needed is a system which may be adapted for existing tax forms and other documents without the necessity of changing these standardized tax forms or other documents to include pre-printed marks or numbers.
U.S. Pat. No. 4,205,780 (the "'780 patent") relates to a document processing system with a video camera and television monitor. The typical document transport as described in the '780 patent has the capability of reading magnetic ink character recognition (MICR) data or OCR data encoded on the documents being processed, recording the data, and sorting the documents in a predetermined manner, but requires that the documents be sorted by bank employees before they are loaded into the document scanner and that all the document formats conform. "Header" and "trailer" cards function to separate each batch of documents. Header cards contain MICR data that identify the account being processed.
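The batching convention described above, in which header and trailer cards delimit each pre-sorted batch, can be illustrated with a minimal sketch. This is not the '780 patent's implementation: the `Item` and `Batch` types, the `kind` labels, and the `split_batches` function are hypothetical names introduced only for illustration, and the MICR payload is represented as a plain string.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Item:
    """One item leaving the transport (hypothetical model)."""
    kind: str        # "header", "trailer", or "document"
    micr: str = ""   # stand-in for the MICR data read from the item


@dataclass
class Batch:
    account: str                              # taken from the header card
    documents: List[Item] = field(default_factory=list)


def split_batches(stream: Iterable[Item]) -> List[Batch]:
    """Group a pre-sorted stream into batches delimited by header/trailer cards."""
    batches: List[Batch] = []
    current: Optional[Batch] = None
    for item in stream:
        if item.kind == "header":
            if current is not None:
                raise ValueError("header card seen before the previous batch was closed")
            # The header card's MICR data identifies the account for the batch.
            current = Batch(account=item.micr)
        elif item.kind == "trailer":
            if current is None:
                raise ValueError("trailer card without a matching header card")
            batches.append(current)
            current = None
        else:
            if current is None:
                raise ValueError("document found outside any batch")
            current.documents.append(item)
    if current is not None:
        raise ValueError("last batch is missing its trailer card")
    return batches
```

The sketch makes the limitation visible: the grouping depends entirely on operators having inserted the cards and pre-sorted the documents, since nothing in the loop identifies a document by its content.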
The present invention differs from the '780 patent disclosure. For instance, the '780 patent describes the use of MICR and OCR machine readable characters in processing check transactions. Pursuant to the '780 patent, stylized characters and special fonts are used to allow machine recognition of forms and remittance documents which are pre-printed and manufactured; the '780 patent thus does not address the automatic identification of other forms which are not specially pre-printed. The present invention, by contrast, automatically identifies documents that are not pre-printed, and uses that ability to keep envelope contents separate, processing the contents of the envelopes as a transaction.
The '780 patent requires that human operators routinely enter data by hand keying the data on a keyboard. The present invention, however, allows for data capture without operator keying in many applications. This is possible because the present system locates and extracts data from existing forms after identifying the forms. Where data is machine readable, most of it requires no operator keying with the present invention.
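The identify-then-extract sequence described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of the present invention: a page is modeled as lines of already-recognized text, identification is reduced to finding a marker string, and the `TEMPLATES` table, its form names, and its field zones are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (row, start column, end column) of a field on the page; hypothetical layout.
Zone = Tuple[int, int, int]

# Hypothetical template table: a marker used to identify each form and the
# zones where its data fields are located.
TEMPLATES: Dict[str, dict] = {
    "claim_form": {
        "marker": "CLAIM FORM",
        "fields": {"policy_no": (1, 11, 19), "amount": (2, 8, 15)},
    },
    "loan_application": {
        "marker": "LOAN APPLICATION",
        "fields": {"applicant": (1, 11, 31)},
    },
}


def identify_form(page: List[str]) -> Optional[str]:
    """Return the name of the first template whose marker appears on the page."""
    for name, template in TEMPLATES.items():
        if any(template["marker"] in line for line in page):
            return name
    return None  # unidentified pages would need operator attention


def extract_fields(page: List[str], form: str) -> Dict[str, str]:
    """Read each field from the zone that the identified form's template defines."""
    values: Dict[str, str] = {}
    zones: Dict[str, Zone] = TEMPLATES[form]["fields"]
    for field_name, (row, start, end) in zones.items():
        line = page[row] if row < len(page) else ""
        values[field_name] = line[start:end].strip()
    return values
```

Because the field zones are looked up only after the form has been identified, pages of different formats can arrive intermixed; a page that matches no template returns `None` from `identify_form` and is the case in which operator keying would still be needed.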