The present invention relates generally to document image processing systems, and more particularly to document image processing systems which process documents whose format may vary.
Today's financial services industry is facing the challenge of processing immense numbers of documents efficiently. Predictions that document payment methods would decline have not been realized. In fact, document payment methods have grown worldwide and are expected to continue to increase. Thus, there is a vital need to devise improved means and methods for processing such documents. The use of imaging technology as an aid to document processing has been recognized as one way of significantly improving document processing, as disclosed for example in U.S. Pat. Nos. 4,264,808 and 4,813,077, and European Patent EP 0 344 742 A2.
Generally, imaging involves optically scanning documents to produce digitized images that are processed electronically and stored on high capacity storage media (such as magnetic disk drives and/or optical memory) for later retrieval and display. It is apparent that document imaging provides the opportunity to reduce document handling and movement, since these electronic images can be used in place of the actual documents.
One feature of imaging systems is the capability to automatically read data from the image. Where the data can be machine read, manual entry is unnecessary and overall document processing throughput may thereby be increased. In applications where the documents processed are of uniform size and shape and the data is consistently located in a predetermined position on the document, the automatic reading of the data is simplified; minimal or no searching is necessary in the automated read operation. In contrast, in applications where the documents processed vary in size, shape, and data location, the automatic reading of data is complicated by the fact that the desired data must be located before it can be read.
These variations in document format can substantially impact the cost effectiveness of automated document image processing systems. If the effect of the document format variations is that the data cannot be found, then the data cannot be automatically read; if the data cannot be automatically read, it must be manually entered by an operator; if an operator is required to manually enter the data, the desired cost savings soon evaporate.
The problem of document format variations manifests itself when the number of documents for which the data was successfully read is small relative to the total number of documents processed. The ratio of these two numbers is typically referred to as the success-rate. If the success-rate falls below a certain level, the automatic document processing system will cease to be cost effective (the level will vary from application to application).
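By way of illustration only, the success-rate described above can be sketched as a simple ratio compared against an application-specific level. The function names and the threshold value used in the usage lines are assumptions made for this sketch and are not taken from any particular system.

```python
def success_rate(successful_reads: int, total_documents: int) -> float:
    """Return the fraction of processed documents whose data was read automatically."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    if not 0 <= successful_reads <= total_documents:
        raise ValueError("successful_reads must lie between 0 and total_documents")
    return successful_reads / total_documents


def is_cost_effective(rate: float, minimum_rate: float) -> bool:
    """Compare a success-rate with an application-specific minimum (an assumed parameter)."""
    return rate >= minimum_rate


# Usage: 850 of 1000 documents read automatically, with a hypothetical minimum of 0.80.
rate = success_rate(850, 1000)
acceptable = is_cost_effective(rate, 0.80)
```

The minimum level is left as a parameter because, as noted above, it varies from application to application.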
Bank check processing systems are exemplary of applications where the document formats are prone to significant variations. The size, shape and data location may vary from check to check as well as from bank to bank. In addition to the size, shape and data location variations, the following two examples illustrate other document format variations encountered by banks.
The first example is the food stamp. A food stamp carries two amounts. One is the limit on the amount for which the stamp can be redeemed, and the second is the amount for which the stamp was actually redeemed. Often these two amounts are not the same. Depending upon where the amounts are located, the automatic reader may read the maximum amount instead of the actual amount. Documents of this nature must be identified so that either the data can be manually entered or the proper data location can be provided to the automatic amount reader.
The second example, in which an automatic reader may encounter difficulties in reading the data from an exemplary check, is a check which has a "$" that does not line up horizontally with the printed amount. This may occur when the checks are not adequately aligned in the printer which prints the checks. Because the "$" is used in locating the amount on a check, a misalignment between the "$" and the amount may cause the automatic reader to reject the document, thereby forcing a manual entry of the amount. If those documents whose amounts are misaligned can be identified, the search parameters for the automatic reader can be changed to search for a locator character other than a "$", such as an "*", in locating the amount.
One way in which the foregoing difficulties are addressed by check processing systems is by providing, on the MICR code line, the coordinates at which the desired data on the document is located, as illustrated by U.S. Pat. No. 4,685,141. This approach is useful, but the problem remains that before the coordinates can be provided on the MICR code line the document must be surveyed to determine the correct coordinates. If each check supplier could be convinced to magnetically encode the data location coordinates on the bottom of the check, this approach might be feasible. However, while some check suppliers may cooperate in such an effort, others may not. Furthermore, if a document format were changed, the document image processing system would have to rely on the check supplier to make the corresponding change to the MICR code line. Thus, for a document image processing system to rely on the cooperation of check printers for its success-rate would be risky. It would be more desirable for each particular document image processing system to have available current document format information based upon its recent processing activities, thus providing the capability to quickly adapt the system to a change in document format.
One way to identify current document format information for a document image processing system is to survey the documents before processing them. While surveying documents may be practical in some applications, in check processing applications this is not the case. In check processing systems, millions of documents need to be processed each day. The task of identifying document format information by sampling the documents processed would be overwhelming. Furthermore, the physical documents may not be available for examination for more than a short period of time. Therefore, surveying the checks after running them through the check processor would also be impractical. Alternatively, providing additional storage capacity for storing document images for later examination would not be cost effective where millions of documents are processed.
The problems posed to document image processing systems by document format variations continue even after most document format features have been identified. As discussed earlier, document formats occasionally change. If the document processor is programmed to expect one document format for a particular document, for example one check format for a particular account, and the document format changes, the success-rate for the particular document may decline significantly. Using the check example, if the check happens to be drawn on an account which typically issues a large number of checks, the overall system success-rate may also decline significantly because of the failed reads for the particular account. Thus, it is desirable to continually monitor the system success-rate and correlate the success-rate with the document formats.
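As a minimal sketch of what correlating the success-rate with document formats might look like, the following keeps read outcomes per format key (for example, per account) alongside the overall figure. The class name, the method names, the format keys, and the 0.80 threshold are all hypothetical and are chosen only for illustration; they do not describe any particular system.

```python
from collections import defaultdict
from typing import Dict, List


class SuccessRateMonitor:
    """Track automatic-read outcomes overall and per document format key."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = defaultdict(int)
        self._successes: Dict[str, int] = defaultdict(int)

    def record(self, format_key: str, read_ok: bool) -> None:
        """Record one processed document and whether its data was read automatically."""
        self._totals[format_key] += 1
        if read_ok:
            self._successes[format_key] += 1

    def overall_rate(self) -> float:
        """Success-rate across every document recorded so far."""
        total = sum(self._totals.values())
        if total == 0:
            raise ValueError("no documents recorded")
        return sum(self._successes.values()) / total

    def rate_by_format(self) -> Dict[str, float]:
        """Success-rate for each format key that has at least one document."""
        return {key: self._successes[key] / count for key, count in self._totals.items()}

    def formats_below(self, minimum_rate: float) -> List[str]:
        """Format keys whose success-rate is under the given (assumed) minimum."""
        return sorted(key for key, rate in self.rate_by_format().items() if rate < minimum_rate)


# Usage: a high-volume account whose format has changed drags the overall rate down.
monitor = SuccessRateMonitor()
for _ in range(90):
    monitor.record("account-A", True)
for _ in range(10):
    monitor.record("account-A", False)
for _ in range(20):
    monitor.record("account-B", True)
for _ in range(80):
    monitor.record("account-B", False)
suspect_formats = monitor.formats_below(0.80)
```

Keeping the per-format counts next to the overall counts makes it possible to trace a drop in the system success-rate back to the specific format, here the hypothetical "account-B", that is responsible for the failed reads.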