1. Technical Field
The present invention relates generally to document processing, and more particularly, to background data recording and use with document processing.
2. Related Art
Despite the evolution of electronic communications, the requirement of formalized documents as a communications medium remains in many industries. The content and layout of documents vary according to industry. For example, documents may include correspondence, checks, orders, invoices, receipts, filled-out forms (e.g., insurance applications and completed tests), securities, etc. Processing of documents, however, has progressed such that many documents have a digital life in addition to a physical printed existence. In industries where a large number of documents are necessary, document processing management becomes very important. Document processing management can normally be broken into three stages: front-end generation of the document, usage of the document, and back-end processing of the used document. The substance of each stage may vary according to industry.
One problem with conventional approaches to document processing management is that front-end processing data is not used with back-end processing data. This may be the case even when the front-end document generating data exists in the same organization as the back-end processing. More often, however, the problem exists because the front-end and back-end processes do not exist in the same organization. For example, in the banking and finance industry, checks can be issued by a large number of institutions and cashed by an equally large and independent number of institutions. For the clearing of checks, banking institutions often send CD-ROMs of the check images to their large commercial customers by overnight express. Some institutions manually compare the checks to their text data. In this case, unless the cashing bank happens to have issued the check, it is highly unlikely to have access to the front-end data used to print the document for detecting errors. There is no current service that generates checks and leverages the original data used for printing to ensure the accuracy of the checks cashed by comparing each cashed check to the original check data.
One particular back-end process is archiving documents. The banking and finance industry is also a good example of where document archiving has a significant role. In this industry, important data such as customer statements or check images are usually archived. For example, millions of checks are processed every day in the United States. After being cleared, each check must be archived for seven years.
One persistent problem with back-end document processing, such as archiving, is obtaining an accurate image of a used document. Images for processing are conventionally made by scanning the document. For example, the front and back of checks are conventionally scanned and compressed. In many cases, both sides of the check are stored as grayscale images, which allows presenting a “reasonably accurate” image of the original. The front side of the check is also frequently stored as a “bilevel” image, i.e., a black and white image, which shows any handwritten text and any background data that is dark enough to register as black. “Background data” may include background images and layout matter such as text and/or layout objects that are provided on a document to provide the document's look-and-feel. In terms of a check, layout matter may include, for example, bank name, terms such as “Date” and “Pay to the Order of,” routing number, account number, entry lines and boxes, etc. Archiving is especially problematic relative to inaccurate imaging because many used documents are destroyed after imaging.
Obtaining an accurate used document image has become increasingly difficult for many reasons. One reason is poor image quality. Another reason is the reduction in brightness disparities in documents created by backgrounds and lighter shade inks. For example, background images used on checks are becoming increasingly active, i.e., they contain more matter that registers as black in a bilevel image or a significantly dark object in a grayscale image. Accordingly, a background image may improperly register as significant information. Given the extremely wide variety of the possible document background images and ways that many documents can be preprinted and then filled out by hand, there is no reliable way of separating significant information from the relatively unimportant information. As a result, documents are processed as if everything on the document is important. This results in larger compressed image sizes, and increased storage and communication costs. An additional problem with background data is that it may also be hard to compress efficiently.
Relative to lighter shade inks, such as those available in gel pens, imaging may result in loss of significant information. For example, significant information written in a light shade ink may register as white in a bilevel image or as a significantly light object in a grayscale image. With regard to a check, the payee and amount may be handwritten with a light shade ink that makes the text indiscernible relative to the background data when scanned. This often results in significant information being lost. Again, this is especially problematic relative to archiving because many used printed documents are destroyed after imaging.
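By way of illustration only, the two imaging failures described above can be sketched with a fixed-threshold conversion from grayscale to bilevel. The gray levels, the threshold of 128 and the function name `to_bilevel` are assumptions chosen for this example and are not taken from any particular scanner or imaging system.

```python
# Hypothetical 8-bit gray levels (0 = black, 255 = white); illustrative values only.
PAPER = 255           # blank paper
BACKGROUND_ART = 90   # dark element of an "active" background image
GEL_INK = 170         # handwriting in a light shade ink
DARK_INK = 20         # handwriting in a conventional dark ink


def to_bilevel(gray_rows, threshold=128):
    """Map each gray pixel to 1 (black) if darker than the threshold, else 0 (white)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


# Row 0 holds only paper and background art; row 1 holds handwriting on plain paper.
scan = [
    [PAPER, BACKGROUND_ART, PAPER, BACKGROUND_ART],
    [GEL_INK, GEL_INK, PAPER, DARK_INK],
]

bilevel = to_bilevel(scan)
for row in bilevel:
    print(row)
```

In this sketch the background art in the first row registers as black, while the light ink handwriting in the second row registers as white, so only the dark ink stroke survives as significant information.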
The inaccuracies described above could be remedied by knowing background data when conducting back-end processing. Unfortunately, no mechanisms exist for determining, recording and tracking background data, or for sharing this information with the back-end processing stage. Currently, activities such as ‘form dropout’ and ‘form removal’ are possible. In form dropout, the form (layout matter) is printed in a different ink than the rest of the document, and does not show up in the scanned document. In form removal, the form is scanned and stored as part of background processing and used to remove the form from the scanned document. In neither of these cases, however, is background data recorded at document generation. As a result, a document's background data is not known at the back-end processing stage, and documents are generally processed by ignoring the above-described imaging inaccuracies.
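By way of illustration only, the following sketch shows how a pixel-wise comparison against known background data could separate matter added during usage from the background itself. It assumes that the scanned image is already aligned with the recorded background; the function name `extract_foreground`, the tolerance of 30 and the gray levels are hypothetical and do not describe any existing form removal product.

```python
def extract_foreground(scan_rows, background_rows, tolerance=30):
    """Return a mask with 1 where the scan differs from the known background.

    Both inputs are equally sized grids of 8-bit gray levels (0 = black,
    255 = white). A pixel is treated as added matter when its gray level
    differs from the recorded background by more than `tolerance`.
    """
    return [
        [1 if abs(s - b) > tolerance else 0 for s, b in zip(scan_row, bg_row)]
        for scan_row, bg_row in zip(scan_rows, background_rows)
    ]


# Hypothetical recorded background: dark art in row 0, blank paper in row 1.
recorded_background = [
    [255, 90, 255, 90],
    [255, 255, 255, 255],
]

# Hypothetical scan: slight scanner noise in row 0, light and dark ink in row 1.
scan = [
    [255, 92, 255, 88],
    [170, 170, 255, 20],
]

mask = extract_foreground(scan, recorded_background)
for row in mask:
    print(row)
```

Under these assumptions the dark background art is suppressed because it matches the recorded background, while both the light ink and the dark ink handwriting are retained because they differ from it.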
In view of the foregoing, there is a need in the art for background data recording during front-end document generation for use with back-end document processing.