The art and science of scanning has never been more important. The vastly improved bandwidth infrastructure of the Internet and the exponentially improving cost effectiveness of digital storage have compelled industry and governments to scan analog objects into digital form, both to reduce maintenance costs and to dramatically broaden the availability of the resulting documents over the Internet.
This surge in the digitization of documents is exemplified by Google Inc.'s Dec. 14, 2004 announcement that it was working with the libraries of Harvard, Stanford, the University of Michigan, and the University of Oxford, as well as The New York Public Library, to digitally scan books from their collections so that users worldwide can search them in Google.
Another example is the Check Clearing for the 21st Century Act, which was signed into law on Oct. 28, 2003 and became effective on Oct. 28, 2004. This act, referred to as "Check 21", is designed to foster innovation in the bank payments system and to enhance its efficiency by reducing some of the legal impediments to eliminating paper checks. The law facilitates paper check truncation (allowing banks to destroy paper checks once they have been scanned) by creating a new negotiable instrument called a substitute check. This permits banks to reduce the cost of retaining original checks, to process check information electronically, and to deliver substitute checks to banks that want to continue receiving paper checks. A substitute check is the legal equivalent of the original check and includes all the information contained on the original check.
One increasingly important area in the field of scanning is providing security for documents that have been scanned. For example, the Canadian government recently began a program to provide public Internet access to its heritage and historical repositories. Fears of possible tampering with government data compelled the Canadian Parliament to require that the participating organizations make "reasonable" attempts to ensure the integrity of the documents.
More broadly, increased awareness of security and privacy issues is resulting in national and international legislation on privacy and digital signatures. Examples of such legislation in the United States alone include the Electronic Signatures in Global and National Commerce Act (E-Sign), the Uniform Electronic Transactions Act (UETA), the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley (GLB) Act, and the Government Paperwork Elimination Act (GPEA).
Once a document is scanned, its contents can be changed with an image editor, a text editor (if the scan has been converted to text), or by other means. To make such changes detectable, a digital signature of the document can be calculated and either attached to the file header or stored in a database.
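As a minimal sketch of the database option, the Python below files a SHA-256 digest of each scan in an in-memory SQLite table and later recomputes it to check whether the file has changed. It is a simplification: it stores a plain digest rather than a signature, so it detects modification but does not establish who produced the file. The table name `digests`, the functions `register` and `is_unmodified`, and the identifier `doc-0001` are all hypothetical.

```python
import hashlib
import sqlite3


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the scanned file's bytes as hex text."""
    return hashlib.sha256(data).hexdigest()


def register(conn: sqlite3.Connection, doc_id: str, data: bytes) -> None:
    """File the digest of a freshly scanned document in the (hypothetical) table."""
    conn.execute(
        "INSERT INTO digests (doc_id, sha256) VALUES (?, ?)",
        (doc_id, digest(data)),
    )
    conn.commit()


def is_unmodified(conn: sqlite3.Connection, doc_id: str, data: bytes) -> bool:
    """Recompute the digest and compare it with the stored value."""
    row = conn.execute(
        "SELECT sha256 FROM digests WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row is not None and row[0] == digest(data)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digests (doc_id TEXT PRIMARY KEY, sha256 TEXT NOT NULL)")

scan = b"placeholder bytes standing in for a scanned image"
register(conn, "doc-0001", scan)
print(is_unmodified(conn, "doc-0001", scan))            # True
print(is_unmodified(conn, "doc-0001", scan + b"\x00"))  # False: one byte appended
```

Storing the digest outside the file means an attacker who edits the scan must also gain write access to the database to hide the change; attaching a signed digest to the file header avoids that dependency.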
Digital signatures build on the concept of a hash. A hash is a distilled representation of a relatively large data record as a shorter reference value. The goal is to derive this value from the data such that there is only a negligible likelihood of two distinct records producing the same value. Methods that perform this distillation are called hash algorithms, and they are widely used in computer systems. A digital signature is, in effect, an encrypted version of the hash, typically produced with the signer's private key using a public-key algorithm within a public-key infrastructure (PKI).
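The hash-then-sign idea can be illustrated with textbook RSA, assuming SHA-256 as the hash algorithm. This is a sketch, not a usable scheme: the key is a deliberately insecure toy built from two published Mersenne primes, so anyone can factor the modulus, and no padding is applied. The names `hash_int`, `sign` and `verify` are hypothetical; a production system would rely on a vetted cryptographic library and a padding scheme such as RSASSA-PSS.

```python
import hashlib

# Toy RSA key from two published Mersenne primes (2**521 - 1 and 2**607 - 1).
# Anyone can factor N, so this key is for illustration only.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus, about 1128 bits
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def hash_int(document: bytes) -> int:
    """Distil the document into a 256-bit SHA-256 value, read as an integer."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big")


def sign(document: bytes) -> int:
    """Apply the private exponent to the hash (unpadded textbook RSA)."""
    return pow(hash_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Apply the public exponent to the signature and compare with a fresh hash."""
    return pow(signature, E, N) == hash_int(document)


page = b"placeholder bytes standing in for a scanned page"
sig = sign(page)
print(verify(page, sig))         # True
print(verify(page + b"!", sig))  # False: the document was altered
```

Only the holder of `D` can produce a signature that verifies, while anyone with `N` and `E` can check it, and because verification recomputes the hash from the document itself, changing even one byte of the scan makes the check fail.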
In Merkle et al., U.S. Pat. No. 5,157,726, issued Oct. 20, 1992, titled "Document copy authentication", a process for making an authenticatable copy of an original document supplied by an entity is disclosed. In this patent, a hard copy of the original document is made that incorporates a digital signature representing both the document contents and the identity of that entity. This technique allows the source of the document to be encoded in the hard copy.
In U.S. Pat. No. 5,912,972, Barton, issued Jun. 15, 1999, titled "Method and apparatus for embedding authentication information within digital data", arbitrary digital information is embedded within a stream of digital data in a way that avoids detection by a casual observer and that allows a user to determine whether the digital data have been modified from their intended form. The embedded information can be extracted only as authorized and may be used to verify that the original digital data stream has not been modified. This technique allows authentication data to be carried within the digital data of the document itself.
While these approaches are extremely useful and reliable, they do not address the quality of the scan itself. If the scanner was out of calibration or otherwise not working correctly, the digitized document could be meaningless, erroneous, or laden with artifacts. Authentication techniques such as those described above would then be of little value, because the data they protect would itself be useless.
Knowing that the scanned data is a satisfactory replica of the original is therefore extremely important. Companies that scan important documents for governments, financial institutions, and other organizations may become liable for the loss of potentially priceless information if their scanners are not working correctly.
In the current art, a trusted person can review each freshly scanned document for integrity and optimal image quality before the document is submitted to a secure hashing algorithm. However, because the number of documents to be scanned is typically very large, a human-based quality control process is not economically viable. Compounding this, human error rates may be significant and may exceed what customers are willing to tolerate.