1. Field of the Invention
The present invention relates generally to document imaging and processing and more particularly to systems and methods for digitizing documents and storing and accessing the same.
2. Background of the Invention
Even with the widespread use of computers in business and in daily life, paper-based documents remain an exceedingly popular means of recording, communicating and storing information. Although software applications offer new and improved functions such as character recognition, managed document archival and retrieval, and specialized image processing, many businesses cannot leverage these capabilities because they maintain a significant amount of information in paper form rather than electronically.
Storing large amounts of information in paper form rather than electronically has various other drawbacks. For example, pages can easily be lost or misplaced, large physical spaces may be required to store the documents, and the information cannot be located with the search applications that are available for electronically stored information.
In some contexts, information that was originally created and stored on paper must nonetheless be converted to electronic form through digitization. For example, litigation often requires storing, accessing, producing and analyzing a large number of documents associated with the particular dispute. In most cases, the overall business process of converting physical documents in various formats into digital form is error prone, costly and time-consuming.
One problem in the digitization of images, particularly for jobs involving large numbers of documents, is how images of poor quality are handled. For example, in a job that requires imaging one million documents or more, it is extremely likely that at least some of the original documents to be scanned will be of poor quality. It would be exceedingly arduous, if not impossible, to require a worker to go through every document in the job to identify originals of poor quality and handle them on a case-by-case basis.
Aside from the large amount of time this would take, it is difficult to define the criteria by which an original image should be classified as “poor quality” and therefore require particular treatment. Moreover, even once an image is classified as being of poor quality, the desired treatment of that original may vary depending on a number of factors, including the content of the document, the extent of its image deficiencies and the physical location of the image within the overall job (e.g., which box it is in), among others. In summary, manually classifying and handling documents of poor image quality is difficult, if not impossible, for all but very small digitization jobs.
The inability to classify and handle original documents of poor image quality can lead to a number of undesirable results in the digitization process. For example, if original documents of poor image quality are not identified as such during scanning and archiving, problems can arise later. When a digitized document is later retrieved and found to be unreadable or otherwise of poor image quality, it will be unknown whether the poor quality stems from an error or defect in the scanning and/or archiving process or from the original image itself being of poor quality.
To resolve such a situation, it is ordinarily necessary to retrieve the physical original (assuming it still exists and is available) to determine whether it is of poor quality. If it is, all of the effort spent retrieving the document will have been wasted, since the digitized copy will remain the best available image.
In addition to the problem of poor quality and/or unreadable digitized images, various aspects of the overall digitization task further complicate the process. For example, “paper-based” documents actually encompass many forms of physically stored information, including paper, microfilm and microfiche, among other formats. Each of these formats generally requires its own separate scanning device. As a result, boxes of documents must be separated and fed into different scanning devices, giving rise to the possibility that documents could be misplaced and/or that the original document ordering could be lost.
Difficulties in maintaining document integrity and the original ordering also arise during other steps of the digitization business process. Boxes of documents and/or individual documents may be lost or put out of order during pickup and/or transportation from the place where they are stored to the place where they are to be scanned. With typical digitization business processes, documents can also be lost or put out of order while they reside at the scanning location and/or during the scanning process itself.
The lack of document integrity and the presence of poor quality images are of even greater concern where the source documents are destroyed following imaging. Often, imaging is performed primarily to reduce physical storage space requirements. In such cases, documents are typically destroyed, or at least stored off-site in a relatively inaccessible form, following digitization. Electronic document integrity is then even more critical, because the source documents no longer exist or are difficult to retrieve.