1. Technical Field
This disclosure generally relates to processing of documents, and more specifically relates to quality assurance of scanned documents.
2. Background Art
Computer systems have vastly improved the efficiency of many modern workers by providing ways to quickly and efficiently generate and handle electronic documents. Many software tools and formats have been developed that generate and/or process electronic documents in various ways, including word processors, spreadsheets, databases, scanning software, web page development systems, content management systems, hypertext markup language (HTML), extensible markup language (XML), etc. It has long been the goal of many people in the information processing field to realize a “paperless office”, which means an office where physical paper documents are completely replaced with electronic documents. One impediment to realizing the goal of a paperless office is the great number of different types of documents that a typical business receives from outside sources and must process.
When a paper document is received by a business that is striving to realize the goal of a paperless office, the paper document is typically scanned into electronic form. However, for the scanned document to be digitally filed in a structured filing system, indexing information must be added to it. Examples of indexing information include: document type, customer number, contract number, dollar amount, and other suitable metadata that describes the document. The process of manually entering indexing information for each scanned document has been a significant bottleneck in the realization of the goal of a paperless office. For each paper document, a human operator must scan the document, then manually enter indexing information to allow the document processing systems to recognize, store and retrieve the new document. For a company that receives hundreds or thousands of paper documents each day, this requires a dedication of significant resources to scan the documents and enter the corresponding indexing information. Many companies therefore prefer to keep processing the paper documents rather than dedicate the resources needed to adapt their business systems to convert the papers to electronic documents and then process the electronic documents.
Various systems have been developed to allow a user to more efficiently enter indexing information for a document. For example, U.S. Pat. Nos. 6,192,165 and 6,427,032 owned by ImageTag, Inc. disclose systems in which a user creates index information in a record in a database for a paper document before the document is scanned, places a label with a unique identifier on the paper document, then scans the paper document. The system detects the label with the unique identifier in the scanned image, locates the index record in the database that corresponds to the unique identifier, then stores the scanned document with the index record in the database.
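The label-based workflow described above can be illustrated with a minimal sketch. It is not the implementation disclosed in the cited patents. The names `IndexRecord`, `LabelIndex`, `create_record`, and `file_scan` are hypothetical, an in-memory dictionary stands in for the database, and detection of the label in the scanned image is assumed to have already produced the identifier string.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexRecord:
    """Hypothetical index record holding metadata entered before scanning."""
    document_type: str
    customer_number: str
    contract_number: str
    dollar_amount: float
    scanned_image: Optional[bytes] = None  # filled in once the scan is filed


class LabelIndex:
    """In-memory stand-in for the database of index records (an assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, IndexRecord] = {}

    def create_record(self, label_id: str, record: IndexRecord) -> None:
        # Step 1: the user creates the index record and ties it to the
        # unique identifier printed on the label placed on the paper document.
        if label_id in self._records:
            raise ValueError(f"label {label_id!r} is already in use")
        self._records[label_id] = record

    def file_scan(self, detected_label_id: str, image: bytes) -> IndexRecord:
        # Step 2: after scanning, the identifier detected in the image is
        # used to locate the matching record, and the image is stored with it.
        record = self._records.get(detected_label_id)
        if record is None:
            raise KeyError(f"no index record for label {detected_label_id!r}")
        record.scanned_image = image
        return record


index = LabelIndex()
index.create_record("LBL-0001", IndexRecord("invoice", "C-100", "K-200", 1250.0))
filed = index.file_scan("LBL-0001", b"scanned image bytes")
```

Because the record is keyed by the label identifier, the operator enters indexing information once, before scanning, and the scanned image is matched to it without further manual entry.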
Sometimes the page images of a scanned document have errors or are incomplete. For example, two pages could have been fed through a scanner document feeder at the same time, resulting in a missing page. Part of a page image may be cut off, making the page image incomplete. If the physical document is placed into the scanner document feeder with one of the pages upside down, the image of that page will have an incorrect orientation. A scanned document can therefore contain these errors as well as other errors.
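The three error types named above (a missing page, a cut-off page image, and an incorrect orientation) can be expressed as simple checks. The following is a sketch only. `PageImage` and `find_scan_problems` are hypothetical names, and the sketch assumes that the expected page count, the expected page dimensions, and a per-page upside-down flag are already available from some earlier step that this text does not describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageImage:
    """Hypothetical summary of one scanned page image."""
    width_px: int
    height_px: int
    upside_down: bool  # assumed to come from a separate orientation detector


def find_scan_problems(
    pages: List[PageImage],
    expected_page_count: int,
    expected_width_px: int,
    expected_height_px: int,
) -> List[str]:
    """Return a human-readable description of each problem found."""
    problems: List[str] = []

    # A double feed yields fewer page images than the physical document has.
    if len(pages) < expected_page_count:
        problems.append(
            f"missing pages: expected {expected_page_count}, got {len(pages)}"
        )

    for number, page in enumerate(pages, start=1):
        # An image smaller than the expected page size suggests a cut-off page.
        if page.width_px < expected_width_px or page.height_px < expected_height_px:
            problems.append(f"page {number}: image appears cut off")
        if page.upside_down:
            problems.append(f"page {number}: incorrect orientation")

    return problems


pages = [
    PageImage(850, 1100, False),
    PageImage(850, 900, False),   # shorter than expected
    PageImage(850, 1100, True),   # fed upside down
]
problems = find_scan_problems(pages, 4, 850, 1100)
```

For the sample input, the function reports one missing-page problem, one cut-off page, and one orientation problem.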