Data capture systems which utilize character recognition to facilitate capturing of data from text-based documents have existed for many years. In more sophisticated systems, characters recognized from a document image may be populated into fields of a database or fields of a data file.
One challenge with such data capture systems is that character recognition technology remains prone to error. The errors are further exacerbated if the document image on which the character recognition is based is an image generated by scanning a paper document which includes any of low contrast, ink bleed, or other characteristics which affect the shape, alignment, and/or spacing of characters as present in the document image.
Contemporary data capture systems which are used for generating data fields with specific data elements may employ exception handling to correct errors within character recognition processes. In more detail, the data capture system initially identifies each data field value using characters provided by the character recognition system. Validation rules are then applied to the data field values to identify characters that may have been mis-recognized by the character recognition system. For example, if the data field value is expected to be a numerical value, a validation rule may consist of verifying that the data field value is numerical. A roman alphabet character within the data field value would be a character that was likely mis-recognized.
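The numeric validation rule described above can be illustrated with a minimal sketch. The function name `find_suspect_positions` and the sample field value are hypothetical and chosen only for illustration; the sketch assumes the field is expected to contain only the digits 0 through 9.

```python
def find_suspect_positions(field_value: str) -> list[int]:
    """Return the positions of characters that violate a numeric-only rule.

    Hypothetical helper: any character outside 0-9 is treated as
    likely mis-recognized by the character recognition system.
    """
    return [
        index
        for index, character in enumerate(field_value)
        if character not in "0123456789"
    ]


# The letter "O" stands in for a zero that was mis-recognized.
suspects = find_suspect_positions("12O45")
print(suspects)
```

A field value that passes the rule yields an empty list, so only fields with at least one flagged position would need to be routed to exception handling.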
Exception handling systems provide for display of the document image on a monitor for human recognition and keyboard input of those characters that are likely mis-recognized. To facilitate the human operations, more sophisticated exception handling systems may sequentially highlight the mis-recognized characters within the document image for human recognition and keyboard input.
After the exception characters are returned, the data capture system may substitute the human recognized characters for the characters mis-recognized by the character recognition system.
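The substitution step may be sketched as follows. This is an illustrative sketch only: the function name `substitute_characters` and the representation of the operator's input as a mapping from character position to replacement character are assumptions, not details given for any particular system.

```python
def substitute_characters(field_value: str, corrections: dict[int, str]) -> str:
    """Replace flagged characters with the human recognized characters.

    Hypothetical helper: `corrections` maps a character position in the
    field value to the character keyed in by the human operator.
    """
    characters = list(field_value)
    for position, replacement in corrections.items():
        characters[position] = replacement
    return "".join(characters)


# The operator keys in "0" for the character at position 2.
corrected = substitute_characters("12O45", {2: "0"})
print(corrected)
```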
To further improve accuracy, the exception handling systems may provide for display of the same document image on a monitor of a second and independent exception handling processor for a second and independent input of those characters that are likely mis-recognized.
A problem with existing document capture systems that use existing exception handling techniques is that, if the document image includes confidential information, extensive security measures must be implemented to adequately protect the confidential information when it is transmitted to the exception handling processor and displayed for exception handling.
Examples of confidential information within a document image include social security or tax numbers, credit card account numbers, bank account numbers, financial information, protected health information, or other identifiable confidential information. If such information is included, the hardware, software, and network systems used for providing the document image to the human operators must be secured.
The facilities at which human operators have access to the monitors displaying the document images must also be secured. The human operators must be bound by, and trained to comply with, appropriate policies and procedures related to such confidential information. In addition, extensive resources are typically utilized to monitor and audit the security of the systems, policies, and procedures to achieve confidence that the confidential information remains protected.
What is needed is a secure document data capture system which provides the accuracy of a system that includes exception handling, while securing the confidential information without requiring the extensive resources needed to implement the systems, policies, and procedures that secure confidential information when traditional exception handling systems are employed.