1. Field
The present invention relates generally to software or systems utilizing optical character recognition and improvements thereof. More particularly, the present invention relates to software or systems for determining or correcting errors in electronic files generated using optical character recognition and improvements thereof.
2. Description of the Related Art
As society becomes increasingly computerized in nature and as physical storage space for many businesses is increasingly filled to capacity, there has been an amplified effort in many industries to generate and store electronic copies of previously created hardcopy documents. Such electronic copies permit far cheaper and easier backup and management than their physical counterparts, and electronic files exhibit greatly reduced risk of damage or loss over time. What might have once taken up entire storage facilities or warehouses for document retention purposes may now be easily stored on a few compact hard disk drives at a fraction of the expense and physical storage space required. In addition, electronic documents allow for much easier transmittal or reproduction of the documents, allowing for improved remote access to the files over private or public networks. Moreover, categorizing, editing, computing upon, manipulating, and retrieving such documents can be performed comparatively more quickly and easily via electronic copies.
Optical character recognition (“OCR”) has become a popular process for the conversion of scanned paper documents having handwritten, typewritten or printed text into electronic files since it not only provides for a readable electronic copy (i.e., an image) of the paper copy, but also attempts to translate the text of the paper copy into a machine-readable format. Thus, instead of an electronic copy acting only as an image interpretable by a human eye, machine-encoded text can be searched or otherwise manipulated or computed upon electronically. A human being may no longer be necessary to read or otherwise interpret an electronic document for determining its contents or for searching particularly desired features; rather, a computer can be used to perform the same tasks at a much quicker and more efficient rate. These features have made OCR a widely used form of data entry in recent times.
Unfortunately, OCR can be unreliable when attempting to decipher handwriting, fonts, degraded documents, or printing that is not easily identifiable. This is particularly problematic when documents contain numerals or other information that OCR processes cannot readily determine based upon the context of other, surrounding wording. For example, OCR processes performed upon tax forms or other financial documentation or statements run a substantial risk of misinterpretation due to their almost entirely numerical nature. Even a single error in the determination of a number can result in vastly different financial information. Thus, although OCR is implemented to help save time in searching or retaining documents, significant human manpower is conventionally employed in order to crosscheck and verify the accuracy of documents that undergo the OCR process.
A variety of solutions have been proposed for aiding in accurate OCR capture. One such process utilizes multiple passes of a document through different OCR technologies or employs human operators to determine if there exists any variance between the multiple interpretations. Another process involves creating relationships between data entries during the OCR process across a large number of electronic documents and establishing confidence levels in subsequent OCR accuracy based upon these generated relationships. These techniques require substantial time and/or a large number of documents that have previously undergone OCR in order to operate effectively.
Ideally, a system or method could be used to electronically verify the accuracy of the OCR process for electronic documentation. The system or method would ideally operate automatically or with a minimum of human intervention in order to minimize employee expenses and human errors. The system or method would ideally be able to verify such documents to a high degree of certainty and be capable of operating upon current documentation without requiring comparison to previous corresponding documentation for confidence in OCR accuracy. In addition, the system or method would ideally be able to identify OCR errors in documents that are particularly error-prone in the OCR process, such as financial statements or tax forms.