Optical character recognition (OCR) systems are typically used to capture text from a document (e.g., a machine-printed document, a handwritten document, etc.) by optically scanning the document and creating a two-dimensional digital representation of it (e.g., a pixel representation, a bitmap, etc.). Most OCR systems are configured to convert this two-dimensional digital representation into a series of characters that may be manipulated by a computer. For example, OCR systems typically convert the text portions into character codes, such as codes formatted according to the American Standard Code for Information Interchange (ASCII) or the Unicode standard, by performing any of a variety of character recognition processes on the two-dimensional digital representation. Many OCR systems provide the character-encoded representation in a form compatible with common software applications, such as word processors. To produce the character-encoded representation, OCR systems perform a variety of heuristic and template-driven tasks on the two-dimensional digital representation; the result may then be imported into another software application to be displayed, printed, and/or modified.
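To make the conversion from a two-dimensional digital representation to character codes more concrete, the following is a minimal sketch of a template-driven recognizer, assuming tiny 3x3 binary bitmaps that have already been segmented into individual glyphs. The template set and the names `TEMPLATES`, `hamming`, `recognize_glyph`, and `recognize` are hypothetical illustrations, not part of any particular OCR system. Each glyph is assigned the character whose template differs from it in the fewest pixels (Hamming distance), and the recognized string is then encoded as ASCII bytes and as Unicode code points.

```python
# Hypothetical 3x3 binary templates (1 = ink pixel). Illustration only;
# practical OCR engines use far richer representations.
TEMPLATES = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}

Bitmap = tuple[tuple[int, ...], ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_glyph(glyph: Bitmap) -> str:
    """Return the character whose template is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def recognize(glyphs: list[Bitmap]) -> str:
    """Recognize a sequence of pre-segmented glyphs as a string."""
    return "".join(recognize_glyph(g) for g in glyphs)


# Simulated scan of the word "LIT"; the I and T each have one noisy pixel.
noisy_I = ((0, 1, 0),
           (0, 1, 0),
           (1, 1, 0))
noisy_T = ((1, 1, 1),
           (0, 1, 0),
           (0, 1, 1))

text = recognize([TEMPLATES["L"], noisy_I, noisy_T])
print(text)                       # LIT
print(text.encode("ascii"))       # b'LIT' (ASCII-encoded bytes)
print([ord(c) for c in text])     # [76, 73, 84] (Unicode code points)
```

Real OCR engines typically add segmentation, feature extraction, and contextual correction on top of anything this simple; the sketch only shows the nearest-template idea and the final character encoding step.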
The accuracy of the output of current OCR systems, however, may be quite limited. For example, because individual symbols, characters, and/or other annotations, as well as combinations of characters (glyphs), can closely resemble one another, current OCR systems may be unable to eliminate errors that occur while recognizing characters in the two-dimensional digital representation of the document. OCR systems tend to produce glyph errors (errors involving one or more characters), such as substitutions, insertions, and deletions; for instance, the two-character sequence "rn" may be recognized as the single character "m". In addition, the output of current OCR systems is highly dependent on the quality of the original document and of the scanned two-dimensional digital representation.
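Substitution, insertion, and deletion errors of this kind can be counted by aligning OCR output against a known ground-truth transcription. The following is a minimal sketch using the standard Levenshtein (edit-distance) dynamic program with a backtrace; it is a generic evaluation technique rather than a method described in this document, and the names `ErrorCounts` and `classify_errors` are hypothetical. When several optimal alignments exist, the backtrace picks one of them, so in ambiguous cases the individual counts can depend on tie-breaking.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int = 0
    insertions: int = 0   # extra characters present in the OCR output
    deletions: int = 0    # ground-truth characters missing from the OCR output


def classify_errors(truth: str, ocr: str) -> ErrorCounts:
    """Align OCR output with ground truth and count each type of error."""
    n, m = len(truth), len(ocr)
    # dist[i][j] = edit distance between truth[:i] and ocr[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == ocr[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right cell along one optimal alignment.
    counts = ErrorCounts()
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and truth[i - 1] != ocr[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            if mismatch:
                counts.substitutions += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


print(classify_errors("corn", "com"))
# ErrorCounts(substitutions=1, insertions=0, deletions=1)
print(classify_errors("modern", "rnodern"))
# ErrorCounts(substitutions=1, insertions=1, deletions=0)
```

In the first example the glyph pair "rn" collapses into "m", which the alignment reports as one substitution plus one deletion; in the second, "m" is split into "rn", reported as one substitution plus one insertion.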
Furthermore, to reduce character recognition errors, current OCR systems may have to implement more complex character recognition techniques. Such techniques, however, are more expensive and require more processing time. For instance, an OCR system may implement a highly complex character recognition technique designed to produce very accurate output, but the increased processing time may make such a system problematic when large amounts of text are being converted.