Optical character recognition (OCR) is a technology for recognizing characters and letters. OCR allows for the electronic conversion of images of handwritten, printed, or typed text into machine-encoded text. Because OCR relies so heavily on interpreting text, the fonts, characters, and their respective sizes used in a source document play an integral role in the OCR process. OCR engines commonly misinterpret characters or confuse one character for another, particularly when they must distinguish between characters that look similar. For example, an OCR engine may have difficulty distinguishing the letter O from the number 0, or a lowercase “L” from an uppercase “I.”
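As an illustration of the look-alike problem, the following minimal sketch shows one common mitigation: when a field is known in advance to be numeric, letters that resemble digits can be mapped back to digits after recognition. The table `CONFUSABLE_TO_DIGIT` and the function `normalize_numeric_field` are hypothetical names introduced here, and the choice of mappings (including "l" and "I" to "1") is an assumption, not part of any particular OCR engine.

```python
# Hypothetical confusion table: letters an OCR engine may emit in place of digits.
# The specific pairs are illustrative assumptions, not an exhaustive list.
CONFUSABLE_TO_DIGIT = {
    "O": "0",  # uppercase letter O read instead of zero
    "o": "0",  # lowercase letter o read instead of zero
    "l": "1",  # lowercase L read instead of one
    "I": "1",  # uppercase i read instead of one
}


def normalize_numeric_field(text: str) -> str:
    """Replace digit look-alikes in a field that is expected to hold only digits."""
    return "".join(CONFUSABLE_TO_DIGIT.get(ch, ch) for ch in text)


def has_confusable(text: str) -> bool:
    """Report whether the text contains any character from the confusion table."""
    return any(ch in CONFUSABLE_TO_DIGIT for ch in text)


if __name__ == "__main__":
    raw = "1O0l"  # simulated engine output for a numeric field
    print(has_confusable(raw), normalize_numeric_field(raw))
```

This kind of correction only works when the expected content of the field is known; applied to free text it would corrupt legitimate letters.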
Traditionally, OCR devices would read input from printers acting as data processing devices. These data processing devices were only able to analyze specific fonts provided by the printer. OCR fonts were eventually created to optimize the scanning process across different devices. OCR-A was a sans-serif font standardized by the American National Standards Institute (ANSI) and initially designed as a fixed-width (monospaced) font for printers to use. Though OCR-A was meant to be easy for machines to read, the font was hard for the human eye to read. OCR-B was another monospaced sans-serif font created to facilitate OCR for specific electronic devices, originally used for financial and bank-oriented applications. OCR-B is slightly easier for the human eye to read.
However, an optimal scan with these fonts still depended on a variety of factors, such as font, size, color, contrast, brightness, density of content, text placement, and font spacing. Dots per inch (DPI) may be another factor to consider with respect to character placement.
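The interaction between font size and DPI can be made concrete with a small calculation. The sketch below assumes the usual typographic convention of 72 points per inch and treats the nominal point size as the em height of the font; the function name `em_height_px` is hypothetical, and actual glyphs are typically somewhat shorter than the em height.

```python
POINTS_PER_INCH = 72.0  # typographic convention assumed for this sketch


def em_height_px(point_size: float, dpi: float) -> float:
    """Approximate the height in pixels of one em at a given scan resolution."""
    if point_size <= 0 or dpi <= 0:
        raise ValueError("point_size and dpi must be positive")
    return point_size * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    # Compare how many pixels a 12-point font occupies at several resolutions.
    for dpi in (72, 150, 300):
        print(dpi, em_height_px(12, dpi))
```

Under these assumptions, the same 12-point text spans far fewer pixels in a low-resolution scan than in a high-resolution one, which is one reason resolution affects how reliably characters can be located and told apart.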
Intelligent character recognition (ICR) is a recognition system that enables a computer to recognize machine print or handwritten characters. ICR is often considered to be a more advanced OCR capability. Some ICR software may include a machine learning system that can identify handwriting patterns. Form design may influence the accuracy of ICR systems, making recognition easier on some forms. For example, boxes are often used on forms to constrain handwriting, encouraging uniform sizing and separation of characters.
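To show why boxed fields simplify recognition, the following sketch splits the image of a boxed field into one sub-image per character. It assumes the field has already been cropped to its outer border, that all boxes have equal width, and that the image is a plain list of pixel rows; `split_boxed_field` is a hypothetical helper, not a function of any ICR product.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def split_boxed_field(image: Image, n_boxes: int) -> List[Image]:
    """Cut a cropped field image into n_boxes side-by-side cells of near-equal width."""
    if n_boxes <= 0:
        raise ValueError("n_boxes must be positive")
    width = len(image[0]) if image else 0
    # Integer boundaries so that the cells cover every column exactly once.
    bounds = [(i * width) // n_boxes for i in range(n_boxes + 1)]
    return [
        [row[bounds[i]:bounds[i + 1]] for row in image]
        for i in range(n_boxes)
    ]


if __name__ == "__main__":
    field = [
        [0, 1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10, 11],
    ]
    for cell in split_boxed_field(field, 3):
        print(cell)
```

Because each cell is expected to hold a single character, the recognizer does not have to decide where one handwritten character ends and the next begins, which unconstrained handwriting would require.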
Whether the input is machine print or handwriting, recognition methods today produce results that fall short of what is desired. Current recognition systems may work well only in very limited scenarios. The variations observed in real-world documents are not well handled by existing OCR/ICR systems. If the recognition process has difficulty distinguishing or identifying the original fonts or characters, the end product may not reflect what was in the original source. Ideally, incoming documents would be structured in a manner that is optimized for the recognition system.