Optical character recognition (OCR) techniques allow for automatic recognition of text in scanned documents and images. Specifically, a computer system implementing OCR-based tools can detect and identify characters in images, and generate words or text using the identified characters. While the accuracy of OCR-based tools has improved significantly over the years, such tools or techniques still suffer from various types of text recognition errors. These recognition errors are usually fixed manually by humans revising the output text provided by OCR-based tools. However, some types of errors can become more frequent and more significant in OCR-extracted data associated with scanned (or imaged) documents having, for example, relatively poor image quality, relatively small text characters, text misorientation, or a combination thereof. Also, the accuracy of the OCR-based tools can vary based on the relative positioning of words and expressions in the scanned (or imaged) documents.