Optical character recognition (referred to herein as OCR) is a useful feature that allows a computing device to recognize text (more particularly, characters thereof) in an image and convert the text of the image into machine-operable text, e.g., ASCII characters. The machine-operable text is considered an “OCR decoding result”. “Machine-operable text” includes text characters that can be processed in a computer, usually as bytes. For example, users can download, photograph, or scan books, documents, product labels, etc. to obtain an image including text. The users can then perform OCR on the image to recognize the text in it, which allows them to select, copy, search, and edit the text on a computer, mobile phone, tablet, etc.
Conventional OCR systems, however, frequently produce OCR errors (referred to as “misreads”) when recognizing and decoding text. Common errors include unrecognizable or improperly converted text, e.g., an “O” (letter O) for a “0” (number zero), or an “E” for a “B”. OCR errors can often render converted text unusable until a user corrects the errors. Improper conversion can occur, for instance, when an image has a low resolution, blurred text, and/or unclear text. In another instance, conventional OCR systems may improperly convert text because the image includes uncommon characters or an underlying adjacent graphic that obscures the text. Furthermore, OCR systems can recognize an illustration in an image as text when the illustration does not actually include text. Generally speaking, OCR has conventionally had an unacceptable misread rate.
In barcode scanning, a voting methodology is used to compare the decoded data string to subsequent decodes, and when a sufficient number of identical scans occur, the decode is presumed to be valid and passed on to the host or application. Misreads are rare in barcode scanning, and customers expect the same from OCR scanning. Unfortunately, the voting methodology does not work well in optical character recognition due to the frequency of OCR misreads: it may take a lot of scanning before a sufficient number of identical scans occur without a misread, if that happens at all.
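The voting methodology described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular scanner: the function name `vote_on_decodes`, the `required_matches` threshold, and the choice to count identical decodes cumulatively (rather than requiring them to be consecutive) are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Optional


def vote_on_decodes(decodes: Iterable[str], required_matches: int = 3) -> Optional[str]:
    """Return the first decode that has been seen `required_matches` times.

    Hypothetical helper: counts whole decoded strings and presumes a decode
    valid once enough identical copies have been observed. Returns None if
    no decode ever reaches the threshold.
    """
    if required_matches < 1:
        raise ValueError("required_matches must be at least 1")
    counts: Counter = Counter()
    for decode in decodes:
        counts[decode] += 1
        if counts[decode] >= required_matches:
            return decode  # presumed valid; would be passed on to the host
    return None


# Barcode-like case: misreads are rare, so identical scans accumulate quickly.
barcode_scans = ["0123456", "0123456", "0123456"]
print(vote_on_decodes(barcode_scans))

# OCR-like case: each scan carries a different misread, so no string repeats.
ocr_scans = ["THEEAZYDOG", "THBLAZYDOG", "THELAZYD0G"]
print(vote_on_decodes(ocr_scans))
```

Because the vote is taken over entire strings, a single misread character anywhere in a scan prevents that scan from counting toward the threshold, which is why frequent OCR misreads can stall this approach.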
With an expected decoding result, a standard computer program may be conventionally used to identify a misread OCR decoding result by identifying one or more misread characters therein. For example, a plurality of OCR decoding results are depicted below, with multiple misreads detected by the standard computer program (for example, in the first line, “THEEAZYDOG” should be “THELAZYDOG”):
OCR Line 3: P<UTOTHEEAZYDOG<<GIUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<OK
OCR Line 3: P<OTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHBLAZYDOG<<QUICK<EROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD
OCR Line 3: P<UTOTHELAZYDOG<<QUICK<BROWN<FOX<JUMPS<OVER<MISREAD 
However, in the usual application in the field, the decoding result is not known in advance, so the standard computer program cannot be used to detect a misread or, therefore, to determine when a valid OCR decoding result has been obtained.
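The expected-result comparison described above can be sketched as follows. The source does not say how the standard computer program works, so this is only an assumed, position-by-position comparison; the names `find_misreads` and `classify` are hypothetical. Note that an inserted or dropped character shifts every later position, so a real tool might align the strings before comparing them.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# (position, expected character, decoded character); None marks a missing character.
Mismatch = Tuple[int, Optional[str], Optional[str]]


def find_misreads(expected: str, decoded: str) -> List[Mismatch]:
    """List every position where the decoded text differs from the expected text."""
    return [
        (index, exp_char, dec_char)
        for index, (exp_char, dec_char) in enumerate(zip_longest(expected, decoded))
        if exp_char != dec_char
    ]


def classify(expected: str, decoded: str) -> str:
    """Label a decoding result the way the listing above does: OK or MISREAD."""
    return "MISREAD" if find_misreads(expected, decoded) else "OK"


expected = "P<UTOTHELAZYDOG<<QUICK<BROWN"
decoded = "P<UTOTHBLAZYDOG<<QUICK<EROWN"
print(classify(expected, decoded))
for index, exp_char, dec_char in find_misreads(expected, decoded):
    print(f"position {index}: expected {exp_char!r}, decoded {dec_char!r}")
```

Both functions require `expected` as an input, which is exactly what is unavailable in the field.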
Therefore, a need exists for improved methods for optical character recognition (OCR). Various embodiments provide a presumptively valid OCR decoding result with a high level of confidence even if every OCR decoding result contains a misread.