Most alphabetic characters have markedly different misread propensities depending upon whether they are printed in upper or lower case on the document scanned by an OCR. This is readily evident from an examination of the significantly different geometry of most upper and lower case characters; for example, A, a; E, e; G, g; and so on. It has been discovered that the OCR post-processing error correction function is enhanced by a preprocessing step within that function which determines the upper or lower case print convention in which the alphabetic characters within a word were inscribed on the document scanned by the OCR. This preprocessing step improves the accuracy and reliability of the overall OCR post-processing error correction function.
The utility of the subject invention can be seen in its preprocessing role with respect to the error correction apparatus disclosed and claimed in the copending patent application entitled "Regional Context Maximum Likelihood OCR Error Correction Apparatus," Ser. No. 600,743, which was filed on Jul. 30, 1975 as a continuation-in-part of Ser. No. 459,820, which was filed on Apr. 10, 1974. The system disclosed therein selects the correct form of a garbled input word which an OCR has misread in such a way as to change the number of characters in the word, by character splitting or concatenation. Dictionary words are stored in the system. The vastly different alphabetic character confusion propensities, which depend upon whether a given character was inscribed in the upper or lower case print convention, underscore the utility of the subject invention in determining whether an alphabetic character field (i.e., word) output from an OCR derives from the scan of an upper case or a lower case field inscription on the document scanned.
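The role of the case determination in a downstream maximum likelihood correction step can be illustrated with a minimal sketch. It is not the apparatus of the copending application: the function names, the table layout, and every probability value below are hypothetical placeholders chosen for illustration, and the sketch compares only words of equal length, so character splitting and concatenation are not handled. It assumes the case of the field has already been determined and shows only how that determination would select the confusion table used to score stored dictionary words.

```python
import math

# Hypothetical confusion tables keyed by print case. Each entry maps
# (true character, character reported by the OCR) to a probability.
# Characters are folded to upper case for lookup only. All values are
# illustrative placeholders, not measured confusion statistics.
CONFUSION = {
    "upper": {("O", "D"): 0.10},  # assumed: upper case O often misread as D
    "lower": {("A", "D"): 0.10},  # assumed: lower case a often misread as d
}
DEFAULT_MATCH = 0.95      # assumed probability that a character is read correctly
DEFAULT_MISMATCH = 0.001  # assumed probability of any confusion not in the table


def log_likelihood(observed, candidate, case):
    """Log-probability that `candidate` was misread as `observed`."""
    if len(observed) != len(candidate):
        return float("-inf")  # splitting and concatenation are out of scope here
    table = CONFUSION[case]
    total = 0.0
    for true_ch, seen_ch in zip(candidate.upper(), observed.upper()):
        default = DEFAULT_MATCH if true_ch == seen_ch else DEFAULT_MISMATCH
        total += math.log(table.get((true_ch, seen_ch), default))
    return total


def best_candidate(observed, dictionary, case):
    """Return the stored dictionary word most likely to underlie `observed`."""
    return max(dictionary, key=lambda word: log_likelihood(observed, word, case))


words = ["COT", "CAT"]
print(best_candidate("CDT", words, "upper"))
print(best_candidate("CDT", words, "lower"))
```

Under these assumed tables the same garbled field resolves to a different dictionary word depending on the case determination, which is the dependence the paragraph above describes.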