Optical recognition of characters, such as letters, numerals and punctuation marks, in a text that is being scanned by a document scanner generally relies upon decomposition of a character into a set of small regions that comprise the character. Each of these regions is then analyzed in order to distinguish one character from all other characters. Analysis of these regions of a character often requires measurement of the relative widths of dark and light spaces that are part of the character when, say, the character is represented as a dark or blackened mark on a light background. This approach requires that absolute or relative measurements be made of the widths of the dark and light spaces and that these widths be numerically analyzed. Such an approach is time consuming and requires use of complex algorithms for such analysis.
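As a rough illustration of the width measurements described above, the following Python sketch turns one scan line into runs of light and dark pixels and then into relative widths. The function names and the 0/1 pixel encoding (1 for dark, 0 for light) are assumptions made for this example and are not taken from any cited disclosure.

```python
from itertools import groupby


def run_lengths(scan_line):
    """Collapse a scan line of 0/1 pixels into (value, width) runs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(scan_line)]


def relative_widths(scan_line):
    """Express each run's width as a fraction of the whole scan line."""
    runs = run_lengths(scan_line)
    total = sum(width for _, width in runs)
    return [(value, width / total) for value, width in runs]


if __name__ == "__main__":
    line = [0, 1, 1, 0, 0, 0, 1]  # hypothetical pixels along one slice
    print(run_lengths(line))
    print(relative_widths(line))
```

Every slice of every character would need this kind of measurement followed by numerical comparison, which is the cost the paragraph above refers to.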
A correlation approach is most commonly used, wherein the designated character to be recognized is overlaid on each of a set of candidate characters of similar size. The correlation function of the two characters is numerically computed, and the candidate character with the highest correlation number is then chosen as the designated character. This approach is also time consuming, because many numerical computations must be performed for each designated character that is examined.
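The correlation approach can be sketched in a few lines of Python. This is a minimal illustration that assumes equal-sized binary bitmaps and a normalized (Pearson-style) correlation; the disclosures discussed here do not specify this exact formula, and the names `correlation`, `best_match` and the tiny 3x3 templates are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Normalized correlation of two equal-sized bitmaps (lists of rows of 0/1)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("bitmaps must be non-empty and the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0  # a uniform bitmap carries no shape information


def best_match(unknown, candidates):
    """Return the name of the candidate with the highest correlation number."""
    return max(candidates, key=lambda name: correlation(unknown, candidates[name]))


# Hypothetical 3x3 templates, far smaller than any real character cell.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

if __name__ == "__main__":
    noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
    print(best_match(noisy_l, TEMPLATES))
```

Each call to `correlation` visits every pixel of both bitmaps, and `best_match` repeats that work once per candidate, which illustrates why the computation grows with both character resolution and the size of the candidate set.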
An example of the correlation function approach is U.S. Pat. No. 3,644,889, issued to Skenderoff et al. The patent discloses a character recognition system for alphanumeric characters using a minimum of three vertical slices and an unspecified number of horizontal slices through a character. A correlation function is constructed, using line segments and curves from the character being scanned and corresponding characteristic functions for each known character in the candidate set of characters. The Skenderoff et al. disclosure also distinguishes between families of ambiguous characters that cannot otherwise be distinguished through normal processing.
Other approaches to character recognition and identification do not use correlation functions. An example is U.S. Pat. No. 4,675,909, issued to Egami et al. The patent discloses an optical character recognition circuit that uses a number of linear slices through a character and determination of intersection of each such slice with a portion of the line segments and curves that make up the character being examined. The Egami et al. invention, as disclosed, appears to use four or more slices through a character and does not provide any post processing to remove any remaining ambiguities between characters.
Three related U.S. patents, U.S. Pat. No. 3,217,294, issued to Gerlach et al., U.S. Pat. No. 3,270,319, issued to Schmid, and U.S. Pat. No. 3,434,110, issued to Bucklin et al., disclose the use of a character recognition system that relies upon five vertical slices through a character and two horizontal lines through a character. This produces a total of ten intersection regions at which each portion of the character is examined. The character set itself is preferably a stylized character set provided by the inventors rather than a character set in an arbitrary font. In the '294 patent, a five-bit sequence is generated for each of the two horizontal lines that intersect the five vertical spaces. These two sequences are compared with sequences for known characters in a logic matrix or circuit to attempt to identify the character that has been scanned. The '294, '319 and '110 patents also disclose use of error detection means and provide for rescanning a character if an error is detected, as manifested by receipt of two five-bit sequences of logical zeros and ones that do not match any character identified in the logic matrix.
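The lookup-and-rescan behavior described above can be pictured with the following Python sketch. It is only an illustration: the bit patterns in `LOGIC_MATRIX`, the `scan` callback and the `max_scans` limit are all invented for this example, since the actual encodings and rescan policy of the '294, '319 and '110 patents are not reproduced here.

```python
# Hypothetical logic matrix: (upper five-bit code, lower five-bit code) -> character.
LOGIC_MATRIX = {
    (0b01110, 0b01110): "0",
    (0b00100, 0b00100): "1",
    (0b11110, 0b00001): "7",
}


def bits_to_code(bits):
    """Pack a sequence of five logical zeros and ones into an integer."""
    if len(bits) != 5:
        raise ValueError("expected exactly five bits")
    code = 0
    for bit in bits:
        code = (code << 1) | (1 if bit else 0)
    return code


def identify(scan, max_scans=3):
    """Scan a character, rescanning when the two sequences match nothing.

    `scan` is an assumed callback returning (upper_bits, lower_bits).
    Returns the matched character, or None if every attempt fails.
    """
    for _ in range(max_scans):
        upper, lower = scan()
        key = (bits_to_code(upper), bits_to_code(lower))
        if key in LOGIC_MATRIX:
            return LOGIC_MATRIX[key]
        # No match: treat as a detected error and rescan.
    return None


if __name__ == "__main__":
    readings = iter([
        ([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]),  # unrecognized, triggers a rescan
        ([0, 0, 1, 0, 0], [0, 0, 1, 0, 0]),  # matches the hypothetical "1"
    ])
    print(identify(lambda: next(readings)))
```

A table keyed on the pair of five-bit codes makes the error condition explicit: any pair absent from the table is treated as a failed read.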
The disclosures of Skenderoff et al., Egami et al., Gerlach et al., Schmid and Bucklin et al. all appear to require information on the width or thickness of intersection of a given slice with a portion of the character under examination.
A supplementary scanner for character recognition is disclosed by Hardin et al. in U.S. Pat. No. 3,585,588. The supplementary scanner is activated after a conventional character scan has been performed, and only if the conventional scan has produced an ambiguity between two or more possible characters. The supplementary scan appears to require five additional scans of the ambiguous character. Two of the scans are vertical, one is horizontal, and the remaining two are oriented at non-zero angles relative to the vertical and horizontal scan directions. The numbers of intersections of two or more of these five scan lines with the ambiguous character are counted and used together with the conventional scan information to attempt to identify a character.
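Counting intersections along a scan line, as opposed to measuring widths, can be sketched as follows. This Python example assumes a binary bitmap (1 for dark) and treats each light-to-dark transition as one intersection; the helper names and the 5x5 "O" bitmap are hypothetical and are not drawn from the Hardin et al. disclosure.

```python
def sample_line(bitmap, start, step):
    """Collect pixels along a straight line from `start`, moving by `step`."""
    d_row, d_col = step
    if d_row == 0 and d_col == 0:
        raise ValueError("step must be non-zero")
    row, col = start
    samples = []
    while 0 <= row < len(bitmap) and 0 <= col < len(bitmap[0]):
        samples.append(bitmap[row][col])
        row += d_row
        col += d_col
    return samples


def count_crossings(samples):
    """Count light-to-dark transitions, ignoring how wide each stroke is."""
    crossings = 0
    previous = 0
    for pixel in samples:
        if pixel and not previous:
            crossings += 1
        previous = pixel
    return crossings


# Hypothetical 5x5 bitmap of the letter "O".
LETTER_O = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]

if __name__ == "__main__":
    horizontal = sample_line(LETTER_O, (2, 0), (0, 1))
    vertical = sample_line(LETTER_O, (0, 2), (1, 0))
    diagonal = sample_line(LETTER_O, (0, 0), (1, 1))
    print(count_crossings(horizontal), count_crossings(vertical), count_crossings(diagonal))
```

Because only transitions are counted, a thick stroke and a thin stroke yield the same result, so no width measurement is involved in this step.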
What is needed is a simple method for optical character recognition that can be performed in relatively few steps and that does not require measurement of absolute or relative widths of the dark and light spaces that characterize different portions of a character.