This invention relates to optical character recognition and, more particularly, to a technique by which an alphanumeric character is recognized by the geographical features (e.g. bays and lagoons) which comprise that character.
Optical character recognition (OCR) devices have long been used to sense and interpret alphanumeric characters that, typically, are provided on a printed page. Usually, however, such devices are limited to the extent that characters are recognizable only if they are printed in one (or, at best, a limited few) predetermined font. Printed characters which, nevertheless, may be clear and formed uniformly as, for example, by a typewriter or other printing machine, will not be recognized if they are formed in some other font.
Another operating limitation of conventional optical character recognition devices relates to what is known as "line finding" and "segmentation". A line finding operation is carried out by many conventional OCR devices to locate the lines of characters that are printed on a page. This distinguishes character lines from the spaces between lines and usually is implemented by detecting the distribution of pixel information in the horizontal direction. A greater concentration of black pixels represents a line of characters and a low concentration represents a space between lines.
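By way of illustration only, the line finding operation described above may be sketched as follows. The sketch assumes a binarized page held as a list of pixel rows in which 1 denotes a black pixel and 0 a white pixel; the function name `find_text_lines` and the threshold parameter `min_black` are hypothetical and are not taken from any particular OCR device.

```python
def find_text_lines(image, min_black=1):
    """Return (top, bottom) row ranges, bottom exclusive, for each text line.

    image is a list of pixel rows; 1 is a black pixel, 0 is white (assumed).
    A row holding at least min_black black pixels is treated as part of a line.
    """
    # Count the black pixels along each horizontal scan row.
    row_counts = [sum(row) for row in image]

    lines = []
    start = None
    for y, count in enumerate(row_counts):
        if count >= min_black and start is None:
            start = y                    # a line of characters begins
        elif count < min_black and start is not None:
            lines.append((start, y))     # a space between lines begins
            start = None
    if start is not None:                # the line runs to the bottom edge
        lines.append((start, len(row_counts)))
    return lines


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
print(find_text_lines(page))  # [(1, 3), (5, 6)]
```

Rows 1 to 2 and row 5 contain black pixels and are reported as character lines, while the empty rows are treated as the spaces between lines.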
A segmentation operation is intended to locate the spacing between characters in a line of characters. This isolates a block of pixel data, which then is presumed to be limited solely to a character to be identified, whereafter the data block is examined and recognized. Typical segmentation operations are successful if the characters in a line are spaced uniformly and are printed in, for example, Roman-type font. Characters that are printed at angles, such as italics, often cannot be isolated. Rather, the segmentation operation senses portions of adjacent characters, presumes that such portions comprise a single character, and then attempts, unsuccessfully, to identify a "character" formed of such portions. Similarly, typical segmentation operations often cannot separate (or "segment") characters that are smudged or blurred because of the lack of a well-defined space between such characters. Thus, a block of data representing a character to be identified cannot be formed. Likewise, a "break" that might be present in a character may be interpreted erroneously as a space between adjacent characters, resulting in two separate blocks representing partial characters rather than a single block representing a whole (albeit broken) character.
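The segmentation operation and two of its failure modes may be illustrated by the following minimal sketch. It assumes that segmentation is performed by locating pixel columns that contain no black pixels, which is one common approach and is not stated to be the method of any specific device; the function name `segment_characters`, the parameter `min_black`, and the sample pixel arrays are hypothetical.

```python
def segment_characters(line, min_black=1):
    """Return (left, right) column ranges, right exclusive, for each block.

    line is a list of pixel rows for one text line; 1 is black, 0 is white.
    A column with fewer than min_black black pixels is treated as spacing.
    """
    width = len(line[0]) if line else 0
    # Count the black pixels in each vertical column of the line.
    col_counts = [sum(row[x] for row in line) for x in range(width)]

    blocks = []
    start = None
    for x, count in enumerate(col_counts):
        if count >= min_black and start is None:
            start = x                    # a block of pixel data begins
        elif count < min_black and start is not None:
            blocks.append((start, x))    # a blank column ends the block
            start = None
    if start is not None:
        blocks.append((start, width))
    return blocks


# Two characters separated by a well-defined blank column.
clean = [[1, 1, 0, 1, 1],
         [1, 1, 0, 1, 1]]
# The same characters with a smudge bridging the gap.
smudged = [[1, 1, 1, 1, 1],
           [1, 1, 0, 1, 1]]
# A single character containing a vertical "break".
broken = [[1, 0, 1],
          [1, 0, 1]]

print(segment_characters(clean))    # [(0, 2), (3, 5)]
print(segment_characters(smudged))  # [(0, 5)]
print(segment_characters(broken))   # [(0, 1), (2, 3)]
```

The smudged pair is returned as one block, and the broken character is returned as two blocks, corresponding to the erroneous results described above.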
It is believed that the aforementioned disadvantages of conventional optical character recognition devices are attributed primarily to the fact that, in most such devices, character segmentation (or separation) might not be successful, thus impeding the comparison of a properly scanned character to a reference, or standard geometric form of that character. Significant deviations between the scanned and reference characters, such as differences in font, misalignment of the scanned character, apparent "connections" between independent characters, or "breaks" in a single character, largely due to improper character segmentation, prevent accurate identification. While various comparison techniques have been proposed heretofore, most optical character recognition methods rely upon a "template" comparison of scanned characters in order to identify those characters.
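A "template" comparison of the kind referred to above may be sketched, purely as an assumption for illustration, as a count of the pixels at which a scanned block differs from each stored reference form. The names `mismatch`, `classify`, `TEMPLATES` and `max_mismatch`, as well as the 3x3 reference forms, are hypothetical.

```python
def mismatch(scanned, template):
    """Count the pixel positions at which two equal-sized blocks differ."""
    return sum(
        s != t
        for scanned_row, template_row in zip(scanned, template)
        for s, t in zip(scanned_row, template_row)
    )


def classify(scanned, templates, max_mismatch):
    """Return the label of the closest template, or None if none is close."""
    best_label, best_score = None, None
    for label, template in templates.items():
        score = mismatch(scanned, template)
        if best_score is None or score < best_score:
            best_label, best_score = label, score
    if best_score is not None and best_score <= max_mismatch:
        return best_label
    return None


# Hypothetical 3x3 reference forms; 1 is a black pixel.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

aligned = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
shifted = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  # the same stroke, one column left

print(classify(aligned, TEMPLATES, max_mismatch=1))  # I
print(classify(shifted, TEMPLATES, max_mismatch=1))  # None
```

In this sketch a misalignment of a single pixel column is sufficient to prevent identification, which is consistent with the sensitivity of template comparison to deviations between the scanned and reference characters.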