1. Field of the Invention
The present invention relates to the field of optical character recognition equipment, and particularly to the field of optical character recognition equipment capable of scanning and identifying characters of standard type fonts.
2. Prior Art
As the need for data inputs to data processing systems increases in volume and commercial significance, a variety of data input systems has been devised to encode source data into machine-readable form as efficiently as possible. In addition to such standard entry methods as key-punch, key-to-tape, key-to-disk, and remote on-line terminals, source data can also be entered directly into a data processing system by optical scanners, in particular optical character recognition (OCR) systems. OCR equipment can dramatically reduce the labor cost of encoding data by eliminating keyboard operation entirely. It is potentially the fastest and most error-free method of data conversion, at a lower overall cost than either traditional key-punch or the newer key entry devices.
Whereas the productivity of key entry methods is limited by human skill levels, which can be improved only within narrow limits, OCR equipment has a potentially unlimited character entry rate. While the cost of the manual labor required for keyboard entry continues to increase, data entry hardware, and in particular the processing hardware required for optical character recognition, continues to decrease in cost. Thus, although initially more expensive at low volumes, OCR equipment becomes cost competitive as data volume increases.
Many OCR readers are presently available for specialized user applications. However, OCR systems have previously had difficulty recognizing typewritten characters in the more common type fonts used on standard office equipment.
Most commonly, character recognition equipment has relied upon the presence of a well-defined space between adjacent letters to provide a trigger signal for determining when a character has been identified and separated from other characters in a line.
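The white-space trigger described above can be illustrated with a minimal sketch. It assumes a binarized image of one line of type, represented as equal-length strings in which `#` marks ink and `.` marks white; the function names `white_columns` and `segment_by_white_columns` are hypothetical and illustrate only the conventional column-scan approach, not the apparatus of the present invention.

```python
from typing import List, Tuple


def white_columns(rows: List[str]) -> List[bool]:
    """For each column, return True if no row has ink ('#') in that column."""
    width = len(rows[0])
    return [all(row[c] != "#" for row in rows) for c in range(width)]


def segment_by_white_columns(rows: List[str]) -> List[Tuple[int, int]]:
    """Split a line bitmap into character spans [start, end).

    A fully white column acts as the trigger that ends the current character.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for c, is_white in enumerate(white_columns(rows)):
        if not is_white and start is None:
            start = c  # first inked column of a new character
        elif is_white and start is not None:
            spans.append((start, c))  # white column closes the character
            start = None
    if start is not None:
        spans.append((start, len(rows[0])))  # character running to the right edge
    return spans


# An "H" followed by a "T", separated by two white columns.
line = [
    "#.#..###",
    "###...#.",
    "#.#...#.",
]
print(segment_by_white_columns(line))  # [(0, 3), (5, 8)]
```

Because each span is closed only when a completely white column is found, the method depends entirely on such a column existing between every pair of adjacent characters.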
To avoid the difficulty of recognizing type fonts in which some character pairs cannot be easily separated or do not readily fall into a simply recognized pattern, a number of specialized fonts have been devised specifically for optical character recognition. These fonts include the normal set of alphanumeric characters and the special symbols found on a conventional typewriter keyboard. "OCR A" and "OCR B" are two of the more common specialized fonts. The characters in both fonts are highly stylized, each being designed so that it is easily and completely distinguished from every other character.
Two objections to the stylized fonts are that they are not aesthetically pleasing and are not as easily read by the human eye. A more compelling objection is that their use requires equipment specially dedicated to preparing documents in that particular type font.
The specialized OCR fonts are designed to adhere rigorously to the requirement that no characters kern or touch. Moreover, they are required to have a rigidly uniform character width.
A major obstacle to the recognition of non-specialized type fonts is that they contain characters of non-uniform width which, in certain cases, kern or touch. This occurs rather frequently in most ordinary typewriter formats, and in particular in such type fonts as Printing and Publishing 3 and Prestige Elite, both of which are in widespread use in office equipment and in publishing. Accurate recognition requires that each character in a line of type be separated from adjacent characters, so that elements of adjacent characters are not included in the character data field being evaluated. To the extent that characters in standard fonts kern or touch, their recognition by OCR equipment is impaired. The difficulty existing equipment has in distinguishing and separating immediately adjacent characters which kern or touch has limited its widespread application.
Specifically, for characters which touch, a completely white vertical column will never appear within the space occupied by the two characters. For characters which kern, i.e., overlap without touching, a white space may exist between them, but it will be non-vertical and may even be non-linear. Thus, a simple scan along the character row looking for vertical white columns between adjacent letters will fail. Equipment that is not designed to meet this contingency produces erroneous character recognition.
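The failure mode can be shown with a small sketch. It assumes each character pair is given as a binarized bitmap cropped to the pair's extent, with `#` for ink and `.` for white; the function `has_vertical_white_column` and the three example bitmaps are hypothetical illustrations, not data from any actual font.

```python
from typing import List


def has_vertical_white_column(rows: List[str]) -> bool:
    """True if some interior column of a cropped pair bitmap contains no ink.

    Assumes the bitmap is cropped so that its first and last columns belong
    to the two characters; only interior columns can separate them.
    """
    width = len(rows[0])
    return any(all(row[c] != "#" for row in rows) for c in range(1, width - 1))


# Two characters with clear space between them.
spaced = [
    "##..##",
    "#...#.",
    "##..##",
]

# Two characters whose baseline serifs meet, so the glyphs touch.
touching = [
    "##..##",
    "#...#.",
    "######",
]

# A "T"-like glyph whose crossbar overhangs a following glyph without
# touching it: the separating white space bends around the overhang.
kerned = [
    "#####.",
    "..#...",
    "..#.##",
    "..#.##",
]

print(has_vertical_white_column(spaced))    # True
print(has_vertical_white_column(touching))  # False
print(has_vertical_white_column(kerned))    # False
```

In the kerned example the two glyphs share no adjacent ink pixels, yet every interior column contains ink, so a scan that waits for a vertical white column never finds a boundary between them.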