The printing of characters for subsequent optical character recognition (OCR) falls into two major categories, depending on which OCR technique is to be used. The first OCR technique is geometric OCR. The second OCR technique is color coded OCR.
Geometric OCR attempts to recognize a character based on the character's shape, that is, the geometric arrangement of a set of pixels or dots. A character, as used herein, is meant to include a printed or written symbol which can be recognized by an OCR device or a human reader. The character can be an alphabetical symbol or an icon. Furthermore, the terms pixel and dot are used interchangeably to describe a distinguishable point recognizable by an OCR device. In a geometric OCR approach, color is used only to define the shape of a character. Even if characters are represented in multiple colors, the colors are converted to black or to a gray scale before shape analysis. Thus, the printing of characters for subsequent geometric OCR depends primarily on the shape of such characters as recognized by an OCR device or as perceived by a human reader.
Such a geometric OCR approach can provide a recognition accuracy as high as 99.5%. However, higher degrees of accuracy are desired. In addition, significant data storage is required for each character shape to be recognized: a geometric representation of the shape of each character of the alphabet, plus each other symbol to be recognized, has to be stored. This storage is duplicated for each supported font. That is, not just one representation of the geometric shape of the character "a" is stored, but separate representations for the Prestige, Elite, Gothic, Roman, etc. versions of the character "a". Furthermore, computer processing time is required to compare each scanned character against all of the stored shapes. Because every character is stored once per font, this comparison time is likewise duplicated for each additional font.
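The storage and comparison costs described above can be illustrated with a toy template matcher. This is a minimal sketch under stated assumptions, not the method of any particular OCR device: the 3x3 bitmaps, the font names "FontA" and "FontB", and the nearest-match-by-Hamming-distance rule are all hypothetical choices made only for illustration. It shows that one template is stored per (font, character) pair and that every stored template is compared against each scanned glyph.

```python
from typing import Dict, Tuple

# A glyph is a hypothetical 3x3 bitmap: 9 pixels, row-major, 1 = dark dot.
Bitmap = Tuple[int, ...]

# Hypothetical template store: one entry per (font, character) pair, so
# storage grows as (number of characters) x (number of fonts).
TEMPLATES: Dict[Tuple[str, str], Bitmap] = {
    ("FontA", "I"): (0, 1, 0, 0, 1, 0, 0, 1, 0),
    ("FontA", "L"): (1, 0, 0, 1, 0, 0, 1, 1, 1),
    ("FontB", "I"): (1, 1, 1, 0, 1, 0, 1, 1, 1),
    ("FontB", "L"): (1, 0, 0, 1, 0, 0, 1, 1, 0),
}


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels at which two bitmaps disagree."""
    return sum(x != y for x, y in zip(a, b))


def recognize_geometric(glyph: Bitmap) -> Tuple[str, int]:
    """Return the closest-matching character and the number of templates compared.

    Every stored template (every character in every font) is compared,
    which is the redundant processing described in the text.
    """
    best_char = ""
    best_dist = None
    comparisons = 0
    for (_font, char), template in TEMPLATES.items():
        comparisons += 1
        dist = hamming(glyph, template)
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, comparisons


if __name__ == "__main__":
    # A FontA "I" with one noisy pixel in the bottom row.
    print(recognize_geometric((0, 1, 0, 0, 1, 0, 1, 1, 0)))
```

Run as a script, this prints `('I', 4)`: the noisy glyph is identified as "I", but only after all four stored templates (two characters in two fonts) have been compared. Adding a third font would add two more templates and two more comparisons per glyph.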
The second technique, color coded OCR, attempts to recognize a character based on the character's color. In this approach, color is used not only to indicate the shape of a character, but also to indicate the identity of the character. For example, "a" is printed red, "b" is printed blue, and "c" is printed yellow. Thus, the printing of characters for subsequent color coded OCR depends both on the shape of such characters as perceived by a human reader and on the color of such characters as recognized by an OCR device.
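Recognition by color alone reduces to a table lookup, as the following sketch suggests. It is a minimal illustration under stated assumptions: the RGB values chosen for red, blue and yellow, the nearest-color rule, and the per-channel `tolerance` parameter are hypothetical and are not taken from any prior art device. Note that the table holds one entry per character regardless of how many fonts are in use, and no shape is examined.

```python
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]

# Hypothetical color code following the example in the text:
# "a" is printed red, "b" blue, and "c" yellow.
COLOR_CODE: Dict[RGB, str] = {
    (255, 0, 0): "a",
    (0, 0, 255): "b",
    (255, 255, 0): "c",
}


def recognize_by_color(color: RGB, tolerance: int = 40) -> Optional[str]:
    """Identify a character from its measured print color only.

    Picks the nearest code color by squared Euclidean distance in RGB and
    accepts it only if every channel is within `tolerance` (a hypothetical
    parameter); otherwise returns None. No shape processing is performed.
    """
    nearest = min(
        COLOR_CODE,
        key=lambda code: sum((p - q) ** 2 for p, q in zip(color, code)),
    )
    if all(abs(p - q) <= tolerance for p, q in zip(color, nearest)):
        return COLOR_CODE[nearest]
    return None


if __name__ == "__main__":
    # Slightly off-nominal scanner readings still map to the intended characters.
    for reading in [(240, 10, 5), (10, 20, 230), (250, 245, 30), (128, 128, 128)]:
        print(reading, recognize_by_color(reading))
```

The last reading, a neutral gray, is rejected because it is not close to any code color. Storage here scales only with the number of characters, which is why color coding avoids the per-font duplication of geometric OCR.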
Color coded OCR eliminates the data storage and computer processing requirements of geometric OCR by eliminating shape processing. Color coded OCR can also provide higher recognition accuracy than geometric OCR because it is not subject to shape processing errors. However, prior art color coded characters severely distract a human reader, because the color differences between characters are plainly visible to that reader. In addition, special OCR printing apparatus and special OCR reading apparatus are used for prior art color coded characters.
The two prior art approaches to representing characters for subsequent OCR processing present four major difficulties. The first two difficulties are the substantial storage and the substantial computer processing required by geometric OCR shape processing. Avoiding these shape processing difficulties by using prior art color coded OCR introduces the third difficulty: the severe visual distraction that prior art color coded characters cause a human reader. The fourth difficulty is the special printing and reading devices used by prior art color coded OCR.
Thus, there is a need for an approach which can substantially increase the accuracy rate of optical character recognition techniques while overcoming the deficiencies of both prior art approaches.