Optical Character Recognition (OCR) is a useful technique for processing business forms. Machine reading systems can replace several data-entry operators and reduce the expense of data capture.
In general, the first step of the OCR process is to scan the document electronically and convert all of its information to a digital bit-map. Once the image is captured in an electronic format, the information to be read is separated from the background information: boxes and guide text must be ignored, and only the filled-out text should be read. Once this separation is accomplished, the electronic image of the text is processed by the OCR algorithm, which converts the characters of interest to ASCII data.
Almost all OCR systems processing business forms employ the technique of a "drop-out color". By printing documents in a predetermined color (usually a pastel color) and employing an optical filter of the same color in the electronic scanner, the filled-out text on the document can be separated from the printed form. The color filter causes the scanner to ignore information printed in that color (to the electronic scanner, the form color appears equivalent to the white background of the paper). However, since the filled-out text typically is typed or printed in black (or another dark color), this information is captured by the scanner as black. Hence, the pre-printed form is converted to a white background and the filled-out text can be processed readily by an OCR algorithm.
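The drop-out effect described above can be illustrated with a minimal sketch. It assumes a simplified model in which a red optical filter lets only the red channel of each pixel reach the sensor, and in which the sensor output is thresholded to a one-bit bit-map. The RGB values, the threshold, and the function names are hypothetical and chosen only for illustration; they are not taken from any particular scanner.

```python
# Hypothetical 8-bit RGB values; none of these come from a real scanner.
WHITE_PAPER = (255, 255, 255)
PASTEL_RED_FORM = (255, 170, 170)   # pre-printed boxes and guide text
BLACK_INK = (20, 20, 20)            # filled-out text


def scan_through_red_filter(pixel):
    """Simplified red filter: only the red channel reaches the sensor."""
    red, _green, _blue = pixel
    return red


def binarize(intensity, threshold=128):
    """Map sensor intensity to the bit-map: 1 = black, 0 = white."""
    return 1 if intensity < threshold else 0


def scan_row(pixels):
    """Scan one row of the document through the filter and binarize it."""
    return [binarize(scan_through_red_filter(p)) for p in pixels]


row = [WHITE_PAPER, PASTEL_RED_FORM, BLACK_INK, PASTEL_RED_FORM, WHITE_PAPER]
print(scan_row(row))  # [0, 0, 1, 0, 0]
```

In this model the pastel form color has the same red intensity as the white paper, so only the black ink survives in the bit-map passed to the OCR algorithm.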
Use of the optical filter works well in this application, but it limits the customer to using a very specific color on the form (one that precisely matches the characteristics of the optical filter installed in the scanner). Additional drop-out colors can be supported by installing additional optical filters in the scanner. Processing a particular form would then require selecting the proper optical filter and mechanically inserting it before the form is scanned.
However, slight variations in the printing process or a change of form vendor can produce variability in the actual color of the printed form, thereby reducing the "drop-out" effect. Such changes can add noise (the scanner sees the pre-printed form information as black instead of white), which may cause the OCR algorithm to produce erroneous results. Changing optical filters to accommodate these slight variations in printing is not a practical alternative, since it would require a large inventory of filters, each with slightly different characteristics. Therefore, at present, the only practical way to control this problem is to tightly control the printing process to ensure a uniform drop-out color. As a result, OCR form reading systems presently in use are generally "closed loop", which means that the forms processing firm (such as an insurance carrier) must maintain control over the printing of the forms, because forms created by outside establishments may not read properly due to color variations.
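The effect of such printing variations can be sketched as follows, again assuming a simplified red-filter model in which only the red channel reaches the sensor. The two print runs, their RGB values, and the threshold of 200 are hypothetical; the sketch only shows how a color shift can turn pre-printed form pixels into black noise.

```python
def appears_black(pixel, threshold=200):
    """Simplified red-filter model: compare the red channel to a threshold."""
    red, _green, _blue = pixel
    return red < threshold


def noise_fraction(form_pixels, threshold=200):
    """Fraction of pre-printed form pixels that fail to drop out."""
    leaked = sum(appears_black(p, threshold) for p in form_pixels)
    return leaked / len(form_pixels)


# Hypothetical print runs of the same form (illustrative values only).
nominal_run = [(250, 175, 175)] * 8 + [(245, 170, 172)] * 2
drifted_run = [(250, 175, 175)] * 3 + [(195, 150, 180)] * 7

print(noise_fraction(nominal_run))  # 0.0
print(noise_fraction(drifted_run))  # 0.7
```

In the drifted run, most of the form pixels fall below the threshold and would reach the OCR algorithm as black marks alongside the filled-out text.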
The present invention eliminates both the need for mechanical filter insertion and the drop-out color problem by using programmable filters in the electronic scanner. The invention would allow the scanner to intelligently select the correct drop-out color based on the actual form being processed.
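As a purely illustrative sketch of what selecting a drop-out color from the form itself could look like, the following code samples pixels assumed to belong to the pre-printed form and picks the color channel in which they look most like white paper. This is not the mechanism of the invention, which is not detailed in this section; the channel-selection rule, the sample values, the threshold, and all function names are assumptions made for illustration.

```python
CHANNELS = ("red", "green", "blue")


def select_dropout_channel(form_sample):
    """Pick the channel in which the sampled form color looks brightest."""
    n = len(form_sample)
    means = [sum(p[i] for p in form_sample) / n for i in range(3)]
    return max(range(3), key=lambda i: means[i])


def scan_row(pixels, channel, threshold=200):
    """Binarize one row using a single channel: 1 = black, 0 = white."""
    return [1 if p[channel] < threshold else 0 for p in pixels]


# Hypothetical pixel values for a form printed in pastel green.
green_form = (170, 250, 175)
black_ink = (20, 20, 20)
white_paper = (255, 255, 255)

row = [white_paper, green_form, black_ink, green_form]
channel = select_dropout_channel([green_form, green_form])

print(CHANNELS[channel])       # green
print(scan_row(row, channel))  # [0, 0, 1, 0]
print(scan_row(row, 0))        # fixed red filter: [0, 1, 1, 1]
```

With the channel chosen from the form itself, the green form drops out and only the ink remains, whereas a fixed red filter would record the form as black.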