Many documents carry informational markings and pre-printed character location markings, visible to the eye, which assist the document preparer in completing required information on the document. Many pre-printed forms use these character location markings to confine characters to specific locations and sizes in order to assist a character reader in identifying the characters added to the document, but the markings themselves are not intended to be read by the optical character reader.
Various commercially available optical character recognition (OCR) devices function well in identifying characters which are clearly separated from extraneous lines, dots, printed material, and other visible matter which may be pre-printed on the document and which are not intended to be read by the OCR device. Such OCR devices do not experience great difficulty in identifying a single character or line of characters on an otherwise clear surface. Similarly, OCR devices experience little difficulty in identifying all of the characters on an entire page, provided that there are no extraneous markings on the page and that the characters are properly registered. On some documents, however, it is essential that characters be printed on forms which are pre-printed such that writing areas are separated by visible lines or other marks. Such lines are necessary for separating the data in an orderly fashion. Many government forms, such as income tax forms, census forms, social security forms, and others, have boxes within which to print information. It has been found that machine identification of hand-printed characters is assisted if visible constraint marks are pre-printed on the document to guide the preparer. The desirability of the pre-printed character location markings must be balanced against the problems OCR devices have in recognizing characters on documents containing such markings. The OCR devices must rely on light reflectance from the character background and absorption by the characters themselves to distinguish between true characters, which are to be identified by the OCR device, and other visible markings which are adjacent to, touching, passing through, or surrounding the characters and which are not to be identified.
Document processing systems have been proposed in order to allow the OCR device to distinguish between the pre-printed character location markings and the actual characters to be read. One such system is described in U.S. Pat. No. 3,444,517, issued to J. Rabinow on May 13, 1969 and entitled "Optical Reading Machine and Specially Prepared Documents Therefor". This optical reading machine utilizes pre-printed documents in which the character location markings are printed in fluorescent material. The document is subjected to exciting radiation during the reading cycle such that the true characters reflect very little of the radiation, but the marks are energized so as to emit energy to which a scanner photocell is sensitive. Although these marks are visible under ordinary light, when energized by their radiation source they emit energy in such a way that the scanner photocell provides output signals as though the marks did not exist, or as though the marks were brighter than the background or the characters to be read. Therefore the entire black-to-white range for the device examining the characters is unaffected.
Many documents are pre-printed in a variety of colors, in which the background as well as the character location markings may be printed with different colored inks. In order to eliminate pre-printed character location markings which are printed with colored inks, various optical filters and lenses have been utilized to prevent this pre-printed information captured from the document from being presented to the optical character recognition device, such that the optical character recognition device is "blind" to these colored inks. Color-sensitive photocells are utilized in the character recognition device in order to filter out the pre-printed character location pixel information and present only the true character pixel information to the optical character recognition device. Such systems require multiple optical filters which must be interchanged depending upon the colors of the ink utilized on the pre-printed form.
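The dependence of such filter-based systems on ink color can be illustrated with a minimal sketch. It crudely models a colored optical filter as keeping a single RGB channel and then thresholding the result. The reflectance values, the threshold, and the names `seen_as_character` and `blinded_inks` are all hypothetical and are not taken from any of the systems described above.

```python
# Hypothetical 8-bit RGB reflectance values; none of these come from the source.
PAPER = (255, 255, 255)
BLACK_PEN = (20, 20, 20)
RED_FORM_INK = (200, 40, 40)
GREEN_FORM_INK = (40, 180, 40)

# Crude model (an assumption): a colored filter passes exactly one RGB channel.
CHANNEL = {"red": 0, "green": 1, "blue": 2}


def seen_as_character(rgb, filter_color, threshold=128):
    """Return True if the pixel looks dark (character-like) through the filter."""
    return rgb[CHANNEL[filter_color]] < threshold


def blinded_inks(filter_color, inks):
    """Return the names of the inks that the reader cannot see through the filter."""
    return [name for name, rgb in inks.items()
            if not seen_as_character(rgb, filter_color)]


inks = {"black pen": BLACK_PEN, "red form ink": RED_FORM_INK,
        "green form ink": GREEN_FORM_INK}
print(blinded_inks("red", inks))    # inks dropped out by the red filter
print(blinded_inks("green", inks))  # inks dropped out by the green filter
```

Under these assumed values, the red filter drops out the red form ink but leaves the green form ink dark, and the green filter does the opposite, which is why a single fixed filter cannot handle forms printed in different colors.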
Another system proposed in order to prevent pre-printed character location markings from interfering with the actual characters to be read on a document is described in U.S. Pat. No. Re. 29,104, issued to David H. Shepard on Jan. 4, 1977 and entitled "Method of Scanning Documents to Read Characters Thereon Without Interference From Visible Marks on the Document Which Are Not To Be Read By the Scanner". This system utilizes a laser scanner unit adapted to scan a document. The color of the markings on the document must be matched to the wavelength of the laser so that the light reflected from the markings has the same intensity as the light reflected from the document background. The pre-printed character location markings are thereby "blinded" and do not interfere with the reading of the characters. Since the color of the pre-printed character location markings must be selected to match the wavelength of the laser, such a system cannot easily adapt to different colored pre-printed markings on different documents, or to different colored pre-printed character location markings on the same document, to be processed and read by an optical character recognition device.
Three dimensional color detection and modification systems have been used in a variety of applications. These applications include pre-press color correction in the printing industry; discrimination between objects in machine vision; and discrimination of important features in general color image processing, such as, for example, LANDSAT or medical imaging. In general these systems utilize RGB image scanners and cameras, and convert RGB space to intensity/orthogonal chroma space (such as Y, I, Q or L, a, b) and/or hue, saturation, intensity space for the detection process. All three color spaces/scales are three dimensional.
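The color space conversions mentioned above can be sketched with Python's standard `colorsys` module. This is only an illustration of the general idea, not the conversion used by any particular system: `colorsys` provides Y, I, Q and hue, saturation, value (HSV), so HSV is used here as a stand-in for hue, saturation, intensity, and the helper name `describe_pixel` is hypothetical.

```python
import colorsys


def describe_pixel(r, g, b):
    """Convert an 8-bit RGB pixel to two other three dimensional color spaces.

    Returns Y, I, Q (intensity plus two orthogonal chroma axes) and H, S, V.
    HSV stands in for hue/saturation/intensity, which colorsys does not offer.
    """
    rn, gn, bn = r / 255, g / 255, b / 255  # colorsys expects values in 0..1
    return {
        "yiq": colorsys.rgb_to_yiq(rn, gn, bn),
        "hsv": colorsys.rgb_to_hsv(rn, gn, bn),
    }


gray = describe_pixel(128, 128, 128)
red = describe_pixel(255, 0, 0)
print(gray)  # chroma (I, Q) and saturation are essentially zero for a gray pixel
print(red)   # a saturated red has nonzero chroma and saturation of 1
```

Each representation still needs three coordinates per pixel, which is the sense in which all of these color spaces are three dimensional.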
Although three dimensional color detection/modification systems exist, these systems are inadequate for processing images for optical character recognition purposes and do not function well for color filtering. In a scanning system with finite pixel size and conventional optics performance, innumerable pixel color values can be created at color transitions. The color values progress along a vector between the two colors, passing through the color space of other valid, solid (non-edge) colors. Existing color processing methods may "blind" the valid ink but leave the transition values, which may not be objectionable to the human eye but which do interfere with the optical character recognition process of character location and recognition. Additionally, when a "blind" ink color is blinded, it should be changed to the color of the regional background, which may or may not be a fixed color. The background can be affected by soiling, different characteristics of paper, and ink variations due to the printing process. Thus, existing three dimensional color detection and modification systems do not function to fully "blind" unwanted colored markings which interfere with character recognition.
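The transition problem can be made concrete with a small sketch. It assumes that a pixel straddling an edge takes a linear mix of the two colors in proportion to coverage, which is a simplification of real optics. All color values and the names `mix` and `distance_to_segment` are hypothetical.

```python
import math

# Hypothetical 8-bit RGB values; none of these come from the source.
BLIND_RED = (220, 60, 60)     # form ink that should be blinded
PAPER = (250, 250, 250)       # background
PINK_KEEPER = (235, 155, 155)  # a different, valid ink that should be kept
BLACK_PEN = (20, 20, 20)


def mix(color_a, color_b, coverage):
    """Color of an edge pixel: `coverage` (0..1) of color_a, the rest color_b."""
    return tuple(round(coverage * a + (1 - coverage) * b)
                 for a, b in zip(color_a, color_b))


def distance_to_segment(p, a, b):
    """Euclidean distance in RGB space from color p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


edge_pixel = mix(BLIND_RED, PAPER, 0.5)  # pixel half covered by the blind ink
print(edge_pixel == PINK_KEEPER)          # True: the edge value collides with the pink ink
print(distance_to_segment(BLACK_PEN, BLIND_RED, PAPER))  # far from the transition path
```

With these assumed values, a pixel half covered by the red form ink has exactly the same coordinates as the solid pink ink, so a detector that looks only at a single pixel's color cannot tell an edge of a blind ink from a valid ink lying on the same transition vector.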
A need has thus arisen for a document processing system for processing documents having character location markings which are visible to the eye, but which are "blind" to an optical character recognition device. Such a document processing system must be capable of processing documents having numerous colors without mechanically changing optical filters as well as eliminating different colored "blind" inks on the same document. Such a document processing system further requires the ability to operate with specially prepared forms, documents, or other surfaces on which characters to be read by a character recognition device are formed in a manner such that the marks other than the true characters are rendered indistinguishable by the optical character recognition device from the background reflectance of the surface.
A need has further arisen for a document processing system for minimizing the interference between a multiplicity of colors produced in a transition between colors, and valid writing instrument colors which may have the same three dimensional color coordinates. Additionally, a need has arisen for a document processing system having the ability to designate colors which are to be retained and not blinded. A need has further arisen for a document processing system for selecting multiple blind inks and multiple keeper inks which can be selected individually or used in various combinations which appear on the same document.