Many documents carry informational markings and pre-printed character location markings, visible to the eye, which assist the document preparer in completing required information on the document. Many pre-printed forms use these character location markings to confine characters to specific locations and sizes in order to assist a character reader in identifying the characters added to the document.
Various commercially available optical character recognition (OCR) devices function well in identifying characters which are clearly separated from extraneous lines, dots, printed material, and other visible matter which may be pre-printed on the document and which is not intended to be read by the OCR device. Such OCR devices do not experience great difficulty in identifying a single character or line of characters on an otherwise clear surface. Similarly, OCR devices experience little difficulty in identifying all of the characters on an entire page, provided that there are no extraneous markings on the page and that the characters are properly registered. On some documents, however, it is essential that characters be printed on forms which are pre-printed in a manner such that writing areas are separated by visible lines or other marks. Such lines are necessary for separating the data in an orderly fashion. Many government forms, such as income tax forms, census forms, social security forms, and others, have boxes within which characters are to be entered. Machine identification of hand-printed characters is also assisted if visible constraint marks are pre-printed on the document to assist the preparer of the document. The desirability of the pre-printed character location markings must be balanced against the difficulty OCR devices have in recognizing characters on documents containing such markings. OCR devices must rely on light reflectance from the character background and absorption by the characters themselves to distinguish between true characters, which are to be identified by the OCR device, and other visible markings which are adjacent to, touch, pass through, or surround the characters.
Document processing systems have been proposed in order to allow the OCR device to distinguish between the pre-printed character location markings and the actual characters to be read. One such system is described in U.S. Pat. No. 3,444,517, issued to J. Rabinow on May 13, 1969 and entitled "Optical Reading Machine and Specially Prepared Documents Therefor". This optical reading machine utilizes pre-printed documents in which the character location markings are printed in fluorescent material. The document is subjected to exciting radiation during the reading cycle such that the true characters reflect very little of the radiation, while the marks are energized so as to emit energy to which a scanner photocell is sensitive. Although these marks are visible under ordinary light, when energized by the radiation source they emit energy in such a way that the scanner photocell provides output signals as though the marks did not exist, or as though the marks were brighter than the background or the characters to be read. The entire black-to-white range of the device examining the characters is therefore unaffected.
Many documents are pre-printed in a variety of colors, in which the background as well as the character location markings may be printed with different colored inks. In order to eliminate character location markings printed with colored inks, various optical filters and lenses have been utilized to prevent the information captured from these markings from being presented to the optical character recognition device, such that the optical character recognition device is "blind" to these colored inks. Color-sensitive photocells are utilized in the character recognition device in order to filter out the pre-printed character location pixel information and present only the true character pixel information to the optical character recognition device. Such systems require multiple optical filters which must be interchanged depending upon the colors of the inks utilized on the pre-printed form.
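The effect of such a filter can be illustrated with a minimal sketch. The pixel values, the choice of red ink for the markings, and the function names below are assumptions made for illustration only and are not taken from any of the systems described here.

```python
# Hypothetical pixels as (R, G, B) reflectance values in the range 0..255.
WHITE = (255, 255, 255)   # paper background
BLACK = (20, 20, 20)      # character ink
RED_INK = (230, 40, 40)   # pre-printed box lines (assumed red "blind" ink)


def through_filter(pixel, channel):
    """Reflectance seen by a photocell behind a single-color filter.

    channel 0 models a red filter, channel 1 a green filter.
    """
    return pixel[channel]


def binarize(row, channel, threshold=128):
    """Map each pixel to 1 (dark, treated as character) or 0 (background)."""
    return [1 if through_filter(p, channel) < threshold else 0 for p in row]


# One scan line: background, box line, two character pixels, box line, background.
row = [WHITE, RED_INK, BLACK, BLACK, RED_INK, WHITE]

red_view = binarize(row, channel=0)    # red ink reflects like paper: dropped
green_view = binarize(row, channel=1)  # red ink absorbs green: reads as dark

print(red_view)    # [0, 0, 1, 1, 0, 0]
print(green_view)  # [0, 1, 1, 1, 1, 0]
```

Under these assumed values the red filter removes the box lines while the green filter does not, which is why a form printed in a different ink color calls for a different filter.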
Another system proposed in order to prevent pre-printed character location markings from interfering with the actual characters to be read on a document is described in U.S. Pat. No. RE.29,104, issued to David H. Shepard on Jan. 4, 1977 and entitled "Method of Scanning Documents to Read Characters Thereon Without Interference From Visible Marks on the Document Which Are Not To Be Read By the Scanner". This system utilizes a laser scanner unit adapted to scan a document. The color of the markings on the document which are not to be read is related to the laser wavelength so that the light reflected from the markings has the same intensity as the light reflected from the document background, and the presence of these pre-printed character location markings does not interfere with the reading of the characters. The laser wavelength is therefore keyed to the color of the pre-printed character location markings, and such a system cannot easily adapt to different colored pre-printed markings on numerous documents, or to different colored pre-printed character location markings on the same document, to be processed and read by an optical character recognition device.
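The limitation of keying a single wavelength to the marking color can be sketched numerically. The reflectance figures, the two wavelengths, and the contrast threshold below are hypothetical values chosen for illustration; they are not taken from the cited patent.

```python
# Assumed reflectance (0..1) of each surface at two wavelengths in nanometres.
REFLECTANCE = {
    "paper":     {633: 0.85, 450: 0.80},
    "red_ink":   {633: 0.80, 450: 0.10},
    "blue_ink":  {633: 0.10, 450: 0.75},
    "black_ink": {633: 0.05, 450: 0.05},
}


def contrast(surface, wavelength_nm):
    """Relative difference between the paper and the surface at one wavelength."""
    background = REFLECTANCE["paper"][wavelength_nm]
    return abs(background - REFLECTANCE[surface][wavelength_nm]) / background


def is_visible_to_scanner(surface, wavelength_nm, min_contrast=0.3):
    """True if the scanner would register the surface as distinct from paper."""
    return contrast(surface, wavelength_nm) >= min_contrast


# With an assumed red laser (633 nm), red markings drop out but blue ones do not.
for ink in ("red_ink", "blue_ink", "black_ink"):
    print(ink, is_visible_to_scanner(ink, 633))
```

In this sketch a scanner fixed at 633 nm ignores the red markings and reads the black characters, but a form printed with blue markings would require a different wavelength for the markings to drop out.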
For many applications, document processing systems are also required to display an image of the document being processed by the system. The image may be used by an operator for verifying or correcting data read by the optical character recognition portion of the document processing system. It is desirable for the displayed image to accurately reflect the actual image of the document for use by the operator of the document processing system. The image displayed should therefore include the pre-printed character location markings which, as previously stated, interfere with the recognition process of optical character recognition devices within the document processing system. If only the data presented to the optical character recognition unit is displayed to the operator, much of the actual informational content of the document is missing, and the user of the system is unable to view a true image of the document being processed. Systems have been proposed for independently capturing the image of a document and capturing data for input to an optical character recognition device. Such a dual capture system is described in U.S. Pat. No. 4,205,780, issued to Emmett Burns et al. on June 3, 1980, and entitled "Document Processing System and Method". This system, like other systems utilizing optical character recognition devices, requires separate data capture devices which are separately optimized for the image and data capture functions.
A need has thus arisen for a document processing system for processing documents having pre-printed character location markings which are visible to the eye but to which an optical character recognition device is "blind", and which system further displays an accurate image of the document. Such a document processing system must be capable of processing documents having numerous colors without mechanically changing optical filters, and of eliminating different colored "blind" inks on the same document. Such a document processing system further requires the ability to operate with specially prepared forms, documents, or other surfaces on which characters to be read by a character recognition device are formed, in a manner such that marks other than the true characters are rendered indistinguishable by the optical character recognition device from the background reflectance of the surface.
A need has further arisen for a document processing system having a single data lift for capturing both an image of a document and characters to be presented to an optical character recognition device. Such a system must also be programmable for recognizing different types of optical characters, be electronically changeable for the display of different types of document images, and have a fast response time as colors change.