1. Field of the Invention:
The present invention generally relates to systems for extracting information from printed forms by using optical character recognition scanners.
2. State of the Art:
Although optical character recognition (OCR) scanners are well known, it is still common practice to manually extract information from printed forms. For example, information which has been written or typed onto medical forms and the like is usually extracted manually. Manual extraction of information from printed forms is time-consuming and subject to human error, but the extraction of information from forms with OCR scanners can also create errors.
FIG. 1 shows examples of situations that can cause conventional OCR scanning systems to err in extracting information from printed forms. Generally speaking, the errors occur because information that has been typed or written onto a printed form is slightly mis-positioned. For instance, the drawing shows characters 15 that have been typed onto a form at positions outside of zones defined by printed vertical lines 14. In addition, the drawing shows characters 16 that are positioned such that they descend over a printed horizontal line 14.
Characters 15 and 16 in FIG. 1 may be incorrectly extracted from a printed form by a conventional OCR scanning system because the OCR control system is confused by the placement of the characters across printed lines. More particularly, an OCR system may identify only information that is printed within certain pre-defined reading zones and, therefore, may omit information that is printed or typed onto a form in transgression of those reading zones.
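The zone-based failure described above can be illustrated with a minimal sketch. It is not taken from any actual OCR product: the `Box` type, the `read_zone` function, and all coordinates are hypothetical, and the sketch assumes a reader that keeps a character only when its bounding box lies wholly inside the reading zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; y grows downward, as on a scanned page."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        # True only if `other` lies entirely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def read_zone(zone, chars):
    """Keep only characters whose box is wholly inside the zone (assumed rule)."""
    return "".join(ch for ch, box in chars if zone.contains(box))


# Hypothetical reading zone bounded by the printed lines of the form.
zone = Box(0, 0, 100, 20)
chars = [
    ("A", Box(10, 2, 20, 18)),   # correctly placed
    ("B", Box(95, 2, 105, 18)),  # typed across a vertical line
    ("g", Box(30, 6, 40, 24)),   # descender crosses the horizontal line
]
extracted = read_zone(zone, chars)  # only "A" survives
```

Under this assumed rule, the two mis-positioned characters are silently dropped, which corresponds to the kind of omission attributed above to characters 15 and 16.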
In the prior art, OCR scanning systems have been proposed that operate in ways to reduce the above-discussed difficulties in extracting handwritten or typed information from printed forms. For example, a workstation for extracting information from printed forms having particular colors is described in a brochure entitled "The Future Data Entry Workstation--POLYFORM--The Form Reader for Automatic Character Reading from Forms and Documents Written by Hand or Machine".
FIG. 2 shows a simplified example of one of the POLYFORM workstations. Generally speaking, the workstation includes a light source 2 which scans a beam 4 across a colored form 6. Interposed between the light source and the colored form is a wheel 8 comprised of filters, each of which has a different color. In operation of the workstation, a particular color filter is selected to match the color of the printed form, thereby allowing an OCR scanner 12 to discriminate typed or handwritten information from information printed on the form--provided that the handwritten or typed information has a different color than the printed form.
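The filtering principle can be sketched with a simplified model. The model is an assumption made for illustration, not a description of the POLYFORM hardware: each surface is given an RGB reflectance, each filter an RGB transmittance, and the scanner is assumed to treat a pixel as a character pixel when its filtered brightness falls below a threshold. All names and values are hypothetical.

```python
def seen_through(reflectance, filt):
    """Brightness (0..1) of a surface viewed through a color filter."""
    passed = sum(r * f for r, f in zip(reflectance, filt))
    return passed / sum(filt)


def is_dark(reflectance, filt, threshold=0.5):
    """Assumed scanner rule: dark pixels are treated as character pixels."""
    return seen_through(reflectance, filt) < threshold


RED_FILTER = (1.0, 0.0, 0.0)      # filter chosen to match a red form

WHITE_PAPER = (1.0, 1.0, 1.0)
RED_FORM_LINE = (0.9, 0.1, 0.1)   # printed part of the form
BLACK_TYPED = (0.05, 0.05, 0.05)  # typed entry in black ink
RED_TYPED = (0.9, 0.1, 0.1)       # typed entry in the form's own color

form_line_kept = is_dark(RED_FORM_LINE, RED_FILTER)  # False: form drops out
black_kept = is_dark(BLACK_TYPED, RED_FILTER)        # True: entry is read
red_kept = is_dark(RED_TYPED, RED_FILTER)            # False: entry is lost
paper_kept = is_dark(WHITE_PAPER, RED_FILTER)        # False: background
```

In this model the red form lines become as bright as the paper and disappear, while black entries stay dark. An entry typed in the same red as the form disappears as well, which is why the typed or handwritten information must differ in color from the printed form.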
The system of FIG. 2 has several disadvantages. One disadvantage is that a different color filter must be selected whenever the color of a form is changed. Moreover, the filter must be selected manually, since the system lacks any intrinsic means of determining the required color of the filter. Another disadvantage is that the system usually cannot successfully extract information from multi-color forms. For instance, the system may not be able to successfully extract information from pink forms that have red highlighted sections or blue sections.
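The multi-color limitation can be illustrated with a short sketch under assumed values. The three-filter wheel, the reflectance figures, and the brightness threshold are all hypothetical; the sketch only checks, for each filter, which printed colors of a form would fail to drop out.

```python
def brightness(reflectance, filt):
    """Brightness (0..1) of a surface viewed through a color filter."""
    return sum(r * f for r, f in zip(reflectance, filt)) / sum(filt)


# Hypothetical filter wheel with three idealized filters.
WHEEL = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

# Assumed reflectances for a pink form with red and blue printed sections.
FORM_COLORS = {
    "pink background": (1.0, 0.75, 0.8),
    "red highlight": (0.9, 0.1, 0.1),
    "blue section": (0.1, 0.1, 0.9),
}


def not_dropped(filt, threshold=0.5):
    """Form colors that stay dark, and so are confused with characters."""
    return [name for name, color in FORM_COLORS.items()
            if brightness(color, filt) < threshold]


leftover = {name: not_dropped(filt) for name, filt in WHEEL.items()}
```

With these assumed values, every filter on the wheel leaves at least one printed color dark, so no single selection removes the whole form.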
A further disadvantage of the system of FIG. 2 is that the system can only operate upon forms of a limited number of colors. This limitation follows from the fact that, for practical reasons, the color wheel can comprise only a limited number of color filters. In a commercial sense, this limitation may be the most critical of all--since the system may become inoperative when there are relatively slight changes in color from one form to another due, for example, to aging by prolonged exposure to bright sunlight or to different printing runs.