Electronic color dropout is a process wherein the colored regions of a scanned document, which correspond to the original data entry form's lines and background areas, are “dropped” from the image. When used with an original data entry form that contains colored field dividing lines and colored field name text, this type of processing effectively removes all the redundant image content from the form and leaves only the data that has been entered over it. In other words, it renders the form areas of the processed image invisible, leaving only the text that was entered on the original. By removing unnecessary image content, this step makes subsequent optical character recognition algorithms more effective, reduces the image storage space required, and improves retrieval efficiency.
Current technology performs this task by examining each pixel of the input image, one at a time. Each pixel is compared to one or more dropout colors, and a decision is made to drop the pixel, process the pixel in some special way, or leave it unmodified. Some methods apply the values of the digitized color signal to a look up table and determine what to do with the pixel based on the contents of the look up table. This still amounts to examining a single pixel color; the only difference is that the decision on color dropout has been predetermined and tabulated instead of being determined ‘on the fly.’
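The per-pixel approach described above can be sketched as follows. This is only an illustrative sketch, not the method of any particular product: the function names, the per-channel tolerance test, the white background value of 255, and the 4-bit quantization of the look up table are all assumptions made for the example. The luminance weights are the standard ITU-R BT.601 coefficients.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def luminance(pixel: RGB) -> int:
    """Normal grayscale value of a pixel (BT.601 weights, integer math)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def is_dropout(pixel: RGB, dropout_color: RGB, tolerance: int) -> bool:
    """Assumed test: every channel lies within `tolerance` of the dropout color."""
    return all(abs(p - d) <= tolerance for p, d in zip(pixel, dropout_color))


def dropout_pixel(pixel: RGB, dropout_color: RGB, tolerance: int,
                  background: int = 255) -> int:
    """Decide 'on the fly': substitute the background or keep normal grayscale."""
    if is_dropout(pixel, dropout_color, tolerance):
        return background
    return luminance(pixel)


def build_lut(dropout_color: RGB, tolerance: int, bits: int = 4,
              background: int = 255) -> Dict[RGB, int]:
    """Tabulate the same decision over a coarsely quantized RGB cube."""
    levels = 1 << bits
    step = 256 // levels
    lut: Dict[RGB, int] = {}
    for qr in range(levels):
        for qg in range(levels):
            for qb in range(levels):
                # Use the center of each quantization cell as its representative.
                center = (qr * step + step // 2,
                          qg * step + step // 2,
                          qb * step + step // 2)
                lut[(qr, qg, qb)] = dropout_pixel(
                    center, dropout_color, tolerance, background)
    return lut


def lut_dropout(pixel: RGB, lut: Dict[RGB, int], bits: int = 4) -> int:
    """Look the decision up instead of computing it; still one pixel at a time."""
    shift = 8 - bits
    r, g, b = pixel
    return lut[(r >> shift, g >> shift, b >> shift)]
```

Either way, the output for a pixel depends only on that pixel's own color, which is the limitation the following paragraphs examine.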
To better understand how the disclosed invention improves on the state of the art, the problems that have been observed with the existing techniques must be examined. Typically, a dropout algorithm will take a full color, digital input image and convert it to grayscale. During this process, it will map most pixels to the directly corresponding normal grayscale value, but some pixels will be deemed ‘dropout’ pixels due to their color and will be mapped to a background color instead of their normal grayscale. Then, the grayscale image is passed through an adaptive thresholding process (ATP), which converts the grayscale image to a bitonal image. The desired effect is that the resultant bitonal image will be black only where the entered data appears. In particular, the bitonal image will be white in the regions that were ‘dropped out.’ The problems with this technique are:
When the color being dropped is detected, the dropout process generally substitutes a background color or grayscale level. But the actual background may be dark and mottled, not solid white. Therefore, what color should be substituted? Substituting the wrong background, or a flat background that does not match the local background of the image, can create edges in the output which trigger the subsequent ATP. This can produce bitonal blotches in the final output where there are none in the original.
With a faded form, some of the colored areas that should be dropped are not the same shade as the core color of lines on the form. Some pixels in a nominally ‘red’ area may actually be no redder than the text, and further, the luminance of some pixels is actually darker than the text. These pixels either do not drop out or, if the dropout tolerance is set high enough to drop them, the text will have voids in it.
The fuzzy edges on form lines can fail to drop out because their color is not as vivid as the rest of the form. In extreme cases, the center or core of a line drops out and the edges do not, leaving a double line.
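The first problem above, a flat substituted background creating edges that trigger the ATP, can be illustrated on a single scanline. The thresholding rule below (a pixel is black if it is darker than its local mean by more than a fixed contrast) is a simplified stand-in chosen for this sketch, not the ATP of any specific system, and all gray levels, window sizes, and names are hypothetical.

```python
from typing import List


def adaptive_threshold(gray: List[int], radius: int = 3,
                       contrast: int = 30) -> List[int]:
    """Simplified ATP stand-in: 1 (black) if darker than the local mean
    by more than `contrast`, else 0 (white)."""
    n = len(gray)
    out = []
    for i, value in enumerate(gray):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        local_mean = sum(gray[lo:hi]) / (hi - lo)
        out.append(1 if value < local_mean - contrast else 0)
    return out


# Hypothetical scanline: a dark paper background (gray 120) crossed by a
# colored form line occupying pixels 8..11. No entered text is present.
BACKGROUND = 120
FORM_LINE = range(8, 12)


def substitute(level: int) -> List[int]:
    """Grayscale scanline after dropout, with form pixels mapped to `level`."""
    line = [BACKGROUND] * 20
    for i in FORM_LINE:
        line[i] = level
    return line


# Flat white substituted for the dropped pixels: a step edge against the
# darker paper, so background pixels beside the line are thresholded black.
flat_bitonal = adaptive_threshold(substitute(255))

# Substituted level matches the local background: no edge, no black pixels.
matched_bitonal = adaptive_threshold(substitute(BACKGROUND))
```

In this sketch the flat substitution marks background pixels on both sides of the dropped line as black even though the scanline contains no text, while the matched substitution leaves the whole scanline white.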
With individual pixel processing, inconsistencies in the coloration of form areas can cause parts of the form to be retained, i.e., not dropped out. If the tolerance on the dropout color is set too high (seeking to ensure complete dropout of form areas), then parts of the actual desired text may be dropped as well. Further, substitution of an incorrect background color for the dropout regions can introduce steps in the color of the background area, causing ATP to produce unwanted artifacts in the output.
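The tolerance trade-off can be made concrete with a small sketch. The colors below are invented for illustration: a core form color, a faded form pixel, and a pixel from the soft edge of a text stroke that happens to lie closer to the dropout color than the faded form pixel does. The distance measure (largest per-channel difference) is likewise an assumption.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

DROPOUT_COLOR: RGB = (200, 40, 40)  # hypothetical core color of the form lines

# Hypothetical sample pixels from a faded form.
SAMPLES: Dict[str, RGB] = {
    "core form line": (200, 40, 40),
    "faded form area": (130, 100, 100),
    "text stroke edge": (140, 95, 95),
}


def color_distance(a: RGB, b: RGB) -> int:
    """Assumed distance: the largest per-channel difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def dropped_names(tolerance: int) -> List[str]:
    """Names of the sample pixels a single-pixel test would drop."""
    return [name for name, pixel in SAMPLES.items()
            if color_distance(pixel, DROPOUT_COLOR) <= tolerance]


low = dropped_names(40)   # faded form area is retained
high = dropped_names(70)  # faded form area drops, but so does the text edge
```

Because the text pixel is nearer to the dropout color than the faded form pixel, no single tolerance in this example drops the form area while keeping the text, which is the situation the paragraph above describes.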