In certain areas, such as government, health care, human resources, and insurance, the daily processing of a variety of paper forms is a routine and important activity. The processing of a form often involves: the extraction of the information supplied on the form by its users; specific actions that are governed by the nature of the extracted information; and, possibly, the archiving of the extracted information and/or the form itself in a manner that facilitates subsequent use of the archived information. While all of these steps can be, and often are, performed by a human, the processing of large numbers of forms on a timely basis by means of digital computing devices would be desirable.
One common step in the automation of forms handling is the digitization of one or more forms by means of an appropriate scanning device. The result of the scanning process is a set of information representing the digitized form. The set of information is normally a rectangular array of pixel elements (an “image”) of dimensions W and H, where the “width”, W, is the number of pixels in each horizontal row of the array and the “height”, H, is the number of pixels in each vertical column of the array. The columns may be identified, for purposes of discussing such a set of information, by an index I whose values range from 0 to W−1, and the rows by an index J whose values range from 0 to H−1, where W, H, I and J are integer values. If the pixel array itself is labeled IMG, then the value of the pixel in the column with index I and the row with index J is labeled, for discussion purposes, IMG[I,J]. The ordered pair [I,J] is sometimes called the “address” or “pixel location” of this pixel.
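By way of illustration only, the addressing convention described above can be sketched in Python as follows. The class name PixelArray and the list-of-rows storage are assumptions made for this sketch and are not part of the description above. The point illustrated is that the column index I comes first in the address [I,J], even though a row-by-row store is accessed row first.

```python
from typing import List, Tuple


class PixelArray:
    """Hypothetical W x H pixel array addressed as IMG[I, J] (column I, row J)."""

    def __init__(self, rows: List[List[int]]) -> None:
        # Assumption: pixels are stored as a list of H rows, each of W values.
        if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("rows must form a non-empty rectangle")
        self._rows = rows
        self.height = len(rows)     # H: number of pixels in each vertical column
        self.width = len(rows[0])   # W: number of pixels in each horizontal row

    def __getitem__(self, address: Tuple[int, int]) -> int:
        i, j = address              # I is the column index, J is the row index
        if not (0 <= i < self.width and 0 <= j < self.height):
            raise IndexError(f"pixel location [{i},{j}] is outside the array")
        return self._rows[j][i]     # row-major storage: select row J, then column I


# A small 4 x 3 example (W = 4, H = 3).
IMG = PixelArray([
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
])
```

In this example IMG[2,1] addresses column 2 of row 1, while IMG[1,2] addresses column 1 of row 2, so the two addresses generally refer to different pixels.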
While the particular colors that are used on forms can vary from application to application, most forms have only two distinguishing color features: the background color and the foreground color. It is common practice to set the values of all pixels representing the background color to a first number, e.g., 1, and the values of all pixels representing the foreground color to a second number, e.g., 0.
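As a minimal sketch of the two-value convention described above, the following Python function maps each scanned pixel to 1 (background) or 0 (foreground). The 8-bit grayscale input, the light-background and dark-foreground assumption, the fixed threshold of 128, and the function name binarize are all assumptions made for illustration; the description above does not specify how the two values are assigned.

```python
from typing import List

BACKGROUND = 1  # first number, per the example values in the text
FOREGROUND = 0  # second number, per the example values in the text


def binarize(gray_rows: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map grayscale pixels (assumed 0-255, light background) to two values.

    Pixels at or above the threshold are treated as background;
    darker pixels are treated as foreground.
    """
    return [
        [BACKGROUND if value >= threshold else FOREGROUND for value in row]
        for row in gray_rows
    ]


# Hypothetical 2 x 2 grayscale scan: light paper with two dark marks.
binary = binarize([[255, 30], [200, 127]])
```

A fixed threshold is the simplest choice; a scan with uneven lighting might call for an adaptive threshold instead.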
Forms frequently include combs, which serve as guides for the placement of information on the form. Frequently, one of the goals of processing scanned forms is to extract the entered information from the form for later use and/or storage. While knowledge of an original form can help the extraction process, in order to support a wide range of forms it would be desirable to have an automated process for identifying and extracting combs from a scanned form that does not require knowledge of the original form's comb arrangement and that preserves the text/information content on the form. In particular, it would be desirable if an automated method and apparatus for identifying one or more combs on a form could be developed. It would also be desirable if the automated method generated a set of comb information which could then be used to extract the combs from the image being processed.