The present invention is in the general field of automatic processing of forms and relates, more specifically, to distinguishing between similar forms, and classifying an input form as one of a few possible similar forms.
With the development of advanced image processing technologies, image processing applications have been introduced, including the automatic processing of forms. Such processing is useful, e.g., for mass processing of application forms in various organizations such as post offices, telephone companies and others. An exemplary form processing application is the IFP system, commercially available from IBM Corp. Automatic form processing of the kind specified carries the obvious advantage that manual processing may be completely or partially eliminated, thereby not only speeding up processing but also lowering costs by reducing manpower.
In a typical form processing application, it is necessary to store, in advance, templates of similar blank forms (constituting a list of candidate forms). When an already filled-in form is fed to the application for processing, it is necessary to identify the correct candidate blank form in the list and to "subtract" it from the filled form, thereby obtaining the filled-in data that are then used for further processing. Ideally, the list of candidate forms has exactly one member, in which case the only member obviously corresponds to the input filled-in form. However, in many real life scenarios, the list of candidates may exceed, say, 20 members, thus hindering the assignment of an input form to the correct candidate.
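By way of illustration only, the "subtraction" of a blank template from a filled-in form may be sketched as follows. The sketch assumes that both images are binary rasters of equal size (1 denoting ink, 0 denoting background) and that they are already aligned; the function name subtract_template is hypothetical and is not taken from any particular form processing system.

```python
from typing import List

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background


def subtract_template(filled: Image, template: Image) -> Image:
    """Keep only the ink that is present in the filled form but absent from the template.

    Assumes both images have the same dimensions and are already aligned.
    """
    return [
        [f & (1 - t) for f, t in zip(filled_row, template_row)]
        for filled_row, template_row in zip(filled, template)
    ]


template = [[1, 1, 0],
            [0, 0, 0]]
filled = [[1, 1, 0],
          [0, 1, 1]]
filled_in_data = subtract_template(filled, template)
```

If the wrong template is subtracted, pre-printed matter of the form would remain in the result, which is why the correct candidate must be identified first.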
There is, accordingly, a need in the art for a processing technique, capable of minimizing the list of potential form candidates, preferably to only one. There is a further need in the art for a training technique which automatically identifies the distinctive features of a blank form, vis-a-vis other similar forms and stores them in a database for future form processing purposes.
According to the invention there is provided a method for identifying at least one distinguishing feature between an input new blank form B and an already stored blank form A, comprising the steps of:
(a) obtaining first and second images that correspond to said form A and form B, respectively;
(b) substantially aligning the first image with respect to the second image or vice versa;
(c) defining boxes in the second image, each box constituting a sub-image in said image that corresponds to an area in the blank form B;
(d) ranking the boxes so as to give rise to respective box rank scores, wherein the ranking criteria used for ranking each box include a first criterion: the likelihood that the area in the blank form B is a fill-in area;
(e) identifying a box from among the boxes of step (d) such that at least the following conditions are met:
(i) it is sufficiently distinguishable from a corresponding box of said first image;
(ii) it has a good box rank score; said box constituting a verification box;
(f) in the case that no box is identified in said step (e), identifying a box from among the boxes of said first image such that at least the following conditions are met:
(i) it is sufficiently distinguishable from a corresponding box of said second image;
(ii) it has a good box rank score; said box constituting a rejection box.
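Steps (c) to (f) above may be sketched, under stated assumptions, as follows. The sketch assumes that the two images are binary rasters of equal size that have already been aligned as in step (b). The fixed grid of boxes, the ink-density heuristic used as the box rank score (sparse boxes being treated as more likely fill-in areas), the thresholds, and all function names are assumptions made for illustration only; they are not the ranking criteria or parameters of any particular embodiment.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # binary raster: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def define_boxes(img: Image, size: int) -> List[Box]:
    """Step (c): tile the image into size x size boxes (edge boxes may be smaller)."""
    rows, cols = len(img), len(img[0])
    return [(r, c, min(size, rows - r), min(size, cols - c))
            for r in range(0, rows, size)
            for c in range(0, cols, size)]


def rank_score(img: Image, box: Box) -> float:
    """Step (d), assumed heuristic: a sparse box is more likely a fill-in area,
    so the score is simply the ink density of the box (higher is better)."""
    top, left, h, w = box
    ink = sum(img[r][c] for r in range(top, top + h) for c in range(left, left + w))
    return ink / (h * w)


def difference(a: Image, b: Image, box: Box) -> float:
    """Fraction of pixels inside the box that differ between the two images."""
    top, left, h, w = box
    diff = sum(a[r][c] != b[r][c]
               for r in range(top, top + h) for c in range(left, left + w))
    return diff / (h * w)


def pick_box(own: Image, other: Image, size: int,
             min_diff: float, min_score: float) -> Optional[Box]:
    """Return the best-ranked box of `own` that is sufficiently distinguishable
    from the corresponding box of `other`, or None if there is no such box."""
    candidates = [b for b in define_boxes(own, size)
                  if difference(own, other, b) >= min_diff
                  and rank_score(own, b) >= min_score]
    return max(candidates, key=lambda b: rank_score(own, b), default=None)


def train_pair(image_a: Image, image_b: Image, size: int = 2,
               min_diff: float = 0.5, min_score: float = 0.5):
    """Steps (e) and (f): look for a verification box in B, else a rejection box in A."""
    box = pick_box(image_b, image_a, size, min_diff, min_score)
    if box is not None:
        return ("verification", box)
    box = pick_box(image_a, image_b, size, min_diff, min_score)
    if box is not None:
        return ("rejection", box)
    return None
```

For example, if form B carries a printed block in its top-left corner that form A lacks, train_pair returns a verification box at that location; with the roles of the two forms exchanged, the same area is returned as a rejection box.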
The invention further provides for a method for identifying at least one distinguishing feature between an input new blank form B and another blank form A, comprising the steps of:
(a) identifying at least one property in the input form which is sufficiently distinguishable from a corresponding property in the other form; said identified property complies at least with a criterion that relates to the likelihood that the property is retained substantially invariable under use of the form; said property constitutes a verification property;
(b) in the case that no property is identified in said step (a), identifying at least one property in the other form which is sufficiently distinguishable from a corresponding property in the input form; said identified property complies at least with a criterion that relates to the likelihood that the property is retained invariable under use of the form; said property constitutes a rejection property.
Still further, the invention provides for a method for classifying an input form as corresponding to a candidate blank form from among a list of candidate blank forms; each one of said candidate forms is associated with at least one verification property or rejection property vis-a-vis a respective form in said list; said verification property corresponds to a property in said candidate form that sufficiently distinguishes it vis-a-vis another form in the list; said rejection property corresponds to a property in said another form that sufficiently distinguishes it vis-a-vis said candidate form;
the method comprising the steps of:
(a) selecting a candidate form from said list;
for the verification property associated with said candidate form, if any, perform: in the case that the verification property substantially mismatches a corresponding property in the input form, indicating that said candidate form does not correspond to said input form; or, for a rejection property associated with said candidate form, if any, perform: in the case that the rejection property substantially matches a corresponding property in the input form, indicating that said candidate form does not correspond to said input form;
(b) in the case that the stipulations of step (a) are not met in respect of a number of said candidate forms, indicating that said candidate form is classified as corresponding to said input form.
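The classification steps above may be sketched as follows, again under stated assumptions. Each property is represented here as the expected content of a named region, and "substantially matches" is simplified to plain equality; the Check and Candidate records, the region keys and the function names are all hypothetical and serve only to illustrate the elimination logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Check:
    kind: str      # "verification" or "rejection"
    region: str    # hypothetical key naming the area that holds the property
    expected: str  # content of that area in the blank form that owns the property


@dataclass
class Candidate:
    name: str
    checks: List[Check] = field(default_factory=list)


def is_ruled_out(candidate: Candidate, input_form: Dict[str, str]) -> bool:
    """A candidate is ruled out if a verification property mismatches the input
    form, or if a rejection property matches the input form."""
    for check in candidate.checks:
        matches = input_form.get(check.region) == check.expected
        if check.kind == "verification" and not matches:
            return True
        if check.kind == "rejection" and matches:
            return True
    return False


def classify(candidates: List[Candidate], input_form: Dict[str, str]) -> List[str]:
    """Return the names of the candidates that were not ruled out."""
    return [c.name for c in candidates if not is_ruled_out(c, input_form)]


# Form A is distinguished from form B by its header. A stores the header as a
# verification property; B stores the same header of A as a rejection property.
form_a = Candidate("A", [Check("verification", "header", "Form A")])
form_b = Candidate("B", [Check("rejection", "header", "Form A")])

remaining = classify([form_a, form_b], {"header": "Form A"})
```

In this sketch a single stored property of form A suffices to decide between the two candidates in either direction, which illustrates why a rejection property is useful when form B has no distinguishing property of its own.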
Yet still further the invention provides a system for identifying at least one distinguishing feature between an input new blank form B and an already stored blank form A, the system includes a processor and associated memory comprising:
(a) a device for obtaining first and second images that correspond to said form A and form B, respectively;
(b) aligning device for substantially aligning the first image with respect to the second image or vice versa;
(c) box processing device for defining boxes in the second image, each box constituting a sub-image in said image that corresponds to an area in the blank form B;
(d) box ranking device for ranking the boxes so as to give rise to respective box rank scores, wherein the ranking criteria used for ranking each box include a first criterion: the likelihood that the area in the blank form B is a fill-in area;
said system further including a decision unit capable of:
(e) identifying a box from among the boxes of step (d) such that at least the following conditions are met:
(i) it is sufficiently distinguishable from a corresponding box of said first image;
(ii) it has a good box rank score; said box constituting a verification box;
(f) in the case that no box is identified in (e), identifying a box from among the boxes of said first image such that at least the following conditions are met:
(i) it is sufficiently distinguishable from a corresponding box of said second image;
(ii) it has a good box rank score; said box constituting a rejection box.
The invention further provides a system for identifying at least one distinguishing feature between an input new blank form B and another blank form A, the system comprising a processor and associated memory capable of:
(a) identifying at least one property in the input form which is sufficiently distinguishable from a corresponding property in the other form; said identified property complies at least with a criterion that relates to the likelihood that the property is retained substantially invariable under use of the form; said property constitutes a verification property;
(b) in the case that no property is identified in (a), identifying at least one property in the other form which is sufficiently distinguishable from a corresponding property in the input form; said identified property complies at least with a criterion that relates to the likelihood that the property is retained invariable under use of the form; said property constitutes a rejection property.
Still further the invention provides a system for classifying an input form as corresponding to a candidate blank form from among a list of candidate blank forms; each one of said candidate forms is associated with at least one verification property or rejection property vis-a-vis a respective form in said list; said verification property corresponds to a property in said candidate form that sufficiently distinguishes it vis-a-vis another form in the list; said rejection property corresponds to a property in said another form that sufficiently distinguishes it vis-a-vis said candidate form;
the system comprising a processor and associated memory capable of:
(a) selecting a candidate form from said list;
for the verification property associated with said candidate form, if any, perform: in the case that the verification property substantially mismatches a corresponding property in the input form, indicating that said candidate form does not correspond to said input form; or, for a rejection property associated with said candidate form, if any, perform: in the case that the rejection property substantially matches a corresponding property in the input form, indicating that said candidate form does not correspond to said input form;
(b) in the case that the stipulations of step (a) are not met in respect of a number of said candidate forms, indicating that said candidate form is classified as corresponding to said input form.
Still further the invention provides a processor having associated memory that includes data usable by an application selected from a group that includes form training and form classification applications; the data includes data being representative of:
a plurality of similar forms, each form being associated with:
at least one verification property which is sufficiently distinguishable vis-a-vis the property of another form; said property complies at least with a criterion that relates to the likelihood that the property is retained substantially invariable under use of the form;
at least one rejection property of said another form which is sufficiently distinguishable vis-a-vis the property of said form; said property complies at least with a criterion that relates to the likelihood that the property is retained substantially invariable under use of the form.