The exemplary embodiment relates to a system and method for reconstructing a clean document from a set of annotated document images.
It is common for a given form to be completed by a number of users and submitted for processing, which generally includes scanning the form and identifying each user's additions. Separating variable text (e.g., names, addresses, dates, dollar amounts, etc.) from fixed text in filled-out (annotated) forms is a recurring difficulty for document scanning services. If the locations of the form fields in which users enter information are known, the separation is relatively easy. However, particularly for forms that were created at some time in the past, this information is often not available.
Thus, it is desirable to have a method for separating the user-added (annotation) data from the fixed content of annotated document images that is almost entirely automated, even for new types of forms that have never been encountered before.