1. Field of the Invention
The present invention relates generally to methods and systems for image processing, and particularly to methods and systems for matching documents and lifting annotations present in only one of the two matched documents.
2. Description of Related Art
Traditionally, many information-gathering tasks have been carried out using paper documents. Examples include opinion surveys, tax returns, check processing, and the revising and editing of text. An important step in many such tasks is annotation lifting, that is, extracting content that has been added to an original document. A special case of annotation lifting is form dropout, in which the added content is assumed to appear only in certain predefined locations. In order to extract annotations from an annotated document, the annotated document has to be matched with an image of the original document of which the annotated document is a copy.
Matching two images of the same document is not a simple task, since the two images may differ significantly due to faxing, scanner distortions, or degradation through multigeneration copying. Preprinted data in an original document image may appear distorted when compared to the preprinted data in an annotated copy of the same original document, and the distortion may be different for different parts of the image.
There is a need for an efficient method for registering two such images and for separating out annotations. There is also a need for an efficient algorithm for detecting and repairing broken strokes in the annotations, which occur when the annotations cross or touch preprinted data.
In addition to annotation lifting, such a method for document matching also finds application in duplicate removal and in document image authentication, in which the task is to confirm that a document has not changed since it was authored.