The present exemplary embodiments disclosed herein relate generally to the extraction of data from documents. They find particular application in conjunction with the generation of image anchor templates for content anchoring, and will be described with particular reference thereto. However, it is to be appreciated that the present exemplary embodiments are also amenable to other like applications.
When dealing with a large number of documents, it is often desirable to quickly extract data from the documents. Typical solutions often rely upon template matching techniques to locate image anchor templates for content anchoring. The idea is that if one can locate one or more image anchor templates within a target image of a document, one can determine the location of a data field within the document based upon its location relative to the one or more image anchor templates.
To illustrate, consider the problem of identifying the address and social security number fields in a document. Even if the document is fixed, the processes of printing, faxing, and scanning the document introduce distortions into target images of the document. Therefore, the relevant fields cannot be found at fixed displacements from the boundaries of the target images of the document. Rather, they need to be located with respect to fixed content in the target images. It is this fixed content that defines image anchor templates.
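By way of a non-limiting illustration of the anchoring idea described above, the following minimal sketch locates an image anchor template within a target image and derives the location of a data field from a stored offset. All names (`locate_anchor`, `field_location`, `make_page`, `TEMPLATE`, `FIELD_OFFSET`) and the tiny binary images are hypothetical and chosen for illustration only. The exhaustive pixel-mismatch search is assumed here as a simple stand-in for whatever template matching technique a typical solution would employ.

```python
def locate_anchor(image, template):
    """Return ((row, col), cost) for the window of `image` that differs
    from `template` in the fewest pixels (exhaustive search)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Count mismatching pixels between the template and this window.
            cost = sum(
                image[r + i][c + j] != template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


def field_location(anchor_pos, offset):
    """Place the data field at a fixed offset from the located anchor."""
    return (anchor_pos[0] + offset[0], anchor_pos[1] + offset[1])


def make_page(height, width, anchor_pos, template):
    """Build a blank binary page with the template drawn at anchor_pos."""
    page = [[0] * width for _ in range(height)]
    for i, row in enumerate(template):
        for j, value in enumerate(row):
            page[anchor_pos[0] + i][anchor_pos[1] + j] = value
    return page


# Hypothetical fixed content (e.g., a printed mark next to a field label).
TEMPLATE = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
# Assumed offset of the data field from the anchor's top-left corner,
# as it would be measured once on an exemplar image.
FIELD_OFFSET = (1, 5)

# The same form, but scanning has shifted its content to a new position.
exemplar = make_page(12, 16, (2, 3), TEMPLATE)
scanned = make_page(12, 16, (4, 6), TEMPLATE)

anchor_pos, cost = locate_anchor(scanned, TEMPLATE)
field_pos = field_location(anchor_pos, FIELD_OFFSET)
print(anchor_pos, cost, field_pos)
```

In this sketch, a field position measured from the boundary of the exemplar image would be wrong for the shifted scan, whereas the position computed relative to the located anchor follows the shift.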
In view of the foregoing illustration, it should be appreciated that one important aspect of typical solutions is that they depend upon the ability of the image anchor templates to offer relatively fixed points of reference from which to determine a data field. Consequently, image anchor templates are chosen such that they can be localized with a high degree of reliability and robustness. That is to say, image anchor templates are chosen for their ability to reliably provide a fixed point of reference within a target image of a document.
In choosing image anchor templates, typical solutions rely upon an operator to manually select image anchor templates that can reliably act as anchoring points. Many user interface tools exist to aid operators, especially in massive data processing scenarios. These typically allow operators to select regions of interest in exemplar images with the aim of cropping the selected regions and using them as image anchor templates. Nevertheless, regardless of whether operators are aided with a user interface, typical solutions still rely upon the skill and intuition of an operator to generate the image anchor templates.
This reliance on an operator, however, may lead to sub-par image anchor templates and/or a waste of time and resources due to the difficulty of picking image anchor templates. Namely, visual elements easily located by the human eye are not necessarily good candidates for image anchor templates. The converse also holds true. For example, consider different barcodes used to indicate different fields. Moreover, it is difficult for an operator to predict how a particular image anchor template will match against different target images and/or whether an image anchor template will reliably offer an anchor point across multiple documents. As a result of these difficulties, an operator will generally have to undergo a trial-and-error process that takes time and resources.
In view of the deficiencies noted above, there exists a need for improved systems and/or methods of generating image anchor templates. The present application contemplates such new and improved systems and/or methods which may be employed to mitigate the above-referenced problems and others.