Many documents are created according to a prescribed format or structure. Examples of such documents include government-issued identification cards, birth certificates, job applications, driver's licenses, employee identification cards, insurance cards, merchandise receipts, and accounting reports; these may generally be referred to as form documents. Many organizations presented with form documents desire to gather the information they contain for later use. For example, a doctor's office will often ask to inspect a patient's insurance card to retrieve information such as the patient's insurance provider and policy number. Gathering this information often requires manually re-entering it into a database from which it can be retrieved at the appropriate time. Alternatively, the office may produce an image of the insurance card so that, if additional information is required, the image may later be referenced to obtain it. Unless the information presented on the card is entered into a database, however, it has limited usability. For example, if the doctor's office desires to identify all patients using a particular insurance policy, and this information was not previously entered into the database manually, every insurance card image would need to be manually reviewed to determine which patients use the policy in question. Thus, there is a need for a device and method that can collect information from a form document efficiently and accurately.
Prior attempts to automate the collection of information from images match a query image with template images associated with the same document type/form in order to identify the location of the information to be extracted from the query image. Each of these methods requires processing template images of the desired type/form before the template image can be matched with the query image and information extracted from the query image. U.S. Pat. No. 6,886,136 to Zlotnick describes a process in which fifty samples of a particular template image are used to create a template file suitable for attempting matches against the query image. More specifically, the process described by Zlotnick sums information obtained from each of fifty sample template images associated with a particular document type/form, and this summation is then used to create the template file associated with that document type/form. A problem with this approach is that fifty sample images are often not available for a particular document type/form; in many instances, only a single image is available.
Other methods require that the document type/form have specific features. For example, U.S. Pat. No. 6,400,845 to Volino identifies horizontal and vertical lines within a “master” document to provide registration data. When the horizontal and vertical lines of a scanned image match the registration data, the scanned image is considered a match. A disadvantage of this method is that the template image may not have horizontal and/or vertical lines that can be used to create the registration data. In other scenarios, such lines are not sufficiently distinct to identify a proper match between the template image and the query image.
A need therefore exists for a method of matching template images and query images that requires neither a large volume of template images nor a template image having particular characteristics.