In the regular course of business, nearly all organizations receive forms containing data (such as bills or invoices, purchase orders, claims, etc.) whose content must be captured and transferred into Target Applications. Such Target Applications may perform one or more specific tasks, or store the data in databases. Known programs for processing forms (FPAs, Form Processing Applications, or FPPs, Form Processing Programs) usually apply OCR (Optical Character Recognition) techniques to scanned images of forms in order to reduce manual data entry. However, in most cases, these products require a significant amount of customization work before they can be put to use.
Two types of customization work are inherent in existing software products.
Form-processing programs require prior knowledge of the physical Layout of the forms they process, and it is common to have to process forms that have a large number of different Layouts (possibly hundreds or even thousands). Collecting sample forms and preparing this prior knowledge in a form the program can use is typically a tedious and expensive task.
Linking the form processing program and the Target Application also requires a significant amount of work. For example, the data captured by the form processing system should be transferred to the Target Application and validated against existing data within the Target Application.
The present invention is aimed at simplifying the process of capturing data from scanned images of forms and transferring the data to Target Applications, while virtually eliminating the two types of customization work listed above.
Several existing form-processing programs extract data from scanned images of forms using OCR technology, and send the data into Target Application data files. Some examples of FPPs are: FormWare™ from Captiva Software Corporation, FormReader™ from ABBYY Software House, and OCR for AnyDOC™ from AnyDoc Software Corporation.
These, and other such programs, typically operate in two main modes:
Setup Mode: the user defines a Template for each form Layout. In most existing form-processing products this is done by drawing rectangular regions on the image of the scanned form and defining OCR instructions for each region. This collection of regions and instructions is saved as the Template for that form Layout.
Run Mode: forms containing data are scanned and automatically matched to a Template using a Template Matching Algorithm; then a standard OCR program extracts the data from each pre-defined region. In a subsequent step, a human operator verifies and corrects the OCR results. The final step in Run Mode is to send the verified data to the Target Application by creating files that the Target Application can process at a later time.
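By way of illustration only, the two modes described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of any of the products named above. Every name in it (Region, Template, setup_mode, match_template, run_mode, export_file_contents, fake_ocr) is hypothetical. A scanned page is modeled as a mapping from rectangles to text so that no real OCR engine is needed, and template matching is reduced to comparing one anchor string per Layout.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a "page" is a dict from rectangle to the text printed inside it.
# A real system would hold pixels and call an OCR engine instead.
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Page = Dict[Box, str]


@dataclass
class Region:
    field_name: str  # name of the data field this rectangle holds
    box: Box         # rectangle drawn by the user in Setup Mode


@dataclass
class Template:
    layout_id: str
    anchor_box: Box    # where a layout-identifying string is expected
    anchor_text: str   # the string that identifies this Layout
    regions: List[Region]


def setup_mode(layout_id: str, anchor_box: Box, anchor_text: str,
               fields: Dict[str, Box]) -> Template:
    """Setup Mode: save the user-drawn regions as a Template for one Layout."""
    regions = [Region(name, box) for name, box in fields.items()]
    return Template(layout_id, anchor_box, anchor_text, regions)


def fake_ocr(page: Page, box: Box) -> str:
    """Stand-in for an OCR call; simply looks the rectangle up in the page."""
    return page.get(box, "")


def match_template(page: Page, templates: List[Template],
                   ocr: Callable[[Page, Box], str]) -> Template:
    """Toy matching step: the first Template whose anchor text is found wins."""
    for template in templates:
        if ocr(page, template.anchor_box) == template.anchor_text:
            return template
    raise LookupError("no Template matches this page")


def run_mode(page: Page, templates: List[Template],
             ocr: Callable[[Page, Box], str] = fake_ocr,
             verify: Callable[[Dict[str, str]], Dict[str, str]] = lambda d: d,
             ) -> Dict[str, str]:
    """Run Mode: match a Template, OCR each region, then let an operator verify."""
    template = match_template(page, templates, ocr)
    raw = {r.field_name: ocr(page, r.box) for r in template.regions}
    return verify(raw)  # 'verify' stands in for the human correction step


def export_file_contents(record: Dict[str, str]) -> str:
    """Final step: serialize verified data for later import by a Target Application."""
    return json.dumps(record, sort_keys=True)


# Usage: one Template defined in Setup Mode, one page processed in Run Mode.
acme = setup_mode(
    "acme-invoice", (0, 0, 100, 20), "ACME Corp",
    {"invoice_no": (200, 0, 300, 20), "total": (200, 400, 300, 420)},
)
page = {
    (0, 0, 100, 20): "ACME Corp",
    (200, 0, 300, 20): "INV-001",
    (200, 400, 300, 420): "125.00",
}
record = run_mode(page, [acme])
exported = export_file_contents(record)
```

Note that setup_mode must be called once per Layout and that export_file_contents encodes a format chosen for one particular Target Application; these are the two places where the customization work discussed in this section arises.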
Even when the type of form is known, as with invoices or orders, the Setup Mode is user-specific because each user needs to process different form Layouts. The amount of work during this mode is proportional to the number of form Layouts encountered. Likewise, the integration of the final Run Mode step with a Target Application is specific to that application, as each Target Application requires different customization.
It is a purpose of the present invention to provide a system and method for form processing which allow data extraction and posting without the need for initial customization for each end user.
Other objects and advantages of the present invention will become apparent after reading the present specification and considering the accompanying figures.