This invention relates generally to electronic exchange of information and, more particularly, to extracting information from a document provided in electronic form.
Automatically exchanging information with another party via electronic documents is difficult. Typically, both parties agree on a common set of file exchange formats, which requires each party to implement the software logic needed to work with those formats. However, when one of the parties relies on a legacy computer application, it may not be practical to modify the application. Information is therefore exchanged using unstructured documents available through existing mechanisms, e.g., standard reporting interfaces and messaging mechanisms. To facilitate such unstructured information exchanges, software packages are commercially available that allow users to work interactively with unstructured electronic documents, define scripts to extract pertinent data from these documents, and import the extracted information into a software system. However, these processes tend to be manual, and they require human knowledge and intervention to handle the arbitrary arrival of different unstructured document types.
The present invention, in one aspect, includes systems and processes that automate receiving unstructured information contained in electronic documents, detecting the document type, determining the corresponding document format, extracting structured information from the source document, and populating an information store with the extracted information. Generally, the electronic documents are pre-characterized, and both extraction and mapping/translation details are developed as scripts on a per-document-type basis. The appropriate extraction and mapping/translation scripts are then selected automatically and used to drive the subsequent information extraction processes without manual intervention.
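The flow summarized above (detect the document type, select the scripts registered for that type, extract, translate, and populate a store) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the invention: the `DocumentType` record, the `REGISTRY` list, the `INVOICE REPORT` signature, the sample document, and the use of a plain dictionary as the information store.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentType:
    """A pre-characterized document type and its two scripts (hypothetical layout)."""
    name: str
    signature: str                         # regex that identifies this type
    extract: Callable[[str], List[dict]]   # extraction script
    translate: Callable[[dict], dict]      # mapping/translation script


def extract_invoice(text: str) -> List[dict]:
    # Pull "INV-<number> <amount>" pairs out of the unstructured text.
    return [{"id": m.group(1), "amount": m.group(2)}
            for m in re.finditer(r"INV-(\d+)\s+([\d.]+)", text)]


def translate_invoice(raw: dict) -> dict:
    # Map raw strings to the field names and types the store expects.
    return {"invoice_id": int(raw["id"]), "amount": float(raw["amount"])}


# One entry per pre-characterized document type (illustrative only).
REGISTRY = [
    DocumentType("invoice_report", r"^INVOICE REPORT",
                 extract_invoice, translate_invoice),
]


def detect_type(text: str) -> Optional[DocumentType]:
    # Return the first registered type whose signature matches the document.
    for doc_type in REGISTRY:
        if re.search(doc_type.signature, text, re.MULTILINE):
            return doc_type
    return None


def process(text: str, store: Dict[str, List[dict]]) -> bool:
    # Detect the type, run its scripts, and populate the store.
    doc_type = detect_type(text)
    if doc_type is None:
        return False  # unknown type: nothing is extracted
    rows = [doc_type.translate(r) for r in doc_type.extract(text)]
    store.setdefault(doc_type.name, []).extend(rows)
    return True


store: Dict[str, List[dict]] = {}
sample = "INVOICE REPORT\nINV-101 250.00\nINV-102 75.50\n"
process(sample, store)
```

In this sketch, supporting a new document type means adding one registry entry with its own scripts; the `process` function itself does not change.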
Although print scraping is described herein in the context of financial lending, print scraping can be utilized in many other contexts, including extracting information from a legacy report format. More specifically, print scraping is performed using processes that extract meaningful data from flat files produced by various systems in order to update a database. Because legacy systems vary in the format and structure of their reports, print scraping is used to parse out the data required for the database. As part of the process, the data is validated for errors and, in the context of financial lending, for example, the necessary business logic is applied to determine the credit availability for a client.
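As a rough illustration of print scraping, the sketch below parses a small flat-file report, validates each row, and applies a business rule. The report layout, the column names, the `scrape` function, and the rule itself (availability equals credit limit minus outstanding balance) are all assumptions chosen for the example; the text above does not specify the actual report formats or lending logic.

```python
from typing import Dict, List, Tuple

# Hypothetical legacy report: a title line, a column header, then data rows.
REPORT = """\
CLIENT RECEIVABLES REPORT          PAGE 1
CLIENT     LIMIT       BALANCE
ACME       10000.00    2500.00
GLOBEX     5000.00     abc
INITECH    7500.00     7500.00
"""


def scrape(report: str) -> Tuple[Dict[str, float], List[str]]:
    """Return (credit availability per client, validation errors)."""
    availability: Dict[str, float] = {}
    errors: List[str] = []
    for line_no, line in enumerate(report.splitlines(), start=1):
        parts = line.split()
        # Skip the title, the column header, and anything that is not a data row.
        if len(parts) != 3 or parts[0] == "CLIENT":
            continue
        name, limit_text, balance_text = parts
        try:
            limit, balance = float(limit_text), float(balance_text)
        except ValueError:
            # Validation step: record the bad row instead of loading it.
            errors.append(f"line {line_no}: non-numeric amount")
            continue
        # Assumed business rule: availability = limit - balance.
        availability[name] = limit - balance
    return availability, errors


availability, errors = scrape(REPORT)
```

Here `availability` stands in for the database update, and `errors` collects the rows that failed validation so they can be reviewed separately.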