1. Technical Field
The present invention relates to data extraction from documents and more particularly to systems and methods which extract contract data automatically and efficiently from an electronic contract composed of a number of documents in a given format.
2. Description of the Related Art
Much business between enterprises is conducted under contract. Contracts constitute the binding relationship between a company and its customers or suppliers. Every day, many contracts are created, executed and managed via paper-based manual processes in large enterprises. Automation of the contract lifecycle presents a substantial value creation opportunity for enterprises. This value stems from improved productivity and security, effectively aggregated contract information, accelerated contract lifecycle processes, reduced contractual errors and risk, the ability to forecast revenue and optimize profit, and better compliance enforcement.
With the advent of Internet technology and electronic commerce, research activities and implementation efforts on electronic contracts are growing. Currently, the International Association of Contract and Commercial Managers lists twenty commercially available software products for electronic contract management. Most of the reported research activities focus on electronic contract creation or representation language, negotiation, management, collaboration, execution, fulfillment and enforcement, performance, digital signatures and data mining. However, none of these efforts has provided an automatic electronic data extraction solution to enable data mining for revenue forecasting and profit optimization.
A single electronic contract can encompass a large number of collateral documents including master and customer agreements, supplements, addenda and the like. These various documents are of different contract document types. There can be over a hundred different basic types of contract documents in a large company. A few examples of these contract document types are “Master Agreement”, “Customer Agreement”, “Term Lease Supplement”, “Addendum to Term Lease Supplement”, “Statement of Work for Services” and “Change Authorization for Services”. Moreover, the documents can also be in different file formats, such as PDF, XML, Microsoft Word™ and Lotus WordPro™.
An electronic contract management system can be used to automatically convert all these contract documents of different types into PDF format and then merge them together to form a single electronic contract PDF document. However, data extraction and mining on this kind of electronic contract is still very difficult, if not impossible. To do this, a user must first determine how many contract documents the merged electronic contract contains, and then determine the contract document type of each. Next, the user must decide what contract data to extract and from which contract document. The user would further need to find out where in the contract document the contract data is located, such as the page and line numbers. Many more tasks must be completed before data extraction and mining can be implemented for electronic contracts of this kind.
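The manual tasks listed above (counting the documents in a merged contract, identifying their types, and locating a data item by page and line) can be illustrated with a minimal sketch. It is not the method of the invention; it only makes the steps concrete. It assumes the merged contract is already available as a list of page texts, and that each constituent document begins on a new page whose first line starts with its document type title. The names `DOCUMENT_TYPES`, `Segment`, `segment_contract` and `locate` are hypothetical and introduced here for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical subset of the contract document type titles named in the text.
DOCUMENT_TYPES = [
    "Addendum to Term Lease Supplement",
    "Term Lease Supplement",
    "Master Agreement",
    "Customer Agreement",
]


@dataclass
class Segment:
    doc_type: str
    first_page: int  # 1-based page number within the merged contract
    last_page: int


def segment_contract(pages):
    """Split merged contract pages into documents.

    Assumption: a page whose first non-blank line starts with a known
    document type title begins a new document; other pages continue
    the current document.
    """
    segments = []
    for page_no, text in enumerate(pages, start=1):
        stripped = text.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        doc_type = next(
            (t for t in DOCUMENT_TYPES if first_line.startswith(t)), None
        )
        if doc_type is not None:
            segments.append(Segment(doc_type, page_no, page_no))
        elif segments:
            segments[-1].last_page = page_no
    return segments


def locate(pages, segment, pattern):
    """Return (page, line, value) for the first match of `pattern`
    (a regex with one capture group) inside `segment`, or None."""
    regex = re.compile(pattern)
    for page_no in range(segment.first_page, segment.last_page + 1):
        lines = pages[page_no - 1].splitlines()
        for line_no, line in enumerate(lines, start=1):
            match = regex.search(line)
            if match:
                return page_no, line_no, match.group(1)
    return None


# Made-up sample pages, for illustration only.
pages = [
    "Master Agreement\nParties: A and B",
    "Section 2\nTerm: 36 months",
    "Term Lease Supplement\nMonthly Rent: 1200 USD",
]
segments = segment_contract(pages)
term = locate(pages, segments[0], r"Term:\s*(\d+) months")
rent = locate(pages, segments[1], r"Monthly Rent:\s*(\d+)")
```

Even in this simplified form, every document type needs its own title rule and every data item its own pattern, which suggests why doing this by hand across more than a hundred document types is impractical.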