With the development of information technology, more and more enterprises use various applications to manage different kinds of data, and different kinds of data files are formed as a result. Further, communication between enterprises is becoming closer, so that data exchange within an enterprise and between enterprises is becoming more frequent. This raises a problem: integrating different network application systems requires successfully exchanging data having different formats.
In the prior art, a specialized data transformation tool is developed for a particular application system in order to transform original data into target data. However, with such a data transformation technology, the code of the data transformation tool must be updated and debugged whenever the data format of the application system changes. Such a data exchange technology wastes time and human resources, and its efficiency is low.
In order to exchange data, it is first necessary to understand, analyze and process original data having different formats. Most existing application systems generate and process data files having relatively fixed formats and structures. Data transformation methods and tools for locating, extracting and transforming the data in a data file have therefore naturally been developed. The data file referred to herein is typically a file encoded as printable characters, including machine-readable text formats such as query result lists from a database, EDI messages, recognition results from scanned images in a table processing system, and general reports for reading, transmitting or printing that are generated by an ERP or other application system.
The prior technologies for locating and transforming the data in a data file include the XML Converter developed by the Unidex company. The XML Converter transforms data in data files having a simple, delimited format. For example, it requires that the data file to be processed consist of records, where each record is a sequence of fields. The records and the fields are delimited by separators. Fields that are not delimited must have a fixed length.
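The record structure described above can be illustrated with a minimal sketch. It is not the XML Converter's implementation or interface; the separators, field widths, sample data and function names (`parse_delimited`, `parse_fixed`) are assumptions chosen only to show the two cases, separator-delimited fields and fixed-length fields.

```python
from typing import List

# Assumed separators for illustration; a real file format would define its own.
RECORD_SEP = "\n"
FIELD_SEP = "|"


def parse_delimited(text: str) -> List[List[str]]:
    """Split text into records, then split each record into delimited fields."""
    return [rec.split(FIELD_SEP) for rec in text.split(RECORD_SEP) if rec]


def parse_fixed(record: str, widths: List[int]) -> List[str]:
    """Cut one record into fixed-length fields and trim the padding."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width].strip())
        pos += width
    return fields


if __name__ == "__main__":
    # Hypothetical sample data: an id, a name and a quantity per record.
    print(parse_delimited("1001|Widget|25\n1002|Gadget|7\n"))
    print(parse_fixed("1001Widget    25", [4, 10, 2]))
```

Both functions rely on the layout being known in advance, which is why an approach of this kind is limited to data files having a simple and regular format.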
Additionally, U.S. Pat. No. 4,965,763 discloses a data transformation method that analyzes data files by using structural, syntactic and semantic knowledge about the data files. This patent is particularly suited to extracting information from business correspondence documents.
U.S. Pat. No. 5,664,109 discloses a technology for locating and extracting data by keywords. This patent is used to automatically retrieve documents from a medical records repository.
European patent EP 1016982 discloses a method for extracting and outputting data from a relatively well-structured database.
The above prior art, however, is applicable only to specific application environments and provides only keyword matching or semantic analysis; it is therefore suited only to processing text having a simple format.
As a result, a generic data extraction and transformation method and tool applicable to various data files is needed, so that the data in such files can be transformed simply and efficiently.