1. Field of the Invention
The present invention relates to a method for transforming data formats between different database management systems and to an apparatus for executing the method, and more particularly to a method for transforming data formats between different database management systems that requires no data transfer between a host computer and a disk storage device, thereby reducing the system load when transforming a large-scale database, and to an apparatus for carrying out the method.
2. Description of the Related Art
In the 1990s, data-intensive applications that may process large amounts of data emerged, such as data mining, data warehouses, and decision support systems. In this situation, where the amount of data doubles year by year, solutions for managing data efficiently have been demanded. The SAN (Storage Area Network) is one such solution, proposed in the second half of 1998.
A SAN is a network dedicated to data transfer, composed of storage devices and the computers that access them. Conventionally, data backup, for example, was performed over a LAN that also connects other computers. With a SAN, a network dedicated to data transfer, the traffic load on the LAN can be reduced. This reduction of the load on the LAN is one of the major purposes of a SAN. A SAN may also be characterized by easy data sharing, because the computers connected to a SAN have physical access to any magnetic disk drive connected to it.
However, the fact that two computers can physically access the same magnetic disk drive does not necessarily mean that the data can be shared at the application level. Data managed by a database management system (hereinafter, DBMS) or a file system on one of those computers may be accessible to the other computer, but the other computer may have no means of interpreting it. For this reason, a variety of converter software has been developed to achieve data sharing between a file system and a DBMS, or between different DBMSs.
Data mining is often discussed as a method for effectively exploiting huge amounts of data, and tools for data mining are being actively developed. In general, data mining tools use data (for example, consumer data) stored by an OLTP (Online Transaction Program). An OLTP usually runs on a mainframe and uses a DBMS to manage its data. A data mining tool, on the other hand, runs on an open system such as Unix or Windows NT, and analyzes the data after storing it in a DBMS of its own. This is why data transfer from a mainframe to an open system, and data conversion between different DBMSs, are necessary.
Known techniques for data conversion between different DBMSs are disclosed in, for example, U.S. Pat. Nos. 6,016,501 and 6,035,307.
An EDM (Enterprise Data Movement) system, described in the above U.S. Pat. No. 6,016,501, extracts data from a source DBMS, transforms the data format to that of a target DBMS, and feeds the transformed data to the target DBMS. In general, the data of the source DBMS and that of the target DBMS are stored in a disk storage device, and the EDM system runs on a server. The data of the source DBMS is extracted from the disk storage device to the server through a SCSI channel, transformed on the server into the data format specified by the target DBMS, and loaded into the data field of the target DBMS through the SCSI channel.
FIG. 11 is a schematic diagram illustrating this data conversion method in accordance with the Prior Art.
In the data conversion shown in FIG. 11, data in a DB 1 format, stored on a disk 200A of a disk storage device 120, is loaded into a Unix host computer 100B, transformed into data in a DB 2 format by the data extraction/conversion/loading program, and written to the disk 200B.
Data transfer between the server and the disk storage device occurs twice here: once to read out the source data, and once to write the transformed data.
For the purpose of data mining, the amount of data transferred from the mainframe to the Unix host can easily reach a few Tbytes (terabytes). Put another way, transferring this amount takes about 10 hours over a fibre channel of 100 Mbytes per second. The load on the entire system becomes extreme.
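The 10-hour figure can be checked with simple arithmetic. The following Python sketch is illustrative only. It assumes decimal units (1 Tbyte = 10^12 bytes, 1 Mbyte = 10^6 bytes), a sustained transfer rate with no protocol overhead, and 3.6 Tbytes as a stand-in for "a few Tbytes"; the function name is hypothetical.

```python
def transfer_hours(total_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move total_bytes once at a sustained rate."""
    if bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    seconds = total_bytes / bytes_per_second
    return seconds / 3600.0


TBYTE = 10**12  # assumption: decimal terabyte
MBYTE = 10**6   # assumption: decimal megabyte

# Assumed example volume: 3.6 Tbytes over a 100 Mbytes/s channel.
hours = transfer_hours(3.6 * TBYTE, 100 * MBYTE)
print(f"{hours:.1f} hours")
```

Under these assumptions a single pass of the data over the channel takes about 10 hours, so every additional read or write of the full data set adds roughly the same amount again.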
In some cases, instead of a one-step extraction/conversion/loading operation from the source DBMS data format to the target DBMS data format, the operation is performed in three separate steps of extraction, conversion, and loading. FIG. 11 of the aforementioned U.S. Pat. No. 6,035,307 cites an example of the Prior Art that performs database format conversion via several intermediate working formats.
A database format conversion using intermediate file formats will be described here with reference to FIG. 12.
FIG. 12 is a schematic diagram illustrating an exemplary data conversion in accordance with the Prior Art.
A mainframe 100A has an extractor program, which transforms the DB 1 format data on a disk 200A into format 1 data on the disk 200B. A Unix host 100B has a transformer program and a loader program installed. The transformer program transforms the format 1 data on the disk 200B into format 2 data on a disk 200C, while the loader program transforms the format 2 data on the disk 200C into DB 2 format data on the disk 200D.
When intermediate data formats are used in transforming the data of a database, the transformed intermediate data is also written to the disk storage device. As a result, the number of data transfers between the server and the disk storage device increases to six in this case, and the data transfer time increases sixfold accordingly.
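The transfer counts quoted above follow from one read and one write per processing step. The following Python sketch is a minimal illustration, assuming that each step reads its entire input from the disk storage device and writes its entire output back, and that the data volume stays roughly the same in every format (an assumption not stated in the text). The function names and the per-pass time are hypothetical.

```python
from typing import Sequence


def count_transfers(steps: Sequence[str]) -> int:
    """Each step reads its input once and writes its output once."""
    return 2 * len(steps)


def total_transfer_hours(steps: Sequence[str], hours_per_pass: float) -> float:
    """Total channel time, assuming every pass moves the same volume."""
    return count_transfers(steps) * hours_per_pass


one_step = ["extract/convert/load"]            # method of FIG. 11
three_steps = ["extract", "transform", "load"]  # method of FIG. 12

print(count_transfers(one_step))     # 2
print(count_transfers(three_steps))  # 6

# Assumed example: 10 hours per full pass over the data.
print(total_transfer_hours(three_steps, 10.0))  # 60.0
```

A conversion performed inside the disk storage device would correspond to no passes over the server channel at all in this model.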
In the Prior Art described above, data transformation is performed on the host. This causes the problem that the data transformation places an extreme load on the system. The larger the database, the more severe the problem becomes.
On the other hand, if the data transformation can be performed within the disk storage device, the data transfer between the server and the disk storage device can be eliminated.