The present invention relates to a data duplication method, and particularly to a data duplication control method for DBMS data and a duplicated storage subsystem for data duplication.
Recently, business data analysis systems called “data warehouse systems” have become prevalent. A data warehouse is a database management system (DBMS) whose database is generally populated with data extracted from the trunk business database.
The data warehouse system consumes a great deal of CPU power for the multi-dimensional analysis of enormous volumes of data. For this reason, it is constructed separately from the trunk business system, and the two systems exchange data via a LAN (local area network) or WAN (wide area network).
Generally, data is loaded from the trunk business system into the data warehouse system in accordance with the following procedure.
(1) Extraction of necessary data by the trunk business DBMS on the part of the trunk business system.
(2) Transfer of the extracted data to the data warehouse system by FTP (File Transfer Protocol) or the like.
(3) Loading of the transferred data into the warehouse database on the part of the data warehouse system.
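The three steps above can be illustrated with a minimal sketch. It is only an in-memory model under stated assumptions: the function names (extract, transfer, load, run_load) and the row layout are hypothetical, and the network transfer of step (2) is represented by a serialization round trip rather than by an actual FTP session.

```python
import json
from typing import Dict, List

Row = Dict[str, object]


def extract(source_table: List[Row], wanted_columns: List[str]) -> List[Row]:
    """Step (1): the trunk business side selects only the needed columns."""
    return [{col: row[col] for col in wanted_columns} for row in source_table]


def transfer(rows: List[Row]) -> List[Row]:
    """Step (2): stand-in for the LAN/WAN transfer (assumption: no real FTP).

    The rows are serialized and deserialized, so the receiver gets its own copy.
    """
    payload = json.dumps(rows)
    return json.loads(payload)


def load(warehouse_table: List[Row], rows: List[Row]) -> int:
    """Step (3): the warehouse side appends the received rows to its table."""
    warehouse_table.extend(rows)
    return len(rows)


def run_load(source_table: List[Row], wanted_columns: List[str],
             warehouse_table: List[Row]) -> int:
    """Run steps (1) to (3) in order and return the number of rows loaded."""
    extracted = extract(source_table, wanted_columns)
    received = transfer(extracted)
    return load(warehouse_table, received)
```

In this model, the cost of step (1) falls on the source side and the cost of step (2) grows with the size of the serialized payload.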
In the above procedure, the quantity of extracted data transferred in step (2) can be several tens of gigabytes or more in large business firms. The data transfer, which generally takes place via the LAN or WAN, takes a great deal of time. Moreover, the LAN or WAN becomes so busy during this data transfer that its other users are adversely affected.
Data extraction in step (1) consumes much CPU power of the trunk business system, which degrades the response of the trunk business processing itself.
These problems of data extraction and transfer seem to be resolved by transferring the trunk business data to the data warehouse in advance by use of an automatic data duplication scheme of the disk system, such as the remote copy function. Specifically, the trunk business database table (DB table) is copied to the data warehouse system in advance, so that data is loaded into the data warehouse with step (2) of the above procedure eliminated, as follows.
(1) Extraction of necessary data from the copied trunk business DB table data on the part of the data warehouse system.
(2) Loading of the extracted data into the warehouse database on the part of the data warehouse system.
This modified scheme appears advantageous in that it reduces the load on the trunk business system by shifting the data extraction process to the data warehouse. However, a new problem arises with regard to the matching (consistency) of the DB table data between the two systems.
The DBMS incorporates a cache (DB buffer) for holding part of the DB table data. Therefore, an update of the DB table data is not immediately reflected in the DB table data on the disk.
The timing at which updated data is written to the disk is arbitrary, so at any given moment the DB table data on the disk system of the trunk business system is not necessarily in a matching state, and accordingly the copied data is not necessarily in a matching state either.
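The matching problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular DBMS or disk system: the class BufferedTable, the function remote_copy, and the two-account example are all hypothetical. A single logical update moves 100 from row "A" to row "B"; if only one of the two buffered rows has reached the disk when the disk is copied, the copy reflects half of the update.

```python
from typing import Dict, Tuple


class BufferedTable:
    """A DB table with a write-back DB buffer in front of the disk image."""

    def __init__(self, rows: Dict[str, int]) -> None:
        self.disk: Dict[str, int] = dict(rows)   # what the disk system holds
        self.buffer: Dict[str, int] = {}         # updated rows not yet on disk

    def update(self, key: str, value: int) -> None:
        """An update goes to the DB buffer only, not to the disk."""
        self.buffer[key] = value

    def flush(self, key: str) -> None:
        """Write one buffered row to disk (timing chosen by the DBMS)."""
        self.disk[key] = self.buffer.pop(key)

    def flush_all(self) -> None:
        """Write every buffered row to disk."""
        for key in list(self.buffer):
            self.flush(key)


def remote_copy(table: BufferedTable) -> Dict[str, int]:
    """Stand-in for the disk system's copy: it sees the disk image only."""
    return dict(table.disk)


def demo() -> Tuple[Dict[str, int], Dict[str, int]]:
    table = BufferedTable({"A": 500, "B": 500})
    # One logical update: move 100 from A to B (total must stay 1000).
    table.update("A", 400)
    table.update("B", 600)
    table.flush("A")                  # only part of the update reaches the disk
    mid_copy = remote_copy(table)     # copy taken at an arbitrary moment
    table.flush_all()
    final_copy = remote_copy(table)   # copy taken after everything is flushed
    return mid_copy, final_copy
```

In this sketch the copy taken mid-way totals 900 instead of 1000, while the copy taken after all buffered rows are written is in a matching state.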