The present invention relates to a data duplication method, and particularly to a data duplication control method for DBMS data and a duplicated storage subsystem for data duplication.
Recently, business data analysis systems called “data ware house systems” have become prevalent. A data ware house is a database management system (DBMS) that generally builds its database by extracting data from the trunk business database.
The data ware house system consumes much CPU power for the multi-dimensional analysis of enormous amounts of data. For this reason, it is built separately from the trunk business system, and the two systems exchange data over a LAN (local area network) or WAN (wide area network).
Generally, loading of data from the trunk business system to the data ware house system takes place in accordance with the following procedure.
(1) Extraction of necessary data by the trunk business DBMS on the part of the trunk business system.
(2) Transfer of the extracted data to the data ware house system by FTP (file transfer protocol) or the like.
(3) Loading of the transferred data to the ware house database on the part of the data ware house system.
In the above procedure, the extracted data transferred in step (2) can amount to several tens of gigabytes or more in large business firms. Because this transfer generally goes over the LAN or WAN, it is very time-consuming. Moreover, the LAN or WAN becomes so busy during the transfer that other users of the network are adversely affected.
Data extraction in step (1) also consumes much of the trunk business system's CPU power, which degrades the response of the trunk business itself.
These problems of data extraction and transfer seem to be resolvable by transferring trunk business data to the data ware house in advance with an automatic data duplication scheme of the disk system, such as the remote copy function. Specifically, the trunk business database table (DB table) is copied to the data ware house system in advance, so that data can be loaded into the data ware house without step (2) of the above procedure, as follows.
(1) Extraction of necessary data from the copied trunk business DB table data on the part of the data ware house system.
(2) Loading of the extracted data to the ware house database on the part of the data ware house system.
This modified scheme appears attractive, because shifting the data extraction process to the data ware house reduces the load on the trunk business system. However, it raises a new problem: the matching (consistency) of the DB table data between the two systems.
The DBMS incorporates a cache (DB buffer) that holds part of the DB table data. Therefore, an update to the DB table data is not immediately reflected in the DB table data on the disk.
Since the DBMS writes updated data back to the disk at arbitrary times, the DB table data on the disk system of the trunk business system is not necessarily in a state of matching at any given moment, and accordingly the copied data is not necessarily in a state of matching either.
It is an object of the present invention, which aims to reduce the data transfer time by copying DB table data in advance with an automatic data duplication scheme of the disk system, such as the remote copy function, instead of transferring data files by FTP or the like, to provide a scheme that ensures the matching of the copied DB table data at the copy destination. This scheme not only reduces the data transfer time, but also shifts the data extraction process from the trunk business system to the data ware house system, thereby preventing the degradation of trunk business response that the data extraction process would otherwise cause.
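The mismatch can be illustrated with a deliberately simplified, hypothetical model in Python: the DB table on disk is a dict, the DB buffer is a second dict of dirty entries, and `flush` stands in for the DBMS writing data back at some arbitrary moment. `WriteBackDBMS` and its methods are illustrative names only, not any real DBMS interface.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBackDBMS:
    """Toy DBMS whose DB buffer defers writes to disk (hypothetical model)."""
    disk: dict = field(default_factory=dict)    # DB table on the disk system
    buffer: dict = field(default_factory=dict)  # DB buffer (dirty entries only)

    def update(self, key, value):
        # The update lands in the DB buffer only; the disk is untouched.
        self.buffer[key] = value

    def read(self, key):
        # The DBMS itself sees the newest value (buffer first, then disk).
        return self.buffer.get(key, self.disk.get(key))

    def flush(self):
        # Write all dirty buffer entries back to disk at an arbitrary time.
        self.disk.update(self.buffer)
        self.buffer.clear()


dbms = WriteBackDBMS(disk={"acct1": 100, "acct2": 50})
# A transfer of 30 from acct1 to acct2; only the debit has reached disk.
dbms.update("acct1", 70)
dbms.flush()
dbms.update("acct2", 80)

# A disk-level copy taken now (as a remote copy would be) is not matching.
copy = dict(dbms.disk)
print(copy)                                     # {'acct1': 70, 'acct2': 50}
print(dbms.read("acct1"), dbms.read("acct2"))   # 70 80
```

The disk-level copy shows acct1 debited but acct2 not yet credited, so its balances sum to 120 rather than 150, even though the DBMS itself sees a consistent state.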
In order to achieve the above object, the present invention resides in a data duplication control method for a system which includes a main system, having a first processor system in which a first DBMS operates and a first storage subsystem connected to the first processor system, and a subordinate system, having a second processor system in which a second DBMS operates and a second storage subsystem connected to the second processor system, the first and second storage subsystems being connected to each other. The method comprises a step, implemented by the first DBMS as directed by the first processor system, of overwriting database table data (hereinafter termed “DB table data”), which is held in a cache and read and written by the first DBMS, to the database table (hereinafter termed “DB table”) of the first storage subsystem, and holding, against any subsequent update request, the updating of the DB table data in the DB table; and a step, implemented by the first storage subsystem as directed by the first processor system, of transferring and copying the DB table data of the DB table of its own storage subsystem to the DB table of the second storage subsystem thereby to duplicate the DB table data, and suspending any subsequent transfer of DB table data so that the DB tables of the first and second storage subsystems are in a state of matching. This enables the second DBMS to refer to the copied DB table in a state of matching.
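A minimal Python sketch of these two steps, assuming a toy model in which each DB table is a dict, the remote copy mirrors each write until it is suspended, and "holding" means the DB buffer keeps accepting updates but does not write them back to the DB table. `StorageSubsystem`, `FirstDBMS`, `write_back`, `flush_and_hold`, `copy_and_suspend` and `release` are hypothetical names, not part of any real DBMS or disk-system interface; in particular, when the hold ends is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageSubsystem:
    """Hypothetical storage subsystem holding one DB table as a dict."""
    table: dict = field(default_factory=dict)
    remote: Optional["StorageSubsystem"] = None  # paired copy destination
    suspended: bool = False

    def write(self, key, value):
        self.table[key] = value
        if self.remote is not None and not self.suspended:
            # Normal remote copy: mirror each write to the paired subsystem.
            self.remote.table[key] = value

    def copy_and_suspend(self):
        # Step 2: transfer the DB table to the paired subsystem, then stop
        # forwarding writes so that the two tables stay matching.
        self.remote.table = dict(self.table)
        self.suspended = True


@dataclass
class FirstDBMS:
    """Hypothetical first DBMS with a write-back DB buffer (cache)."""
    storage: StorageSubsystem
    buffer: dict = field(default_factory=dict)
    holding: bool = False

    def update(self, key, value):
        # Updates always land in the DB buffer first.
        self.buffer[key] = value

    def write_back(self):
        # Write dirty entries to the DB table, unless updating is held.
        if self.holding:
            return
        for key, value in self.buffer.items():
            self.storage.write(key, value)
        self.buffer.clear()

    def flush_and_hold(self):
        # Step 1: overwrite cached DB table data to the DB table, then hold.
        self.write_back()
        self.holding = True

    def release(self):
        # Assumption: holding ends once the copy has been suspended.
        self.holding = False


secondary = StorageSubsystem()
primary = StorageSubsystem(table={"acct1": 100, "acct2": 50}, remote=secondary)
secondary.table = dict(primary.table)   # the pair starts out matching
dbms = FirstDBMS(storage=primary)

dbms.update("acct1", 70)
dbms.update("acct2", 80)        # both updates are still only in the DB buffer

dbms.flush_and_hold()           # step 1: overwrite cache to DB table, hold
dbms.update("acct2", 999)       # arrives while held: stays in the DB buffer
dbms.write_back()               # no effect on the DB table while held
primary.copy_and_suspend()      # step 2: copy to second subsystem, suspend
dbms.release()                  # updating resumes; the copy stays fixed

dbms.update("acct1", 0)
dbms.write_back()
print(secondary.table)          # {'acct1': 70, 'acct2': 80}
print(primary.table)            # {'acct1': 0, 'acct2': 999}
```

The update that arrives during the hold never reaches the first DB table before the copy, so the second subsystem receives exactly the state produced by the flush, and writes made after suspension stay local to the main system.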
The present invention also resides in a data duplication control method for a system which includes a main system, having a first processor system in which a first DBMS operates and a first storage subsystem connected to the first processor system, and a subordinate system, having a second processor system in which a second DBMS operates and a second storage subsystem connected to the second processor system, the first and second storage subsystems being connected to each other. The method comprises a step, implemented by the first DBMS as directed by the first storage subsystem, of overwriting DB table data, which is held in a cache and read and written by the first DBMS, to the DB table of the first storage subsystem, and holding, against any subsequent update request, the updating of the DB table data in the DB table; and a step, implemented by the first storage subsystem, of transferring and copying the DB table data of the DB table of its own storage subsystem to the DB table of the second storage subsystem thereby to duplicate the DB table data, and suspending any subsequent transfer of DB table data so that the DB tables of the first and second storage subsystems are in a state of matching. This enables the second DBMS to refer to the copied DB table in a state of matching.
The DB table data which has been overwritten to the DB table of the first storage subsystem is held in a remote copy buffer of the first storage subsystem, and, in response to a duplication command, the first storage subsystem transfers and copies the DB table data in the buffer to the DB table of the second storage subsystem, thereby duplicating the DB table data.
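Under an assumed toy model, the remote copy buffer arrangement can be sketched in Python as a FIFO queue of (key, value) writes that is drained only when a duplication command arrives. `BufferedRemoteCopy` and `duplicate` are hypothetical names; the source does not specify the buffer's structure, and this sketch simply stops recording writes once transfer is suspended.

```python
from collections import deque


class BufferedRemoteCopy:
    """Hypothetical first storage subsystem that queues written DB table
    data in a remote copy buffer and ships it on a duplication command."""

    def __init__(self, table, remote_table):
        self.table = dict(table)            # DB table of the first subsystem
        self.remote_table = remote_table    # DB table of the second subsystem
        self.copy_buffer = deque()          # remote copy buffer (FIFO)
        self.suspended = False

    def write(self, key, value):
        self.table[key] = value
        if not self.suspended:
            # Keep the written data, in write order, for later transfer.
            self.copy_buffer.append((key, value))

    def duplicate(self):
        # Duplication command: drain the buffer to the second subsystem in
        # write order, then suspend any further transfer.
        while self.copy_buffer:
            key, value = self.copy_buffer.popleft()
            self.remote_table[key] = value
        self.suspended = True


remote = {"acct1": 100, "acct2": 50}
primary = BufferedRemoteCopy({"acct1": 100, "acct2": 50}, remote)
primary.write("acct1", 70)
primary.write("acct2", 80)
primary.write("acct1", 60)
print(remote)            # {'acct1': 100, 'acct2': 50}  (nothing sent yet)
primary.duplicate()
print(remote)            # {'acct1': 60, 'acct2': 80}
primary.write("acct2", 0)
print(remote)            # {'acct1': 60, 'acct2': 80}  (suspended)
```

Replaying the buffer in write order means a key written several times ends with its last value on the second subsystem.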
Storage location information, within the DB table, of the DB table data which has been overwritten to the DB table of the first storage subsystem is stored in a memory of the first storage subsystem, and, in response to a duplication command, the first storage subsystem reads the location information out of the memory and transfers and copies the DB table data at the indicated locations of the DB table to the DB table of the second storage subsystem, thereby duplicating the DB table data.
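The location-information arrangement can be sketched in Python under hypothetical names: `TrackedRemoteCopy`, block numbers standing in for storage locations, and a set standing in for the memory that holds the location information. Only locations are recorded at write time; the data itself is read from the DB table when the duplication command arrives.

```python
class TrackedRemoteCopy:
    """Hypothetical first storage subsystem that remembers only where DB
    table data was written and reads it back on a duplication command."""

    def __init__(self, blocks, remote_blocks):
        self.blocks = list(blocks)            # DB table as numbered blocks
        self.remote_blocks = remote_blocks    # second subsystem's DB table
        self.dirty = set()                    # storage location information
        self.suspended = False

    def write(self, block_no, data):
        self.blocks[block_no] = data
        if not self.suspended:
            # Record only the location; the data stays in the DB table.
            self.dirty.add(block_no)

    def duplicate(self):
        # Duplication command: read the recorded locations and copy the
        # current contents of each one to the second subsystem.
        for block_no in sorted(self.dirty):
            self.remote_blocks[block_no] = self.blocks[block_no]
        self.dirty.clear()
        self.suspended = True


remote = ["a", "b", "c", "d"]
p = TrackedRemoteCopy(["a", "b", "c", "d"], remote)
p.write(1, "b1")
p.write(1, "b2")
p.write(3, "d1")
p.duplicate()
print(remote)            # ['a', 'b2', 'c', 'd1']
p.write(0, "a1")
print(remote)            # ['a', 'b2', 'c', 'd1']  (suspended)
```

Block 1 is written twice but transferred once, with its latest contents, whereas a design that queued every write would ship both. The memory needed grows with the number of distinct locations rather than the number of writes; the source does not state this as a motivation, so treat it only as one plausible trade-off.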
The present invention also resides in a duplicated storage subsystem which includes a first storage subsystem that has a first external storage control unit and a first external storage unit and is connected to a first processor system, and a second storage subsystem that has a second external storage control unit and a second external storage unit and is connected to a second processor system, the first and second external storage control units being connected to each other, wherein the first external storage control unit includes a means of overwriting data held in a cache of the first processor system to the first external storage unit, a means of holding, against any subsequent update request, the updating of the data which has been written to the first external storage unit, a means of transferring the data to the second storage subsystem thereby to duplicate the data, and a means of suspending any subsequent transfer of data from the first external storage unit to the second storage subsystem, so that the data stored in the first and second storage subsystems are in a state of matching.
The first external storage control unit includes a means of storing the data which has been overwritten to the first external storage unit into a buffer of the first external storage control unit, and a means of transferring the data in the buffer to the second storage subsystem in response to a duplication command, thereby duplicating the data.
The first external storage control unit includes a means of storing, into a memory of the first external storage control unit, storage location information within the first external storage unit for the data which has been overwritten to the first external storage unit, and a means of reading the location information out of the memory in response to a duplication command and transferring the data stored at the indicated locations of the first external storage unit to the second storage subsystem, thereby duplicating the data.
The first external storage control unit and the second external storage control unit are linked through a dedicated communication line to carry out the data communication.
Alternatively, the first external storage control unit and the second external storage control unit are linked through a switch or a network to carry out the data communication.