1. Field of the Invention
The present invention relates to a data duplication system suitable for a magnetic disk array subsystem and the like. More specifically, the present invention relates to a data duplication device which duplicates data by using a snapshot technique, and to a data duplication method as well as a program thereof.
2. Description of the Related Art
Recently, as social infrastructure has come to be built on IT (information technology), the amount of data held by business enterprises and individuals has increased drastically. Further, due to the spread of online electronic commerce, legislation for keeping endorsement data, and the like, the quality and value of the data itself have risen. Under such circumstances, the consequences of data-loss incidents have become widely known to the public. At the same time, great attention is being paid to backup techniques as a means of preventing data loss in advance.
Here, a mirroring-type backup procedure employed in a disk array subsystem as a data duplication system will be described.
First, an application such as a database that accesses the master volume is stopped in order to secure the rest point of the volume (master volume) that is the target of backup. Then, a backup volume having the same capacity as that of the master volume is set, and the entire data of the master volume is copied to the backup volume.
When the copy is completed, the stopped application of the database is restarted. In the meantime, the data is read out from the backup volume and saved in a backup device such as a tape.
With the procedure described above, the backup volume is created and the data is copied from the master volume to the backup volume after stopping the application. However, there is also a method which creates the backup volume and starts the data copy from the master volume to the backup volume while the application is still in operation, thereby shortening the time from the point where the rest point is secured to the point where the master volume and the backup volume are synchronized.
However, with either of those methods, a length of time that depends on the amount of data to be copied is required from the point where the rest point is secured to the point where the data of the master volume is completely copied to the backup volume. Further, for uses other than backup, such as data mining, a plurality of backup volumes are often created for the same master volume. In such a case, storage capacity several times larger than that of the master volume is normally required for the backup volumes.
In order to avoid such issues of the mirroring method, backup employing a snapshot method has recently come into frequent use (see Japanese Unexamined Patent Publication 2005-208950 (Patent Document 1)). Here, an example of a disk array subsystem 100 that is a data duplication system employing the snapshot method that uses a shared pool will be described by referring to FIG. 9, FIG. 10, and FIG. 11.
The disk array subsystem 100 as the data duplication system employing the snapshot method includes: a master volume 101; a virtual volume (referred to simply as “snapshot volume” hereinafter) 102 which has the same capacity as that of the master volume 101 but does not actually have a physical capacity; a volume (referred to simply as “shared pool volume” hereinafter) 103 which stores data of the snapshot volume 102; a data duplication control module 104 which manages data accesses made to the snapshot volume 102; an address changing module 105 which manages the actual storing place of the duplicated data; and a duplication control memory 200 and an address changing memory 300 provided to the data duplication control module 104 and the address changing module 105, respectively.
Among those, the duplication control memory 200 provided to the data duplication control module 104 includes: an attribute managing table 201 which manages the attributes of the volumes such as the snapshot volume and the like; a volume correspondence managing table 202 which holds the snapshot relations between the volumes; and a difference managing table 203 which manages differences between the master volume 101 and the snapshot volume 102.
Of these tables, each section of the difference managing table 203 of the duplication control memory 200 takes value “0” when the snapshot volume 102 does not hold data for that section, and takes value “1” when the snapshot volume 102 holds data.
Further, the address changing memory 300 includes: a directory 301 which holds the actual storing address of the snapshot volume 102; and an allotment managing table 302 which manages the use state of the shared pool volume.
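The two memories and their tables described above can be pictured with a minimal sketch. The class names, field names, the page count of 4, and the use of `None` for an unallotted directory entry are assumptions made for illustration only; the source does not specify a concrete layout.

```python
from dataclasses import dataclass, field

PAGES = 4  # assumed page count per volume; the source does not fix a size


@dataclass
class DuplicationControlMemory:
    """Illustrative stand-in for the duplication control memory 200."""
    attribute: dict = field(default_factory=dict)       # table 201: volume -> attribute
    correspondence: dict = field(default_factory=dict)  # table 202: volume -> paired volume
    difference: dict = field(default_factory=dict)      # table 203: snapshot -> 0/1 per page


@dataclass
class AddressChangingMemory:
    """Illustrative stand-in for the address changing memory 300."""
    directory: dict = field(default_factory=dict)  # directory 301: snapshot -> pool address per page
    allotment: list = field(default_factory=list)  # table 302: 0 = unused, 1 = in use


control = DuplicationControlMemory()
address = AddressChangingMemory(allotment=[0] * PAGES)

# A snapshot volume (LV1) that holds no data yet: every difference entry is 0
# and no shared pool address has been allotted to any page.
control.difference["LV1"] = [0] * PAGES
address.directory["LV1"] = [None] * PAGES
```

Keeping the difference flags separate from the directory mirrors the split between the two memories in the text: one answers whether the snapshot volume holds data, the other answers where that data is stored.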
Next, operations of the disk array subsystem (data duplication system) 100 mentioned above will be described by referring to FIG. 12, FIG. 13, and FIG. 14.
Here, LV (logical volume) 0 is used as the master volume 101, LV1 is used as the snapshot volume 102, and LV2 is used as the shared pool volume (referred to as “shared volume” hereinafter) 103. Further, it is assumed that data of (AA, BB, CC, DD, - - - , NN) are stored in the master volume 101 before receiving a snapshot command.
In the procedure for duplicating the volumes 101 and 102 mentioned above, the shared volume 103 is first set within the disk array subsystem 100, and the attribute managing table 201 and the allotment managing table 302 are initialized, in a first step S701 as shown in the flowchart of FIG. 12.
That is, the share attribute is set to LV2 in the attribute managing table 201, and value “0”, which shows that the shared pool volume is in an unused state, is set in the allotment managing table 302. Then, the master volume 101 and the snapshot volume 102 having the same storage capacity as that of the master volume 101 are set in step S702.
Further, when the disk array subsystem 100 receives a snapshot command in next step S703, the attribute managing table 201, the volume correspondence managing table 202, and the difference managing table 203 are initialized in step S403.
That is, a master attribute is set for the master volume (LV0) 101 and a snapshot attribute is set for the snapshot volume (LV1) 102 in the attribute managing table 201. In the volume correspondence managing table 202, LV1 is set as the snapshot of LV0 and LV0 is set as the snapshot of LV1, so as to show that the master volume (LV0) 101 and the snapshot volume (LV1) 102 are in a snapshot relation. In the difference managing table 203, value “0” is set, indicating that the snapshot volume (LV1 in this case) does not hold data, and a null value is set in the directory 301, showing that no storage space of the shared volume is allotted to the snapshot volume.
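The initialization described above can be sketched as follows. This is only an illustration under stated assumptions: the variable names, the function name `take_snapshot`, the four-page volume size, and the use of `None` as the null directory value are hypothetical, and only the first four data items (AA, BB, CC, DD) of the master volume are modeled.

```python
PAGES = 4  # assumed page count per volume (illustrative)

# Step S701: set the shared pool volume LV2 and initialize tables 201 and 302.
attribute = {"LV2": "shared"}   # attribute managing table 201
allotment = [0] * PAGES         # allotment managing table 302 (0 = unused)

# Step S702: set the master volume and a snapshot volume of the same capacity.
master = ["AA", "BB", "CC", "DD"]  # LV0 contents before the snapshot command


def take_snapshot():
    """Initialize the tables when a snapshot command is received (step S703)."""
    attribute["LV0"] = "master"
    attribute["LV1"] = "snapshot"
    correspondence = {"LV0": "LV1", "LV1": "LV0"}  # table 202: snapshot relation
    difference = [0] * len(master)                 # table 203: LV1 holds no data
    directory = [None] * len(master)               # directory 301: nothing allotted
    return correspondence, difference, directory


correspondence, difference, directory = take_snapshot()
```

Note that taking the snapshot copies no data at all: only table entries are written, which is why the snapshot method avoids the copy time of the mirroring method.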
Next, the processing procedures executed when the disk array subsystem (the data duplication system) 100 set up by the volume duplication procedure above receives a write command or a read command will be described.
First, the processing procedure to be executed when a write command is received will be described. The microprocessor (simply referred to as CPU hereinafter) of the disk array subsystem 100 refers to the attribute managing table 201 of FIG. 10 in step S801 shown in FIG. 13. Subsequently, it is judged whether the received command is a command for the master volume or a command for the snapshot volume (FIG. 13: step S802).
Note here that the CPU is structured to function as an arithmetic calculation processing module including the data duplication control module 104 and the address changing module 105.
When judged in step S802 described above that it is a write command for the snapshot volume, the processing is ended without writing the data. This behavior applies to operations in which the duplicate of the master volume as of the point where the snapshot command was received is to be maintained, such as backup. In other operation modes, the data may be written to the snapshot volume as requested by the write command.
In the meantime, when judged in step S802 that it is a write command for the master volume 101, the CPU then refers to the volume correspondence managing table 202 and specifies the snapshot volume 102 that makes a pair with the master volume (FIG. 13: step S803).
Then, it is judged whether or not there is data at the write request address of the specified snapshot volume 102 by referring to the difference managing table 203 (FIG. 13: steps S804, S805). When judged that there is data in the snapshot volume 102 (FIG. 13: step S805/Yes), the data is written to the master volume and the processing is ended (FIG. 13: step S810).
When judged that there is no data in the snapshot volume 102 (FIG. 13: step S805/No), the allotment managing table 302 is then searched (FIG. 13: step S806), and a region to be used this time is determined from the unused regions of the shared pool volume 103.
Then, the existing data at the write request address of the master volume 101 is copied to the unused region of the shared pool volume (FIG. 13: step S807). Thereafter, value “1”, indicating that the region is in use, is set in the corresponding section of the allotment managing table 302, and the address of the region is set in the corresponding section of the directory 301 (FIG. 13: step S808).
Subsequently, value “1” indicating that there is data is set in the corresponding section of the difference managing table 203 (FIG. 13: step S809), the data is written to the master volume, and the processing is ended (FIG. 13: step S810).
Note here that FIG. 9, FIG. 10, and FIG. 11 show the states of each of the tables after a write command of (ZC) is issued to page 2 of the master volume (LV0) after receiving the snapshot command.
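The write procedure of FIG. 13 amounts to a copy-on-write scheme, and the (ZC) example can be traced with a minimal sketch. The function name `write`, the dictionary-based state, zero-based page numbering, and the four-page volume are assumptions for illustration; the step labels in the comments follow the parenthetical references in the text.

```python
def write(volume, page, data, st):
    """Handle a write command; `st` holds the volumes and managing tables."""
    # S801-S802: check the attribute of the target volume (table 201).
    if st["attribute"][volume] == "snapshot":
        return False  # backup-style operation: the snapshot is not written
    # S803: find the snapshot volume paired with the master (table 202).
    snap = st["correspondence"][volume]
    # S804-S805: has this page already been saved for the snapshot (table 203)?
    if st["difference"][snap][page] == 0:
        # S806: choose an unused region of the shared pool (table 302).
        free = st["allotment"].index(0)  # raises ValueError if the pool is full
        # S807: copy the existing master data to the shared pool.
        st["pool"][free] = st["master"][page]
        # S808: mark the region as used and record its address (directory 301).
        st["allotment"][free] = 1
        st["directory"][snap][page] = free
        # S809: record that the snapshot now holds data for this page.
        st["difference"][snap][page] = 1
    # S810: write the new data to the master volume and end.
    st["master"][page] = data
    return True


state = {
    "attribute": {"LV0": "master", "LV1": "snapshot", "LV2": "shared"},
    "correspondence": {"LV0": "LV1", "LV1": "LV0"},
    "difference": {"LV1": [0, 0, 0, 0]},
    "directory": {"LV1": [None, None, None, None]},
    "allotment": [0, 0, 0, 0],
    "pool": [None, None, None, None],
    "master": ["AA", "BB", "CC", "DD"],
}

# Write (ZC) to page 2 of the master volume LV0 after the snapshot command.
write("LV0", 2, "ZC", state)
```

After this call, the old data "CC" sits in the first shared pool region and the master volume holds "ZC" at page 2. A second write to the same page would skip the copy, because the difference flag is already 1.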
Next, the processing procedure to be executed when a read command is received will be described.
The CPU first refers to the attribute managing table 201 of the control memory 200 (FIG. 14: step S901), and judges whether the received command is for the master volume 101 or for the snapshot volume 102 (FIG. 14: step S902).
Then, when judged in step S902 that it is the command for the master volume 101, the data is read out from the master volume 101 and the processing is ended (FIG. 14: steps S907, S908). In the meantime, when judged in step S902 described above that it is the command for the snapshot volume 102, the difference managing table 203 is then referred to (FIG. 14: step S903), and it is judged whether or not there is data at the readout request address of the snapshot volume 102 (FIG. 14: step S904).
When judged that there is data in the snapshot volume 102, the directory 301 of the address changing memory 300 is referred to (FIG. 14: step S906), and the address on the shared volume 103 at which the data is stored is acquired.
Subsequently, the data is read out from the shared volume 103 and transferred to the host, and the processing is ended (FIG. 14: steps S907, S908).
Further, when judged in step S904 described above that there is no data in the snapshot volume 102, the volume correspondence managing table 202 of the control memory 200 is referred to in step S907, the data is read out from the master volume 101 that makes a pair with the snapshot volume 102, the readout data is transferred to the host in step S908, and the processing is ended.
The explanations above use the snapshot method with the shared pool 103. Alternatively, there is a method in which the snapshot volume 102 is created not as a virtual volume but as a normal volume, and the data is saved directly to the snapshot volume without using the shared pool 103.
With that method, it is unnecessary to change the address from the virtual snapshot volume to the shared volume, so the processing can be simplified. However, when a plurality of snapshots are to be created, disk capacity several times larger than that of the master volume is used, as in the case of mirroring. When executing backup with the snapshot method, it is necessary to choose between those methods by considering their advantages and disadvantages.
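For the shared-pool method, the read procedure of FIG. 14 described above can be sketched as follows. The function name `read`, the dictionary-based state, and zero-based page numbering are assumptions for illustration; the state shown is the one that results from writing (ZC) to page 2 of the master volume after the snapshot command.

```python
def read(volume, page, st):
    """Handle a read command; `st` holds the volumes and managing tables."""
    # S901-S902: check whether the command targets the master or the snapshot.
    if st["attribute"][volume] == "master":
        return st["volumes"][volume][page]
    # S903-S904: does the snapshot hold data for this page (table 203)?
    if st["difference"][volume][page] == 1:
        # S906: look up the shared pool address in the directory 301.
        pool_address = st["directory"][volume][page]
        return st["pool"][pool_address]
    # No saved data: read from the paired master volume (table 202).
    master = st["correspondence"][volume]
    return st["volumes"][master][page]


state = {
    "attribute": {"LV0": "master", "LV1": "snapshot"},
    "correspondence": {"LV0": "LV1", "LV1": "LV0"},
    "difference": {"LV1": [0, 0, 1, 0]},
    "directory": {"LV1": [None, None, 0, None]},
    "pool": ["CC", None, None, None],
    "volumes": {"LV0": ["AA", "BB", "ZC", "DD"]},
}

current = read("LV0", 2, state)    # current master data
snapshot = read("LV1", 2, state)   # data as of the snapshot command
unchanged = read("LV1", 0, state)  # page never overwritten: served by the master
```

The snapshot volume thus returns the image at the time of the snapshot command even though it stores only the pages that were later overwritten on the master.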
In addition to Patent Document 1 described above, Patent Documents 2 and 3 mentioned below are known as related techniques.
Of those, the technique disclosed in Patent Document 2 (Japanese Unexamined Patent Publication 2002-373093) is known as a technique related to managing snapshot differences. The technique disclosed in Patent Document 3 (WO 2009/154272) is a version managing system targeted at image files of virtual machines, which is known as a system that makes it possible to refer to the differences between the versions.
The snapshot volume 102 acquired by the method disclosed in Patent Document 1 described above is used for various purposes. For example, for backup, there is a method in which a duplicate of the continually updated master volume is acquired regularly and the image of the master volume 101 at a certain point is restored from the snapshot volume 102 as necessary. There is also a method in which the image of the master volume 101 at a certain point is backed up to a secondary backup device such as a magnetic tape through the snapshot volume 102.
Further, as a secondary use of the data, there is also a method in which new development work is carried out based on actual work data by utilizing the acquired snapshot volume 102. In the case of such secondary use, updates may occur in the snapshot volume in some cases.
In the meantime, in a case where an update occurs in the snapshot volume 102 while the generations of a plurality of snapshot volumes 102 are being managed, the update processing takes time because the snapshot volumes 102 of the other generations must be held.
Now, consider a case where a data update occurs for page 2 of the snapshot volume LV12 in the state shown in FIG. 15.
When page 2 of the snapshot volume LV12 is updated with data “HH”, it becomes necessary to first save the data “CC” of page 2 of LV12 to page 2 of LV11 in order to hold the volume image of LV11. Therefore, the performance of the data update processing for the snapshot volume is degraded compared to that of the update processing for the master volume.
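The extra work in this case can be illustrated with a short sketch. FIG. 15 is not reproduced here, so the layout below is an assumption: each snapshot volume is modeled as a dictionary of the pages it holds itself, LV11 is assumed to hold no data of its own for page 2, and the function name `update_snapshot` is hypothetical. Only the values "CC" and "HH" and the page number come from the text.

```python
lv11 = {}          # assumed: LV11 holds no data of its own for page 2
lv12 = {2: "CC"}   # page 2 of LV12 holds "CC", as stated in the text


def update_snapshot(target, dependent, page, data):
    """Update `target`, first saving the old page so `dependent` keeps its image."""
    if page not in dependent and page in target:
        # Extra save needed to hold the volume image of the dependent generation.
        dependent[page] = target[page]
    target[page] = data


# Update page 2 of LV12 with "HH": "CC" must first be saved to LV11.
update_snapshot(lv12, lv11, 2, "HH")
```

Every first update of a page in LV12 therefore costs one additional copy before the write itself, which is the overhead the text attributes to generation management.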
Further, with the technique depicted in Patent Document 1 described above, it is unnecessary to change the address from the virtual snapshot volume to the shared volume. Thus, the processing is simplified. However, when a plurality of snapshots are to be created, disk capacity several times larger than that of the master volume is used, as in the case of mirroring.
Furthermore, with each of Patent Documents 1 to 3 described above, there is always a load imposed upon the master volume especially when executing write command processing for the snapshot volume (the duplication volume) from the host.
It is therefore an exemplary object of the present invention to overcome the inconveniences of the related techniques and, more specifically, to provide a data duplication system, a data duplication method, and a program thereof with which data update processing of the snapshot volume as a duplication volume can be executed smoothly without imposing a load on the master volume when write command processing inputted from outside is executed, thereby improving the usability of the duplication volume.
In order to achieve the foregoing object, the data duplication system according to an exemplary aspect of the invention is a snapshot-type data duplication system which includes: duplication volumes having a same capacity as that of a master volume; a shared pool volume which stores duplication data of the duplication volumes; and a main control unit which manages data accesses for the duplication data made from outside (a host computer) to the duplication volumes and manages a storing place of the duplication data, wherein the main control unit includes a data duplication control module which manages data accesses made from outside to the duplication volume, and an address changing module which manages the storing place of the duplication data.
Further, the data duplication control module includes a data duplication update processing function which duplicates data of the master volume to the duplication volume when updating the master volume, and directly updates the data of the duplication volume when updating the data of the duplication volume.
In order to achieve the foregoing object, the data duplication method according to another exemplary aspect of the invention is a data duplication method used in a snapshot-type data duplication system which includes duplication volumes for a master volume, a shared pool volume which stores duplication data of the duplication volumes, and a main control unit which manages data accesses for the duplication data made from outside (a host computer) to the duplication volumes and manages a storing place of the duplication data, and the method includes: judging whether or not update processing for the duplication volume is already executed by referring to an update managing table set in advance, when a write command for the duplication volume is inputted (an update processing judging step); specifying an address of the shared pool volume allotted in advance to the duplication volume as a target of the write command and writing data that corresponds to the write command thereto, when judged in the update processing judging step that there is a difference in the duplication volume (a shared volume writing step/a first step); and storing, to a prescribed memory, the address allotted on the shared volume that is a reference target of the update data of the duplication volume in a step of writing to the shared volume, and registering data showing that there is update data in the duplication volume to a corresponding section of the update managing table (an update data reference target registering step), wherein each processing content of judgment, writing, and registration done in each of the steps is executed by the main control unit successively.
In order to achieve the foregoing object, a data duplication method according to still another exemplary aspect of the invention is a data duplication method used in a snapshot-type data duplication system which includes duplication volumes for a master volume, a shared pool volume which stores duplication data of the duplication volumes, and a main control unit which manages data accesses for the duplication data made from outside (a host computer) to the duplication volumes and manages a storing place of the duplication data, and the method includes: judging whether or not update processing for the duplication volume is already executed by referring to an update managing table set in advance, when a write command for the duplication volume is inputted (an update processing judging step); searching a vacant region of the shared volume by referring to an allotment managing table set in advance, when judged in the update processing judging step that there is no difference (no update) in the duplication volume (a vacant region searching step); writing data that corresponds to the write command to the vacant region, when the vacant region of the shared pool volume is found in the vacant region searching step (a shared volume writing step/a second step); and storing, to a prescribed memory, an address allotted on the shared volume that is a reference target of the update data of the duplication volume in the shared volume writing step, and registering data showing that there is update data in the duplication volume to a corresponding section of the update managing table (an update data reference target registering step), wherein each processing content of judgment, searching, writing, and registration done in each of the steps is executed by the main control unit.
In order to achieve the foregoing object, the data duplication program according to still another exemplary aspect of the invention is a data duplication program used in a snapshot-type data duplication system which includes duplication volumes for a master volume, a shared pool volume which stores duplication data of the duplication volumes, and a main control unit which manages data accesses for the duplication data made from outside (a host computer) to the duplication volumes and manages a storing place of the duplication data, and the program includes: an update processing judging function for judging whether or not update processing for the duplication volume is already executed by referring to an update managing table set in advance, when a write command for the duplication volume is inputted; a shared pool volume writing processing function (a first writing processing function) for specifying an address of the shared volume allotted in advance to the duplication volume as a target of the write command and writing data that corresponds to the write command thereto, when judged by the update processing judging function that there is a difference in the duplication volume; and an update data reference target registering processing function for storing, to a prescribed memory, the address allotted on the shared volume that is a reference target of the update data of the duplication volume by the shared pool volume writing processing function (the first writing processing function) and for registering data showing that there is update data in the duplication volume to a corresponding section of the update managing table, wherein each processing content of judgment, writing, and registration done in each of the processing functions is executed by a computer that is provided to the main control unit.
In order to achieve the foregoing object, the data duplication program according to still another exemplary aspect of the invention is a data duplication processing program used in a snapshot-type data duplication system which includes duplication volumes for a master volume, a shared pool volume which stores duplication data of the duplication volumes, and a main control unit which manages data accesses for the duplication data made from outside (a host computer) to the duplication volumes and manages a storing place of the duplication data, and the program includes: an update processing judging function for judging whether or not update processing for the duplication volume is already executed by referring to an update managing table set in advance, when a write command for the duplication volume is inputted; a vacant region search processing function for searching a vacant region of the shared volume by referring to an allotment managing table set in advance, when judged by the update processing judging function that there is no difference (no update) in the duplication volume; a shared volume writing processing function (a second processing function) for writing data that corresponds to the write command to the vacant region, when the vacant region of the shared volume is found by the vacant region search processing function; and an update data reference target registering processing function for storing, to a prescribed memory, an address allotted on the shared volume that is a reference target of the update data of the duplication volume by the shared volume writing processing function, and for registering data showing that there is update data in the duplication volume to a corresponding section of the update managing table, wherein each processing content of judgment, searching, writing, and registration done in each of the processing functions is executed by a computer that is provided to the main control unit.
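The two write paths recited in the method and program aspects above (the first step for an already updated section, the second step for a section with no update) can be combined in one minimal sketch. The function name, the list-based tables, the four-page size, marking the allotment table as used, and raising an error when no vacant region exists are assumptions for illustration; the summary above does not specify them.

```python
def write_duplication_volume(page, data, update_table, directory, allotment, pool):
    """Write `data` to `page` of a duplication volume without touching the master."""
    # Update processing judging step: consult the update managing table.
    if update_table[page] == 1:
        # First step: write to the shared pool address already allotted.
        address = directory[page]
    else:
        # Vacant region searching step: consult the allotment managing table.
        if 0 not in allotment:
            raise RuntimeError("no vacant region in the shared pool (assumed handling)")
        address = allotment.index(0)
        allotment[address] = 1  # assumed: mark the region as used
    # Shared volume writing step: store the data in the shared pool.
    pool[address] = data
    # Update data reference target registering step.
    directory[page] = address
    update_table[page] = 1
    return address


update_table = [0, 0, 0, 0]          # update managing table (0 = no update)
directory = [None, None, None, None] # reference targets on the shared pool
allotment = [0, 0, 0, 0]             # allotment managing table (0 = vacant)
pool = [None, None, None, None]      # shared pool volume

first = write_duplication_volume(2, "HH", update_table, directory, allotment, pool)
second = write_duplication_volume(2, "JJ", update_table, directory, allotment, pool)
```

Neither path reads from or writes to the master volume, which is the point of the stated object: the duplication volume is updated directly on the shared pool, and a repeated update reuses the region already allotted.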