The present invention relates to data recovery.
Typically, in a computer system, measures are taken so that data can be backed up periodically and recovered when they are lost due to a fault in the storage system, data corruption caused by a computer virus, an operational error by a user, and so on.
Furthermore, to utilize management resources effectively and increase management efficiency, many business enterprises are currently constructing computer systems in which applications (database management systems (DBMS), file systems, and so on) that were originally run independently by the department responsible for each basic operation are integrated using an ERP (Enterprise Resource Planning) package or the like so that the applications can operate in conjunction. More specifically, for example, information is shared by having the applications run in the respective departments use the same storage system. In addition, by having the applications use each other's functions and processing results, efficient processing can be performed throughout the entire computer system. Applications that use each other's functions and processing results in this manner are said to operate in conjunction.
When a fault occurs in a computer system having a plurality of applications that operate in conjunction, recovery must be performed while maintaining the consistency of the applications (the consistency of the data used by the applications).
A technique in which a backup operation is performed collectively on a storage area (data volume) used by various applications has been proposed as a backup technique for realizing this type of recovery (see Japanese Patent Application Publication 2005-196618, for example). In Japanese Patent Application Publication 2005-196618, it is disclosed that a storage system stores a data volume group (referred to as a “copy group”) on which the collective backup operation is to be performed, and that in response to a copy group-unit backup operation, the storage system backs up the data of the same point in time in all of the data volumes belonging to the copy group.
When recovery is performed using this backup technique, the data volumes belonging to the copy group are recovered to their condition at the same point in time, and therefore consistency between the applications is guaranteed.
A backup technique using journaling has been proposed as another technique (see Japanese Patent Application Publication 2004-252686, for example). Japanese Patent Application Publication 2004-252686 discloses the following. A snapshot (a logical image of a full backup, an incremental backup, or the like) of a logical group constituted by one or more data volumes (referred to hereafter as a “journal group”) is obtained at a specific point in time. Data written to the data volumes thereafter are stored as a journal (referred to as an “after journal”) in a journal volume associated with the journal group. By applying the series of after journals to the obtained snapshot in the order in which they were written, the data in all of the data volumes belonging to the journal group are restored to their condition at a specific point in time. This is an example of a technique usually known as “Continuous Data Protection” or by its abbreviation, “CDP”.
The data restored in this manner are completely restored to the same point in time, and by using this technique to perform recovery, the consistency between applications can be guaranteed.
Note that the snapshot serving as the object of journal application during recovery is known as a “base snapshot”.
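The journal-based restoration described above can be illustrated with a minimal sketch. It is not taken from the cited publication: the record layout (`AfterJournal`), the in-memory representation of volumes as dictionaries, and the names `restore` and `target_time` are all assumptions made for illustration, and journal timestamps are assumed to increase with the write sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AfterJournal:
    """One journaled write (hypothetical record layout)."""
    seq: int       # write order within the journal group
    time: float    # time at which the write was issued
    volume: str    # data volume that received the write
    block: int     # block address within the volume
    data: bytes    # data that was written


def restore(base_snapshot, journals, target_time):
    """Rebuild every volume of the journal group as of target_time."""
    # Start from a copy of the base snapshot: {volume: {block: data}}.
    volumes = {vol: dict(blocks) for vol, blocks in base_snapshot.items()}
    # Apply after journals in the order they were written,
    # stopping at the first write issued after the target time.
    for j in sorted(journals, key=lambda j: j.seq):
        if j.time > target_time:
            break
        volumes[j.volume][j.block] = j.data
    return volumes


base = {"vol_a": {0: b"a0"}, "vol_b": {0: b"b0"}}
journals = [
    AfterJournal(1, 10.0, "vol_a", 0, b"a1"),
    AfterJournal(2, 20.0, "vol_b", 0, b"b1"),
    AfterJournal(3, 30.0, "vol_a", 1, b"a2"),
]
restored = restore(base, journals, target_time=20.0)
print(restored)  # {'vol_a': {0: b'a1'}, 'vol_b': {0: b'b1'}}
```

Because a single ordered journal covers every volume in the group, both volumes in the sketch end up reflecting the same point in time, which is the property that preserves consistency between applications.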
To describe the problems with this technique, first the term “recovery” will be defined.
In a “recovery”, the following two processes are executed in sequence.
Firstly, the storage system or backup software performs processing to restore data at the backup time to a data volume. Hereafter, this process will be referred to simply as “restoration”. In relation to the technique of Japanese Patent Application Publication 2005-196618, for example, restoration specifically involves restoring the data in a data volume on the basis of a backup volume in the storage system. In relation to Japanese Patent Application Publication 2004-252686, for example, restoration involves restoring the data in a data volume on the basis of the base snapshot to data at the point in time when the base snapshot was captured, applying a journal to the data volume, and reconstructing data of a specific point in time.
Secondly, an application performs processing to check and modify the content of the restored data before resuming operations to ensure that the restored data are consistent with the application. Hereafter, this process will be referred to as a “consistency check”. When the application is a file system, the consistency check specifically involves processing (an FSCK (File System Consistency Check) or the like) to mount the volume and perform a data consistency check. When the application is a DBMS, the consistency check involves processing (processing known as crash recovery, instant recovery, or the like) to apply a log to the restored data and then cancel any updates that were not committed, so that the data reflect the state at the time of the latest commit.
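The DBMS consistency check described above can be sketched as a redo phase followed by an undo phase. This is a deliberately simplified illustration and not the procedure of any particular DBMS: the log record shapes `("update", txn, key, old, new)` and `("commit", txn)` and the function name `crash_recover` are assumptions.

```python
def crash_recover(data, log):
    """Redo every logged update, then undo updates of uncommitted transactions."""
    data = dict(data)  # work on a copy of the restored data
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo phase: apply all logged updates in log order.
    for rec in log:
        if rec[0] == "update":
            _, txn, key, old, new = rec
            data[key] = new
    # Undo phase: walk the log backwards and cancel uncommitted updates.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, txn, key, old, new = rec
            data[key] = old
    return data


log = [
    ("update", "t1", "x", 1, 2),
    ("commit", "t1"),
    ("update", "t2", "y", 5, 6),  # t2 never committed
]
recovered = crash_recover({"x": 1, "y": 5}, log)
print(recovered)  # {'x': 2, 'y': 5}
```

The committed update to `x` survives, while the uncommitted update to `y` is cancelled.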
Returning to the problems with this technique, computer system recovery is typically required to take less time (a shorter recovery time) so that operations can be resumed sooner.
Further, there is a limit to the throughput of restoration processing. This limit has various causes such as a bottleneck in the performance of the CPU that controls the restoration processing, a bottleneck in the network or bus bandwidth during data transfer, and a bottleneck in access to a logical volume such as a data volume. Hence, when a plurality of volumes are restored in group units, the available throughput is divided among the volumes, causing an increase in the restoration processing time of each volume.
When recovery is performed on a computer system in which a plurality of applications operate in conjunction and the restoration processing time of each volume increases as described above, the start of each application's consistency check is delayed. As a result, the recovery time of each application increases, leading to an increase in the time required to resume the operations of a computer system in which a plurality of applications operate in conjunction.
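The effect described above can be illustrated with a rough numerical sketch. The model is an assumption made for illustration only: the total restoration throughput is split evenly among all volumes restored at the same time, and an application's recovery time is its restoration time plus its consistency check time. The function names and the figures (100 GB volume, 0.5 GB/s, 60 s check) are hypothetical.

```python
def restore_time(size_gb, total_throughput_gb_per_s, concurrent_volumes):
    """Seconds to restore one volume when the total throughput is split
    evenly among all volumes being restored (a simplifying assumption)."""
    per_volume_throughput = total_throughput_gb_per_s / concurrent_volumes
    return size_gb / per_volume_throughput


def recovery_time(size_gb, check_s, total_throughput_gb_per_s, concurrent_volumes):
    """Recovery time = restoration time + consistency check time,
    because the check can only start once restoration has finished."""
    restore_s = restore_time(size_gb, total_throughput_gb_per_s, concurrent_volumes)
    return restore_s + check_s


alone = recovery_time(100, 60, 0.5, concurrent_volumes=1)
grouped = recovery_time(100, 60, 0.5, concurrent_volumes=4)
print(alone, grouped)  # 260.0 860.0
```

Under these assumptions, restoring the volume as one of a group of four delays the start of its consistency check from 200 seconds to 800 seconds, even though the check itself takes the same time.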
Note that here, the meaning of the term “the time required to resume the operations of a computer system in which a plurality of applications operate in conjunction” differs according to the manner in which the applications cooperate. In a computer system in which the applications cooperate closely, such that the plurality of applications access each other's processing (functions and the like), the term signifies the time until the recovery of all applications is complete. Meanwhile, in a computer system in which a plurality of applications cooperate loosely, such as a workflow in which the results of processing performed independently by one application are used by another application, the term signifies the time until the recovery of the application that will execute the processing to be performed immediately after the resumption of operations is complete.
Further, when performing recovery, the user sometimes specifies the point in time to which the data are to be restored. The term “recovery point” is used to indicate the specified point in time. The recovery point may be specified by a method other than user specification.
The problems described above may occur not only in a computer system in which a plurality of applications operate in conjunction, but also in a computer system in which a plurality of applications do not operate in conjunction.
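The two meanings of the time required to resume operations, as described above, can be expressed in a short sketch. The function name `time_to_resume`, the mode labels `"close"` and `"loose"`, and the example recovery times are assumptions made for illustration.

```python
def time_to_resume(recovery_times, cooperation, first_app=None):
    """Time until operations can resume, given per-application recovery times."""
    if cooperation == "close":
        # Closely cooperating applications: all must have recovered.
        return max(recovery_times.values())
    if cooperation == "loose":
        # Loosely cooperating applications: only the application that runs
        # first after the resumption of operations must have recovered.
        return recovery_times[first_app]
    raise ValueError(f"unknown cooperation mode: {cooperation!r}")


times = {"dbms": 860.0, "file_system": 500.0}  # seconds, hypothetical
close_resume = time_to_resume(times, "close")
loose_resume = time_to_resume(times, "loose", first_app="file_system")
print(close_resume, loose_resume)  # 860.0 500.0
```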