1. Technical Field
This invention relates to a method of, and system for, handling a backup process.
2. Related Art
The storage of data in large organisations is of fundamental importance, both for reliability of the data and for the ability to recover data in the event of any hardware failure. A storage area network (SAN) is an architecture that is used when very large amounts of data need to be stored in a reliable and secure manner. This technology allows networks to be created that support the attachment of remote computer storage devices, such as disk arrays, to servers in such a way that, to the operating system, the devices appear as locally attached. It is common in these networks to include a large amount of redundancy, both in the data storage and in the hardware connections between the individual components.
Various methods exist for creating data redundancy. For example, a backup process such as a FlashCopy® function enables an administrator to make point-in-time, full volume copies of data, with the copies immediately available for read or write access. (FlashCopy is a registered trademark of International Business Machines Corporation in the United States and other countries.) The FlashCopy function can be used with standard backup tools that are available in the environment to create backup copies on tape. A FlashCopy function creates a copy of a source storage volume on a target storage volume. This copy, as mentioned above, is called a point-in-time copy. When a FlashCopy operation is initiated, a relationship is created between a source volume and a target volume. This relationship is a “mapping” of the source volume and the target volume. This mapping allows a point-in-time copy of that source volume to be copied to the associated target volume. The relationship exists between this volume pair from the time that the FlashCopy operation is initiated until the storage unit copies all data from the source volume to the target volume, or until the relationship is deleted.
When the data is physically copied, a background process copies tracks (or “grains”) of data from the source volume to the target volume. The amount of time that the process takes to complete the background copy depends on various criteria, such as the amount of data being copied, the number of background copy processes that are running and any other activities that are presently occurring. The FlashCopy function works because the data being copied does not actually need to be copied instantaneously; it only needs to be copied just prior to an update causing an overwrite of any old data on the source volume. So, as data changes on the source volume, the original data is copied to the target volume before being overwritten on the source volume.
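The copy-on-write behaviour described above can be illustrated with a minimal sketch. This is not the FlashCopy implementation; the class `PointInTimeCopy`, its methods and the per-grain flag list are hypothetical names chosen for illustration, and a volume is assumed to be a simple list of grains.

```python
class PointInTimeCopy:
    """Hypothetical copy-on-write mapping between a source and a target volume."""

    def __init__(self, source):
        self.source = source                  # list of grains, mutated in place
        self.target = [None] * len(source)    # physically empty when the copy starts
        self.copied = [False] * len(source)   # one flag per grain

    def write_source(self, grain, data):
        # Preserve the old grain on the target just before it is overwritten.
        if not self.copied[grain]:
            self.target[grain] = self.source[grain]
            self.copied[grain] = True
        self.source[grain] = data

    def read_target(self, grain):
        # Grains that have not been copied yet are still served from the source.
        return self.target[grain] if self.copied[grain] else self.source[grain]

    def background_copy_step(self):
        # Copy one not-yet-copied grain; return False when nothing is left to copy.
        for grain, done in enumerate(self.copied):
            if not done:
                self.target[grain] = self.source[grain]
                self.copied[grain] = True
                return True
        return False
```

In this sketch the target presents the point-in-time image immediately, even though grains are only moved when the source is about to change or when the background process reaches them.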
Therefore, a FlashCopy is an example of a feature supported on various storage devices that allows a user or an automated process to make nearly instantaneous copies of entire logical volumes of data. A copy of a source disk is made on a target disk. The copies are immediately available for both read and write access. A common feature of FlashCopy-like implementations is the ability to reverse the copy, that is, to populate the source disk of a FlashCopy map with the contents of the target disk. It is also possible to use backup processes such as FlashCopy in cascaded implementations, in which a target disk later becomes the source disk for a further FlashCopy or vice versa.
A cascaded configuration of storage volumes is described in detail in U.S. Pat. No. 7,386,695. It is also possible to create multiple cascades of storage volumes which are interlocking at a logical level. A first cascade may comprise storage volumes A, B, C and D which are arranged in a cascade as follows: A⇄B⇄C⇄D, while at a later time new backups of A may be started to volumes E and F, which ultimately leads to the creation of a second cascade A⇄E⇄F. Many different combinations of FlashCopy functions and reversed functions are possible, potentially creating complicated multiple cascades of storage volumes.
There are two types of point-in-time (PIT) backup processes commonly used in data storage systems, called a clone and a snapshot. A clone is a PIT copy where the target disk will hold a complete copy of the data that was on the source disk when the PIT copy was started. When the copying of data from source to target completes, the target disk is independent of the source. A snapshot is a PIT copy where the target only holds the changed data necessary to present the PIT copy of the source. Data is only copied to the target disk if it is changed on the source. The target disk generally remains dependent on some of the data on the source disk in order to present the PIT copy.
Multiple target cascaded copying is a technique implemented in the IBM SAN Volume Controller FlashCopy. A cascade is used to implement multiple PIT copies of a single data source. For example, consider a data source S and PIT copies of S taken at times t1, t2 and t3. At time t1, a PIT copy is taken using data target T1, resulting in a cascade: S→T1. At time t2, a second PIT copy is taken using data target T2, resulting in the cascade: S→T2→T1. This arrangement works because if data has been changed on T1 or S between times t1 and t2, the data will be on T1, and if the data has not been changed, then both T1 and T2 want to read the same data. Similarly, at time t3 the cascade S→T3→T2→T1 is produced.
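The cascade S→T3→T2→T1 can be illustrated with a minimal sketch under simplifying assumptions: only the source is written to, each target records only the grains it physically holds, and a read of a target walks upstream towards S until it finds the grain. The class `Cascade` and its method names are hypothetical and are not taken from the SAN Volume Controller.

```python
class Cascade:
    """Hypothetical model of a cascade S -> Tn -> ... -> T1."""

    def __init__(self, source_grains):
        self.source = list(source_grains)
        self.targets = []                 # newest first, e.g. [T3, T2, T1]

    def start_copy(self, name):
        # A new PIT copy is inserted directly downstream of the source.
        self.targets.insert(0, (name, {}))

    def write_source(self, grain, data):
        # Only the target adjacent to S needs the old grain; older targets
        # read through it.
        if self.targets:
            _, held = self.targets[0]
            if grain not in held:
                held[grain] = self.source[grain]
        self.source[grain] = data

    def read(self, name, grain):
        # Walk upstream from the named target towards S.
        idx = [n for n, _ in self.targets].index(name)
        for _, held in reversed(self.targets[: idx + 1]):
            if grain in held:
                return held[grain]
        return self.source[grain]
```

For instance, if grain 0 is written after t1 and again after t3, and grain 1 is written after t2, T1 holds only the original grain 0 and obtains grain 1 from T2, which is the dependency between targets that the cascade introduces.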
This technique has many benefits. However, it also introduces dependencies between the data targets that would not exist in a traditional multiple target implementation. A side effect of this target dependency is the requirement to clean a target when a PIT copy is stopped or completes. For example, if PIT copy S→T2 is stopped, any data on T2 that is required by T1 must be copied from T2 to T1 before the target T2 can be removed from the cascade. In many situations this is not a problem, because the user may wish T1 to hold a complete copy of S at time t1, meaning that the backup process S→T1 is a clone. However, if the intention of S→T1 is just to produce a snapshot of S at time t1, this extra copying from T2 to T1 may cause the user problems. Further, if the data target T1 is thinly provisioned (also known as space efficient), the above behaviour may cause the unnecessary allocation of storage to T1. This would seriously reduce the user's ability to maintain snapshots and clones and to manage their backups.
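The cleaning requirement can be illustrated with a minimal sketch of a naive rule, assuming each target is represented as a (name, grains held) pair and the list is ordered newest first. The function `stop_copy` and this representation are hypothetical; the rule copies every grain held by the stopped target that the next target downstream lacks, which is exactly the extra allocation that troubles a thinly provisioned T1.

```python
def stop_copy(targets, name):
    """Remove target `name` from the cascade after cleaning it.

    `targets` is a list of (name, held) pairs, newest first, where `held`
    maps grain index to data. Returns the number of grains copied downstream.
    """
    idx = next(i for i, (n, _) in enumerate(targets) if n == name)
    _, held = targets[idx]
    copied = 0
    if idx + 1 < len(targets):
        _, downstream = targets[idx + 1]
        for grain, data in held.items():
            if grain not in downstream:
                downstream[grain] = data   # cleaning copy, allocates space
                copied += 1
    del targets[idx]
    return copied
```

Stopping T2 when T2 holds grains 0 and 1 and T1 holds only grain 0 results in one cleaning copy, after which T1 holds both grains even if it was intended only as a snapshot.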
There are a number of existing techniques that attempt to reduce the amount of data that is copied from T2 to T1, with varying degrees of success. However, none of these techniques produces the minimal number of copies from T2 to T1 without dramatically increasing the amount of metadata used to track the contents of the various data targets.
It is therefore an object of the invention to improve upon the known art.