This invention relates to a method of, and system for, facilitating backup processes.
A storage area network (SAN) is an architecture that is often used when very large amounts of data must be stored reliably and securely. This technology allows networks to be created that support the attachment of remote computer storage devices, such as disk arrays, to servers in such a way that the devices appear to the operating system as locally attached. These networks commonly include a large amount of redundancy, both in the data storage and in the hardware connections between the individual components.
Various methods exist for creating data redundancy. For example, a FlashCopy® function enables an administrator to make point-in-time, full-volume copies of data, with the copies immediately available for read or write access. (FlashCopy is a registered trademark of International Business Machines Corporation in the United States and other countries.) FlashCopy® can be used with the standard backup tools available in the environment to create backup copies on tape. A FlashCopy® function creates a copy of a source volume on a target volume, and this copy is called a point-in-time (PIT) copy. When a FlashCopy® operation is initiated, a relationship, or “mapping”, is created between the source volume and the target volume. This mapping allows a point-in-time copy of the source volume to be copied to the associated target volume. The relationship exists between this volume pair from the time the FlashCopy® operation is initiated until the storage unit has copied all data from the source volume to the target volume, or until the relationship is deleted.
FlashCopy is often used to create recovery points, which are application-consistent point-in-time copies of the production data. These recovery points can then be used if the production data becomes corrupted. Because the production system is often of limited use once data corruption occurs, the user frequently needs to be able to restore the production data immediately. Additionally, users typically do not want to sacrifice any existing backups, because the restore of the production system may need to be re-triggered if mistakes are made during recovery.
When the data is physically copied, a background process copies tracks from the source volume to the target volume. The time taken to complete the background copy depends on various criteria, such as the amount of data being copied, the number of background copy processes running and any other activity currently taking place. The FlashCopy® function works because the data being copied does not need to be copied instantaneously; it only needs to be copied just before an update overwrites the old data on the source volume. So, as data changes on the source volume, the original data is copied to the target volume before it is overwritten on the source volume. This copying operation is often referred to as a “copy write” and forms part of a “cleaning”, in which the dependency of the target volume on the source volume is removed for the grain of data copied.
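The copy-write behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the FlashCopy® implementation: volumes are modelled as Python lists of grains, and the class name `PointInTimeCopy` and its methods are hypothetical.

```python
class PointInTimeCopy:
    """Hypothetical copy-on-write model of one source/target mapping.

    Volumes are lists of grains; copied[g] is True once grain g has been
    physically copied to the target, i.e. the target no longer depends on
    the source for that grain.
    """

    def __init__(self, source):
        self.source = source
        self.target = [None] * len(source)
        self.copied = [False] * len(source)

    def write_source(self, grain, data):
        # Copy write: preserve the point-in-time data before it is overwritten.
        if not self.copied[grain]:
            self.target[grain] = self.source[grain]
            self.copied[grain] = True
        self.source[grain] = data

    def read_target(self, grain):
        # A grain not yet copied is unchanged on the source, so read it there.
        return self.target[grain] if self.copied[grain] else self.source[grain]
```

In this model a second write to the same grain causes no further copy, because the target already holds the point-in-time data for that grain, while reads of uncopied grains are redirected to the source.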
A FlashCopy® is therefore a feature, supported on various storage devices, that allows a user or an automated process to make nearly instantaneous copies of entire logical volumes of data. A copy of a source disk is made on a target disk, and the copies are immediately available for both read and write access. A common feature of FlashCopy®-like implementations is the ability to reverse the copy, that is, to populate the source disk of a FlashCopy® map with the contents of the target disk, typically in a restore operation.
There are two types of point-in-time (PIT) backup processes commonly used in data storage systems. One is called a clone and the other a snapshot. A clone is a PIT copy where the target disk will hold a complete copy of the data that was on the source disk when the PIT copy was started. When the copying of data from source to target completes, the target disk is independent of the source.
Conversely, a snapshot is a PIT copy where the target only holds the changed data necessary to present the PIT copy of the source. Data is typically only copied to the target disk if it is changed on the source. The target disk is generally dependent on some of the data on the source disk in order to present the PIT copy.
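The difference between a clone and a snapshot can be illustrated with a small sparse model in which a target stores only the grains it physically holds. The class `PitTarget`, the helper `host_write` and the dictionary representation are assumptions made for this sketch. Both kinds of target preserve grains before the source overwrites them, but only the clone runs a background copy of the unchanged grains.

```python
class PitTarget:
    """Hypothetical sparse PIT target: `grains` holds only data physically on the target."""

    def __init__(self, source, clone):
        self.source = source      # grain-indexed source volume (a list)
        self.clone = clone        # True for a clone, False for a snapshot
        self.grains = {}          # grain index -> data held on the target

    def before_source_write(self, grain):
        # Clones and snapshots both preserve a grain before the source overwrites it.
        if grain not in self.grains:
            self.grains[grain] = self.source[grain]

    def run_background_copy(self):
        # Only a clone copies the grains that never changed; a snapshot does not.
        if self.clone:
            for g, data in enumerate(self.source):
                self.grains.setdefault(g, data)

    def is_independent(self):
        # The target no longer needs the source once it holds every grain.
        return len(self.grains) == len(self.source)

    def read(self, grain):
        return self.grains[grain] if grain in self.grains else self.source[grain]


def host_write(source, targets, grain, data):
    """Apply a host write to the source after every mapped target has preserved the grain."""
    for t in targets:
        t.before_source_write(grain)
    source[grain] = data
```

After one host write and a background copy, the clone holds every grain and is independent of the source, while the snapshot holds only the single changed grain and still reads the rest from the source.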
It is also possible to use FlashCopy® in cascaded implementations, in which a target disk later becomes the source disk for a further FlashCopy® or vice versa. A cascaded configuration of storage volumes is described in detail in U.S. Pat. No. 7,386,695.
A cascade may be used to implement multiple PIT copies of a single data source. For example, with a data source S and PIT copies of S taken at times t1, t2 and t3, a PIT copy is taken at time t1 using data target T1, resulting in the cascade S→T1. At time t2 a second PIT copy is taken using data target T2, resulting in the cascade S→T2→T1. This arrangement works because, if data stored on T1 or S changes between times t1 and t2, the original data can still be found on T1. Alternatively, if the data has not changed between times t1 and t2, then both T1 and T2 will contain or point to the same data. Adding a third backup at t3 produces the cascade S→T3→T2→T1.
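Reads in such a cascade can be resolved by walking upstream from a target towards the source until a volume that physically holds the requested grain is found. The following sketch assumes that model. The `Cascade` class, its method names and the use of dictionaries for targets are hypothetical, and host writes to the targets themselves are not modelled.

```python
class Cascade:
    """Hypothetical model of a cascade S -> T_newest -> ... -> T_oldest.

    The source is a list of grains; each target is a dict holding only the
    grains it has physically received. Targets are stored newest first.
    """

    def __init__(self, source):
        self.source = source
        self.targets = []
        self.names = []

    def start_pit(self, name):
        # A new PIT copy is inserted directly downstream of the source.
        self.targets.insert(0, {})
        self.names.insert(0, name)

    def write_source(self, grain, data):
        # Only the newest target needs a copy write; older targets either
        # hold their own version already or read it through newer targets.
        if self.targets and grain not in self.targets[0]:
            self.targets[0][grain] = self.source[grain]
        self.source[grain] = data

    def read(self, name, grain):
        # Walk upstream (towards S) from the named target until a volume
        # that physically holds the grain is found.
        i = self.names.index(name)
        for t in self.targets[i::-1]:
            if grain in t:
                return t[grain]
        return self.source[grain]

    def chain(self):
        return " -> ".join(["S"] + self.names)
```

For example, after starting T1, writing grain 0, starting T2, writing grain 1 and starting T3, reading T1 returns the source contents as they were at t1: grain 0 from T1, grain 1 from T2 and grain 2 from S.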
This technique has many benefits. However, it also introduces dependencies between the data targets that may not exist in a traditional multiple target implementation. A side effect of this target dependency can be a requirement to “clean” a target when a PIT copy is stopped or completes. For example, if PIT copy S→T2 is stopped, any data on T2 that is required by T1 is typically copied from T2 to T1 before target T2 can be removed from the cascade. In many situations this is not a problem, because the user may want T1 to hold a complete copy of S at time t1, meaning that the backup process S→T1 is a clone. However, if S→T1 is only intended to produce a snapshot of S at time t1, this extra copying from T2 to T1 may cause the user problems. Further, if data target T1 is thinly provisioned (also known as space efficient), this behavior may cause storage to be allocated to T1 unnecessarily. In some applications this may reduce the user's ability to maintain snapshots and clones and to manage their backups.
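The cleaning step can be sketched under a hypothetical model in which each target in the cascade is a dictionary of the grains it physically holds, ordered newest first. Any grain held by the target being removed but not by the next older target is one that the older target currently reads through the removed one, so it must be copied before removal. The function name `stop_target` is illustrative only.

```python
def stop_target(targets, index):
    """Clean and remove targets[index] from a cascade (hypothetical model).

    `targets` is ordered newest first (S -> targets[0] -> targets[1] -> ...),
    each entry a dict of grains physically held. Returns the number of
    grains copied to the next older target during cleaning.
    """
    removed = targets[index]
    copied = 0
    if index + 1 < len(targets):
        downstream = targets[index + 1]  # e.g. T1 when stopping T2
        for grain, data in removed.items():
            # Without this copy, the older target would lose the grain it reads via `removed`.
            if grain not in downstream:
                downstream[grain] = data
                copied += 1
    del targets[index]
    return copied
```

Each grain counted by the return value is a write to the older target; if that target is thinly provisioned, each one may also be a new storage allocation, even when the older target was only meant to be a snapshot.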
A number of existing techniques attempt, with varying degrees of success, to reduce the amount of data copied during cleaning from one volume to another, such as from T2 to T1. However, many such solutions can dramatically increase the amount of metadata used to track the contents of the various data targets.
It is also possible to create multiple cascades of storage volumes that interlock at a logical level. For example, a first cascade may comprise storage volumes A, B, C and D arranged as A→B→C→D, while at a later time a new backup of A may be started that ultimately leads to the creation of A→E→F. Many different combinations of FlashCopy® functions and reversed functions are possible, potentially creating complex arrangements of multiple cascaded storage volumes.
In a traditional multiple target FlashCopy implementation, the restore process can be relatively straightforward. However, such systems frequently do not scale in terms of the copy writes required for each host write. A cascaded multiple target implementation usually does scale, because the number of copy writes can be bounded, and the bound may be independent of the number of FlashCopy® copies of the source volume. A cascaded approach is therefore frequently desirable when many recovery points are desired or anticipated. However, a cascaded approach can complicate restore operations.
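The scalability argument can be made concrete by counting the copy writes triggered by a single host write to a grain of the source that no target has yet preserved. In this hypothetical sketch each target is a dictionary of preserved grains. With a traditional multiple target layout every target needs its own copy write, whereas in a cascade only the newest target, which sits directly downstream of the source, is written.

```python
def copy_writes_multi_target(targets, grain, old_data):
    """Traditional multiple target: every target lacking the grain needs a copy write."""
    writes = 0
    for t in targets:
        if grain not in t:
            t[grain] = old_data
            writes += 1
    return writes


def copy_writes_cascade(targets, grain, old_data):
    """Cascade (targets ordered newest first): only the target next to the source is written."""
    if targets and grain not in targets[0]:
        targets[0][grain] = old_data
        return 1
    return 0
```

The bound of one copy write per host write holds only under this simplified model; the cost reappears elsewhere, for example in upstream read redirection and in cleaning when a target is removed.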
For example, to keep track of such cascaded storage volumes and FlashCopy® functions, it is preferable to provide a data structure that defines primary and secondary “fdisks”. An fdisk is a logical component that includes an index defining the storage volume to which the fdisk relates and that provides links to the relevant maps defining the up and down directions of the FlashCopy® functions in a cascade. When a FlashCopy® function is created between a source volume and a target volume, primary fdisks are often created for each storage volume, unless a primary fdisk already exists for the target disk, in which case that existing fdisk for the target volume is converted to a secondary fdisk and a new primary fdisk is created. The advantage of a data structure defined by fdisks is that the fdisks can be used to keep track of the input/output (IO) read and write accesses to different storage volumes within existing multiple cascades, and to direct data reads to the correct location within the cascade.
The use of fdisks allows a storage volume to appear in different FlashCopy® cascades concurrently. The more times a disk appears in cascades, the more read and write IO operations in a cleaning operation (cleaning IOs) may be required at the FlashCopy® level before a host-originated IO can be completed back to the host. For this reason the number of fdisks for each disk is typically limited (for example, to 2). This can, in turn, limit the number of active FlashCopy® backup or restore operations that can be started in a cascade whilst any existing maps are still active. One approach to this problem is to collapse cascades containing fdisks from the same disk back into a single cascade. However, the operations that permit a single cascade to be reformed are frequently subject to undesirable limitations. Two examples are given below.
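The fdisk rules above can be sketched as a small registry. This is a hypothetical model: the class and field names are invented, the per-volume limit of 2 follows the example in the text, and the rule that a source volume reuses its existing primary fdisk is an assumption, since the text states the creation rule only for the target. Where each fdisk sits within a cascade is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class FDisk:
    volume: str                                                       # storage volume this fdisk represents
    primary: bool = True
    up: Optional["FlashMap"] = field(default=None, repr=False)        # link toward the cascade head
    down: List["FlashMap"] = field(default_factory=list, repr=False)  # links toward targets


@dataclass(eq=False)
class FlashMap:
    source: FDisk
    target: FDisk


class FDiskRegistry:
    """Hypothetical fdisk bookkeeping with a per-volume fdisk limit."""

    def __init__(self, max_fdisks_per_volume=2):
        self.by_volume = {}                   # volume name -> list of its fdisks
        self.limit = max_fdisks_per_volume

    def primary(self, volume):
        return next((f for f in self.by_volume.get(volume, []) if f.primary), None)

    def _has_room(self, volume):
        return len(self.by_volume.get(volume, [])) < self.limit

    def _new_fdisk(self, volume):
        f = FDisk(volume)
        self.by_volume.setdefault(volume, []).append(f)
        return f

    def create_map(self, source, target):
        src = self.primary(source)
        # Refuse before changing anything if a required new fdisk would exceed the limit.
        if not self._has_room(target) or (src is None and not self._has_room(source)):
            raise RuntimeError("fdisk limit reached; map cannot be started")
        if src is None:
            src = self._new_fdisk(source)     # assumed: the source reuses an existing primary
        old = self.primary(target)
        if old is not None:
            old.primary = False               # existing primary becomes a secondary fdisk
        tgt = self._new_fdisk(target)         # fresh primary fdisk for the target
        m = FlashMap(src, tgt)
        src.down.append(m)
        tgt.up = m
        return m
```

With the limit set to 2, creating A→B and then D→A leaves A with one secondary and one primary fdisk, and any further map targeting A is refused, mirroring the restriction on new restore operations described in the first example.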
In a first example, suppose there are disks A, B, C, D, E and F, and FlashCopy® maps A→B, A→C, A→D, A→E and A→F are created. If the system starts the maps A→D, A→C and A→B in that order, the result is the cascade A→B→C→D. After A→D completes, the cascade A→B→C remains. If it is now discovered that disk A is corrupt and the administrator wishes to restore disk A from disk D, a map D→A is created and started, which results in the cascades D→A and A→B→C. Before FlashCopy® D→A completes, the administrator often wishes to continue making backups of disk A. Consequently, maps A→F and A→E may be started before D→A completes, resulting in the cascades D→A→E→F and A→B→C.
In this scenario, the administrator may be limited to two fdisks for A (to limit cleaning IOs) and is thus not permitted to add another one for a new restore operation until either A→B and A→C stop or complete, or D→A stops, or D→A and A→E complete. If disk A were to become corrupted again, the administrator might not be permitted to restore it until the above operations stop or complete.
If, however, a restriction is imposed such that only the original source disk A may be written to, then when D→A completes the cascade configuration can naturally return to A→E→F→B→C, because the cleaning operations associated with writes to A make B independent of any grains changed since D→A was started. This read-only target restriction, though, may mean that the user is not permitted to create FlashCopy® maps for development or test purposes, since those maps typically involve writing to the targets.
In a second example, suppose there are disks A, B, C, D, E and F, and FlashCopy® maps A→B, B→C, B→D, B→E and B→F are created. Suppose further that A→B is incremental and that disks C, D, E and F are Space Efficient vdisks. The user starts map A→B and then map B→C, giving the cascade A→B→C. When A→B completes, a “split stop” occurs, which leaves the cascade B→C. In other words, disk C is independent of disk A, so that if disk A fails, disk C is still available to the user. If the user now starts A→B again, which completes quickly because it is incremental, it is also possible to start B→D. This gives the cascades A→B→D and B→C. When A→B completes, the user again split stops it, resulting in the cascade B→D→C. The split stop upon completion of A→B minimizes the number of fdisks and allows the user to restart A→B if needed. However, such operations frequently carry the limitation that only disk A and/or disk B may be written to by the host. As with the previous example, this means that the user is frequently not permitted to create FlashCopy® maps for development or test purposes, since such maps may require their targets to be writable.