In the field of computer storage systems, there is increasing demand for what have come to be described as “advanced functions”. Such functions go beyond the simple I/O functions of conventional storage controller systems. Advanced functions are well known in the art and depend on the control of metadata used to retain state data about the real or “user” data stored in the system. The manipulations available using advanced functions enable various actions to be applied quickly to virtual images of data, while leaving the real data available for use by user applications. One such well-known advanced function is FlashCopy®.
At the highest level, FlashCopy® is a function where a second image of ‘some data’ is made available. This function is sometimes known in other system contexts as Point-In-Time copy, or T0-copy. The second image's contents are initially identical to those of the first. The second image is made available ‘instantly’. In practical terms this means that the second image is made available in much less time than would be required to create a true, separate, physical copy, and that it can therefore be established without unacceptable disruption to a using application's operation.
Once established, the second copy can be used for a number of purposes including performing backups, system trials, and data mining. The first copy continues to be used for its original purpose by the original using application. Contrast this with backup without FlashCopy®, where the application must be shut down, and the backup taken, before the application can be restarted again. It is becoming increasingly difficult to find time windows where an application is sufficiently idle to be shut down. The cost of taking a backup is increasing. There is thus significant and increasing business value in the ability of FlashCopy® to allow backups to be taken without stopping the business.
FlashCopy® implementations achieve the illusion of the existence of a second image by redirecting read I/O addressed to the second image (henceforth Target) to the original image (henceforth Source), unless that region (also known as a “grain”) has been subject to a write. Where a region has been the subject of a write (to either Source or Target), then to maintain the illusion that both Source and Target own their own copy of the data, a process is invoked which suspends the operation of the write command, and without it having taken effect, issues a read of the affected region from the Source, applies the read data to the Target with a write, then (and only if all steps were successful) releases the suspended write. Subsequent writes to the same region do not need to be suspended since the Target will already have its own copy of the data. This copy-on-write technique is well known and is used in many environments.
All implementations of FlashCopy® rely on a data structure which governs the decisions discussed above, namely, the decision as to whether reads received at the Target are issued to the Source or the Target, and the decision as to whether a write must be suspended to allow the copy-on-write to take place. The data structure essentially tracks the regions or grains of data that have been copied from source to target, as distinct from those that have not. In its simplest form, this data structure is maintained in the form of a bitmap showing which grains have been written to, and which are untouched by write activity.
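The read redirection, copy-on-write and bitmap described above can be illustrated with a minimal Python sketch. It is not taken from any FlashCopy® implementation: the class name FlashCopyMap, its method names, and the use of in-memory lists to stand in for disks are all assumptions made for illustration, and every write is assumed to replace a whole grain.

```python
class FlashCopyMap:
    """Hypothetical single-target point-in-time copy over in-memory 'disks'."""

    def __init__(self, source):
        self.source = source                  # list of grains used by the application
        self.target = [None] * len(source)    # physical target storage, initially empty
        self.copied = [False] * len(source)   # bitmap: True once a grain has been copied

    def _copy_on_write(self, grain):
        # Copy the grain from Source to Target before the suspended write proceeds.
        if not self.copied[grain]:
            self.target[grain] = self.source[grain]
            self.copied[grain] = True

    def read_source(self, grain):
        return self.source[grain]

    def read_target(self, grain):
        # Redirect the read to the Source unless the grain has already been copied.
        if self.copied[grain]:
            return self.target[grain]
        return self.source[grain]

    def write_source(self, grain, data):
        self._copy_on_write(grain)            # preserve the point-in-time image first
        self.source[grain] = data             # then release the write

    def write_target(self, grain, data):
        self._copy_on_write(grain)            # Target takes its own copy of the grain
        self.target[grain] = data


fc = FlashCopyMap(["a", "b", "c"])
fc.write_source(0, "A")    # the Target still sees the original "a"
fc.write_target(1, "B")    # the Source still sees the original "b"
print(fc.read_source(0), fc.read_target(0))
print(fc.read_source(1), fc.read_target(1))
print(fc.copied)
```

In this sketch a second write to an already copied grain skips the copy step, which corresponds to the statement above that subsequent writes to the same region do not need to be suspended.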
Some storage controllers allow a user to configure more than one target for a given source, also known as multiple target FlashCopy®. This has a number of applications. For instance, different experiments could be run against each of the targets. Or the targets might be taken at different times (e.g. different days in the week), and allow historical access to the disk, perhaps for the purpose of recovering from some data corruption, such as might be caused by a virus.
There are two categories of implementation for multiple target FlashCopy®: conventional and cascade.
In conventional implementations, a write to the source disk for an area that has not yet been copied will result in the data being copied to all of the target disks by reading the data from the source and then writing the data to each of the targets. In these implementations it will always be the case that a read I/O request submitted to a target disk can be satisfied by FlashCopy® reading data either from the source disk or the target disk depending on whether the data has previously been copied. It is never the case that to satisfy a read request from one target disk it is necessary to read data from another target disk. Such an arrangement is shown in FIG. 1, where A is a source LOGICAL UNIT, and B and C show two targets that were taken at some time in the past. A, B and C can each be updated. The arrows show grains (fixed sized regions of the disk) which are still dependent on the source LOGICAL UNIT. These have corresponding bits of ‘0b’ in the bitmap which tracks the progress of each FlashCopy®.
In cascade implementations, such as the multiple target FlashCopy® facility available with the IBM SAN Volume Controller (SVC), a write to the source disk for an area that has not yet been copied will result in the data being copied to just one of the target disks. For these implementations, a read I/O request submitted to a target disk may require FlashCopy® to read data from the source disk, the target disk or another target disk in the cascade depending on which source or target disks have previously been written to. Such an arrangement is shown in FIGS. 2A and 2B, where A and B are already in a FlashCopy® relationship, and C is added as a copy of A. At the point that the image C is established, the relationships can be arranged as shown in FIG. 2A. In effect, B is established as a copy of C (which is at this instant identical to A), and C is a copy of A. The bitmap held by B that described its differences from A also correctly describes its difference from C.
C is identical to A, and has an empty bitmap. Updates to A now only require a copy operation to copy data from A to C. Conversely, updates to C require two copy operations, from A to C, and from C to B. This is because updates to the middle of a chain force a copy to the relationships on either side. For instance, taking the arrangement of FIG. 2A, if we apply updates to the first and fourth grains in A, and the second and sixth grains in C, the outcome is as depicted in FIG. 2B.
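The cascade behaviour described above can be sketched in Python as follows. This is an illustrative model only and does not reproduce SVC code or the exact contents of FIGS. 2A and 2B: the class name Cascade, its methods, the grain values, and the assumption that B holds no grains of its own at the start are all hypothetical. Writes are assumed to replace a whole grain.

```python
class Cascade:
    """Hypothetical cascade: names[0] is the source, each later disk copies its upstream neighbour."""

    def __init__(self, names, source_data):
        self.names = names                               # upstream to downstream, e.g. A, C, B
        self.held = {n: {} for n in names}               # grains physically held by each disk
        self.held[names[0]] = dict(enumerate(source_data))
        self.copies = []                                 # log of (from, to, grain) copy operations

    def _holder(self, index, grain):
        # Nearest disk at or upstream of `index` that physically holds the grain.
        for name in reversed(self.names[: index + 1]):
            if grain in self.held[name]:
                return name
        raise KeyError(grain)

    def _copy(self, src, dst, grain):
        self.held[dst][grain] = self.held[src][grain]
        self.copies.append((src, dst, grain))

    def read(self, name, grain):
        index = self.names.index(name)
        return self.held[self._holder(index, grain)][grain]

    def write(self, name, grain, data):
        index = self.names.index(name)
        if grain not in self.held[name]:
            # Bring the grain from the nearest upstream holder (e.g. A to C).
            self._copy(self._holder(index, grain), name, grain)
        if index + 1 < len(self.names):
            down = self.names[index + 1]
            if grain not in self.held[down]:
                # Preserve the downstream neighbour's image (e.g. C to B).
                self._copy(name, down, grain)
        self.held[name][grain] = data


cascade = Cascade(["A", "C", "B"], ["a0", "a1", "a2", "a3", "a4", "a5"])
cascade.write("A", 0, "A0")   # first grain of A: one copy, A to C
cascade.write("A", 3, "A3")   # fourth grain of A: one copy, A to C
cascade.write("C", 1, "C1")   # second grain of C: two copies, A to C then C to B
cascade.write("C", 5, "C5")   # sixth grain of C: two copies, A to C then C to B
print(cascade.copies)
print(cascade.read("B", 0))   # B does not hold grain 0, so the read is satisfied from C
```

Under these assumptions, the read of the first grain from B is satisfied by another target (C) rather than by the source, which is the dependency between targets that distinguishes a cascade from a conventional arrangement.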
The key advantage of a cascaded implementation over a conventional implementation is that it reduces the overheads of FlashCopy® when processing write I/O requests by minimizing the number of disks that data has to be copied to. In particular, the overheads of a cascaded implementation do not increase as the number of targets increases and consequently, unlike conventional implementations, it is possible to support a much greater number of targets.
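The difference in write overhead can be illustrated by counting the copy operations triggered by a single write to an uncopied grain of the source. The following Python sketch is an assumption-laden illustration, not a measurement: the function names are hypothetical, each target is modelled as a dictionary of the grains it physically holds, and in the cascade case targets[0] is assumed to be the target adjacent to the source.

```python
def conventional_source_write(targets, grain, source):
    """Copy the grain to every target that has not yet received it; return the copy count."""
    copies = 0
    for target in targets:
        if grain not in target:
            target[grain] = source[grain]
            copies += 1
    return copies


def cascade_source_write(targets, grain, source):
    """Copy the grain only to the target adjacent to the source; return the copy count."""
    if targets and grain not in targets[0]:
        targets[0][grain] = source[grain]
        return 1
    return 0


source = ["a0", "a1", "a2"]
for count in (1, 4, 16):
    conventional = conventional_source_write([{} for _ in range(count)], 0, source)
    cascaded = cascade_source_write([{} for _ in range(count)], 0, source)
    print(count, conventional, cascaded)
```

In this model the conventional count grows linearly with the number of targets, while the cascade count stays at one regardless of how many targets exist.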
The primary disadvantage of a cascaded implementation compared with a conventional implementation is that it generates additional dependencies between the target disks to be able to satisfy read requests. To satisfy a read request for one target disk, a cascaded implementation may have to read data from another target disk in the cascade. Consequently, if a user wishes to stop or re-trigger a FlashCopy® mapping that is part of a cascade, then it is first necessary to copy all the data that is required by other target disks in the cascade to another target disk. In contrast, a conventional solution does not have this problem. It is possible to stop or re-trigger a FlashCopy® mapping without ever having to first copy data from the target disk to another disk.
One example of a situation in which a FlashCopy® mapping may need to be stopped or re-triggered is one in which the members of the cascade are used for different levels of backup. A first-level backup, such as a daily backup, may need to be stopped or re-triggered without affecting a second level of backup, such as a weekly backup.
To allow a FlashCopy® mapping to be stopped or re-triggered, a cascaded implementation can introduce the concept of a map being in a “removing” state while the data that is required by other targets is being copied. Within SVC the process of copying this data is called cleaning. While the map is in this “removing” state, the target disk of the map being removed cannot be accessed. This is in order to guarantee that the cleaning operation completes. Only when the target is clean can a map be stopped or re-triggered.
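A cleaning step of this kind might be sketched in Python as below. This is a simplified model under stated assumptions and is not the SVC algorithm: the function clean_and_remove, the states dictionary, and the rule that only the immediately downstream target depends on the disk being removed are all hypothetical choices made for illustration.

```python
def clean_and_remove(names, held, states, target):
    """Copy grains that the downstream neighbour still depends on, then drop the target.

    names  : disk names ordered upstream to downstream, names[0] being the source
    held   : maps each disk name to a dict of the grains it physically holds
    states : maps each disk name to "active" or "removing"
    """
    index = names.index(target)
    if index == 0:
        raise ValueError("the source of the cascade cannot be removed")
    states[target] = "removing"            # the target is inaccessible while cleaning runs
    cleaned = []
    if index + 1 < len(names):
        down = names[index + 1]
        for grain, data in held[target].items():
            if grain not in held[down]:    # downstream disk still depends on this grain
                held[down][grain] = data
                cleaned.append(grain)
    # The target is now clean, so the map can be stopped and the disk leaves the chain.
    names.remove(target)
    del held[target]
    del states[target]
    return cleaned


names = ["A", "C", "B"]
held = {
    "A": {0: "A0", 1: "a1"},               # grain 0 of the source has since been overwritten
    "C": {0: "a0", 1: "C1"},               # C holds the original grain 0 and its own grain 1
    "B": {1: "a1"},                        # B reads grain 0 through C
}
states = {name: "active" for name in names}
print(clean_and_remove(names, held, states, "C"))
print(names, held["B"])
```

In this example only grain 0 has to be cleaned, because B already holds its own copy of grain 1; after C is removed, B still presents the original contents of grain 0 even though the source has changed.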
It would thus be desirable to have a multiple-target system in which the scalability of the cascade version could be combined with the flexibility of the conventional version.