1. Field
Embodiments of the invention relate to data set version counting in a mixed local storage and remote storage environment.
2. Description of the Related Art
In conventional systems, when a data set resides at a local computing device, it may be useful to maintain a backup copy of the data set at a remote backup system (e.g., for data recovery in case of loss of the data at the local computing device).
Backup and storage management products offer sets of policy constructs to dictate how data is managed. A policy construct may be described as a rule. One of these policy constructs dictates how many copies of a backup data set are to be maintained at the remote backup system for a data set that exists in local storage at the local computing device (as opposed to a data set that has been deleted from the local computing device, for which there are other policy constructs). For example, a user may specify that three versions of a data set should be maintained in remote storage at the remote backup system for a given existing data set in the local storage. The remote backup system manages the remote storage such that when a fourth version of the data set is received, the remote backup system discards the first version of the data set (effectively keeping the second, third, and fourth versions of the data set).
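By way of illustration only, the version-limit policy construct described above may be sketched as follows. The class name RetentionPolicy and its method names are hypothetical and are not taken from any particular backup product; the sketch assumes that the oldest version is discarded as soon as the limit is exceeded.

```python
from collections import deque


class RetentionPolicy:
    """Hypothetical sketch of a 'keep N versions' policy construct."""

    def __init__(self, versions_to_keep):
        self.versions_to_keep = versions_to_keep
        self.versions = deque()  # oldest version on the left

    def receive(self, version):
        """Store a new version and return the versions discarded as a result."""
        self.versions.append(version)
        discarded = []
        while len(self.versions) > self.versions_to_keep:
            discarded.append(self.versions.popleft())
        return discarded


policy = RetentionPolicy(versions_to_keep=3)
for version in (1, 2, 3):
    policy.receive(version)
print(policy.receive(4))      # [1]
print(list(policy.versions))  # [2, 3, 4]
```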
In conventional systems, a backup data set may be generated outside of the context of the remote backup system. Operations that generate such backup data sets may be referred to as “local backup operations”. For example, software or hardware snapshots of data may be generated by products that are not necessarily integrated into the remote backup system, such as third-party snapshot software packages that generate snapshots of data on the local computing device or hardware systems that do so (e.g., IBM® Enterprise Storage Server® (ESS) system available from International Business Machines Corporation). Further details of a snapshot operation are disclosed in U.S. Pat. No. 5,410,667 entitled “Data Record Copy System for a Disk Drive Array Data Storage Subsystem,” which issued on Apr. 25, 1995, which patent is incorporated herein by reference in its entirety. Counting versions of data sets with these techniques differs from counting versions of data sets at a remote backup system.
First, in a traditional remote backup system, the remote storage is opaque to the local computing device. That is, the local computing device assumes that there is sufficient remote storage to store at least the number of versions that the user at the local computing device wants to manage. Also, in this type of remote backup system, out-of-storage conditions are exceptions. In fact, the remote backup system may keep more versions than the user requests for a period of time. For example, if the user specifies to keep three versions of a data set and a fourth version of the data set is sent to the remote backup system, the remote backup system may keep all four versions subject to an expiration process that deletes the unwanted versions of data sets. The expiration process may be described as asynchronous because it has no time dependency on the addition of the fourth, fifth, sixth, etc. version of the data set. Furthermore, if backups of the data set occur more frequently than the expiration process runs, there may be even more versions of backup data sets queued up to be expired from the remote backup system.
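The asynchronous relationship between storing versions and expiring them may be sketched as follows. This is a minimal illustration under stated assumptions: the class RemoteBackupServer and its methods are hypothetical, and the expiration process is modeled as a separate call that the caller schedules independently of backups.

```python
class RemoteBackupServer:
    """Hypothetical sketch: storing and expiring versions are decoupled."""

    def __init__(self, versions_to_keep):
        self.versions_to_keep = versions_to_keep
        self.stored = []  # oldest version first

    def store(self, version):
        # Storing never deletes anything; excess versions simply accumulate.
        self.stored.append(version)

    def expire(self):
        """Run the expiration process and return the versions it deleted."""
        excess = max(0, len(self.stored) - self.versions_to_keep)
        expired = self.stored[:excess]
        self.stored = self.stored[excess:]
        return expired


server = RemoteBackupServer(versions_to_keep=3)
for version in (1, 2, 3, 4, 5):
    server.store(version)
print(len(server.stored))  # 5
print(server.expire())     # [1, 2]
print(server.stored)       # [3, 4, 5]
```

Between runs of expire(), the sketch holds more versions than the policy requests, which mirrors the behavior described above.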
Local backup data sets may represent terabytes of data. If a user has a data set that is 100 gigabytes (GB) in size and has a policy construct that manages three versions of the data set, the user technically needs to have at least four 100 GB storage containers available. A storage container may be described as a logical portion of the remote storage (i.e., a set of physical disk space) used by the remote backup system to store a data set version. After three storage containers are filled with three versions of local backup data sets, the fourth storage container is used to house the fourth version, after which the storage container for the first version may be released.
An alternate approach to this problem is for the local computing device to delete one of the three backup data sets before the backup of the fourth data set. This alleviates the need for the user to have four storage containers to house three versions of a backup data set. The drawback to this alternate approach is that, while the fourth backup operation is executing (which could take several hours), the user has only two versions of the backup data set from which to perform a restore operation, which does not adhere to the policy construct. Also, if the backup of the fourth data set were to fail, the user would be left with two versions of the backup data set. In this type of remote backup system, out-of-storage conditions are normal.
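The trade-off between the two approaches described above may be made concrete with a short calculation. The function name and its parameters below are hypothetical, and the sketch assumes a steady state in which every container reserved for a valid version is already full.

```python
def storage_requirements(versions_to_keep, data_set_gb, delete_before_backup):
    """Return (containers, total_gb, restorable_during_backup).

    delete_before_backup=False models the spare-container approach;
    delete_before_backup=True models deleting the oldest version first.
    """
    if delete_before_backup:
        containers = versions_to_keep
        restorable = versions_to_keep - 1
    else:
        containers = versions_to_keep + 1
        restorable = versions_to_keep
    return containers, containers * data_set_gb, restorable


print(storage_requirements(3, 100, delete_before_backup=False))  # (4, 400, 3)
print(storage_requirements(3, 100, delete_before_backup=True))   # (3, 300, 2)
```

For the 100 GB example, the spare-container approach costs an additional 100 GB of storage, while the delete-first approach leaves only two restorable versions for the duration of the backup operation, and permanently if that operation fails.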
A number of direct access storage device (DASD) subsystems are capable of performing “instant virtual copy” operations, also referred to as “fast replicate functions.” Instant virtual copy operations work by modifying metadata, such as relationship tables or pointers, to treat a source data object as both the original and copy. In response to a copy operation request received at a storage controller (which provides access to storage), creation of the copy is reported without having made any physical copy of the data. Only a “virtual” copy has been created, and the absence of an additional physical copy is completely unknown to the originator of the copy operation request.
Later, when the storage controller receives updates to the original or copy, the updates are stored separately and cross-referenced to the updated data object only. At this point, the original and copy data objects begin to diverge. The initial benefit is that the instant virtual copy occurs almost instantaneously, completing much faster than a normal physical copy operation. This frees the storage controller to perform other tasks. The storage controller may even proceed to create an actual, physical copy of the original data object during background processing, or at another time.
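The metadata-only nature of an instant virtual copy may be sketched as follows. This is a simplified illustration and not the mechanism of any particular storage controller: the class VirtualCopyStore, the object name "original", and the per-object update tables are assumptions made for the example.

```python
class VirtualCopyStore:
    """Hypothetical sketch: copies share blocks until one side is updated."""

    def __init__(self, blocks):
        self._shared = list(blocks)        # the single physical copy
        self._updates = {"original": {}}   # per-object table of diverged blocks

    def instant_copy(self, name):
        # Only metadata is created; no block of data is physically copied.
        self._updates[name] = {}

    def write(self, name, index, data):
        # The update is stored separately and tied to the updated object only.
        self._updates[name][index] = data

    def read(self, name, index):
        return self._updates[name].get(index, self._shared[index])

    def physical_blocks(self):
        return len(self._shared) + sum(len(u) for u in self._updates.values())


store = VirtualCopyStore(["a", "b", "c"])
store.instant_copy("copy")
print(store.physical_blocks())   # 3
store.write("original", 1, "B")
print(store.read("original", 1)) # B
print(store.read("copy", 1))     # b
print(store.physical_blocks())   # 4
```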
One such instant virtual copy operation is known as a FlashCopy® operation. Further details of an instant virtual copy operation are described in U.S. Pat. No. 6,611,901, issued on Aug. 26, 2003, and entitled “Method, System, and Program for Maintaining Electronic Data as of a Point-in-Time”, which patent is incorporated herein by reference in its entirety.
A second problem with the traditional technique of counting backup versions is that it assumes that the backups of the versions have been completed and committed to the remote backup system. For large backups of local data sets (e.g., a FlashCopy® backup operation), the local storage may be in use at the time a subsequent backup operation is performed. For example, the user may currently have three local containers and two versions of local FlashCopy® data and may initiate a local FlashCopy® backup operation, which takes four hours. At this point in time, two of the local storage containers are used for valid data set versions and the third storage container is allocated to the current process. At some time two hours later (“time t1”), the user initiates another backup operation. At this point, the user has to re-use one of the two local storage containers to dedicate to this backup operation. This leaves one local storage container with valid backup data and two local storage containers dedicated to backup operations. If both of the backup operations were to fail, the user would be left with one backup data set. If the user were to try to restore data at this point in time, there would be only one valid data set from which to restore, which is not what the user intended when the policy was set up to keep three versions of data.
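The overlapping-backup scenario above may be traced with a small simulation. The state names and helper functions are hypothetical, and the sketch assumes that the containers are listed oldest first, that a free container is preferred, and that a failed backup leaves its container without valid data.

```python
def start_backup(containers):
    """Claim a free container if one exists, else re-use the oldest valid one."""
    for wanted in ("free", "valid"):
        for index, state in enumerate(containers):
            if state == wanted:
                containers[index] = "in_progress"
                return index
    raise RuntimeError("no storage container available")


def finish_backup(containers, index, succeeded):
    containers[index] = "valid" if succeeded else "free"


def restorable(containers):
    return containers.count("valid")


containers = ["valid", "valid", "free"]   # three containers, two valid versions
first = start_backup(containers)          # four-hour backup claims the free one
print(restorable(containers))             # 2
second = start_backup(containers)         # at time t1, a valid container is re-used
print(restorable(containers))             # 1
finish_backup(containers, first, succeeded=False)
finish_backup(containers, second, succeeded=False)
print(restorable(containers))             # 1
```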
A third problem is that the end user has to explicitly set up the local storage correctly to match the number of versions that the remote backup system manages. For example, if the user sets up a policy to manage three versions v(1) on the remote backup system, the user must ensure that there is sufficient local storage s(1) to store the three versions v(1). If the user has additional local storage s(1), the additional local storage may be used to alleviate problems in the examples above, but if the user only has enough local storage to store the desired number of versions (s(1)=v(1)), another set of rules would apply when taking backups after all of the local storage has been used.
A fourth problem is that different backup technologies may be used on the same pool of local storage. For example, instant virtual copy and incremental virtual copy may be used on the same pool of local storage. Further details of an incremental virtual copy operation are described in U.S. patent application Ser. No. 10/465,118, and entitled “Method, System, and Program for Incremental Virtual Copy”, filed on Jun. 18, 2003, which patent application is incorporated herein by reference in its entirety. An incremental virtual copy operation is an enhancement to an instant virtual copy operation. With the incremental virtual copy operation, only the blocks of data that were updated on source and target volumes since the last copy operation (e.g., instant virtual copy operation) from the source volume to the target volume are copied.
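The difference between a full copy and an incremental virtual copy may be sketched as follows. The function names are hypothetical, and the sketch assumes that the set of block indexes updated on either volume since the last copy operation is tracked by the caller.

```python
def full_copy(source):
    """Copy every block from the source volume to a new target volume."""
    return list(source)


def incremental_copy(source, target, changed_blocks):
    """Copy only the blocks updated since the last copy; return how many."""
    copied = 0
    for index in sorted(changed_blocks):
        target[index] = source[index]
        copied += 1
    changed_blocks.clear()  # the next increment starts from this point
    return copied


source = ["a", "b", "c", "d"]
target = full_copy(source)
changed = set()

source[2] = "C"     # an update arrives after the full copy
changed.add(2)

print(incremental_copy(source, target, changed))  # 1
print(target == source)                           # True
```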
For example, assume that a user has three sets of local storage containers, sc(1), and wants to take a full instant virtual copy backup at 12:00 and incremental virtual copy backups every two hours. Therefore, storage container sc(1)1 is used for the full backup at 12:00. At 2:00, storage container sc(1)2 is used for the incremental virtual copy. At 4:00, storage container sc(1)2 is re-used for the incremental virtual copy (in situations in which there may only be a single set of incremental virtual copy relationships). Storage container sc(1)3 is not used until the following 12:00, when another full instant virtual copy is taken. The fact that incremental virtual copy relationships are mixed in with full instant virtual copy operations makes it even more difficult to explain to the user how version counting works in the mixed-mode environment.
Thus, the traditional techniques of counting backup versions correlate to the number of versions that the user may want to restore at any point in time. When introducing local storage that stores versions of data sets that are managed outside of the remote backup system, it becomes difficult to instruct the user on how much storage space needs to be allocated in local storage to assure that a desired number of backup versions may be restored at any given point in time.
Thus, there is a need in the art for improved data version counting in a mixed local storage and remote storage environment.