1. Field
Embodiments of the invention relate to using volume containers in replication and provisioning management.
2. Description of the Related Art
Disaster recovery systems address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second type of failure, a gradual disaster, updates to volumes on data storage may be lost. Such data loss over a period of time is the common form of a site disaster because power does not degrade all at once but, rather, may take several seconds to degrade across a computer system. While this may appear instantaneous to a human, for the computer system the degradation of power may span several transactions, causing data corruption at a secondary site if care is not taken to keep the data consistent. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such redundant (“dual” or “shadow”) copies are typically made as the application system writes new data to a primary control unit having storage made of primary volumes at a primary site. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides several remote mirroring systems, including disaster recovery solutions such as Metro Mirror (i.e., synchronous mirroring) and Global Mirror (i.e., asynchronous mirroring).
Merely for ease of illustration, the terms primary and secondary are used to refer to sites, control units or storage. Any site, control unit or storage (e.g., volume or cache) may be either a source or a target for purposes of data transfer or remote mirroring.
Remote mirroring systems are able to recover data updates that occur between the last safe backup and a system failure. Such remote mirroring systems may also provide an additional remote copy for non-recovery purposes, such as local access at a remote site.
As an example, with a remote mirroring system, a primary control unit maintains a copy of data on a secondary control unit having storage made of secondary volumes. Changes to data at the primary control unit are copied to the secondary control unit as an application updates the data at the primary control unit. The changes may be made synchronously or asynchronously, depending on the type of remote mirroring system that is used.
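The difference between synchronous and asynchronous mirroring can be illustrated with a minimal Python sketch. The class and method names below (`PrimaryControlUnit`, `SecondaryControlUnit`, `drain`) are hypothetical, volumes are modeled as simple track-to-data maps, and the sketch does not represent how any IBM product actually transfers data.

```python
from collections import deque


class SecondaryControlUnit:
    """Hypothetical stand-in for the secondary control unit's volumes."""

    def __init__(self):
        self.tracks = {}

    def write(self, track, data):
        self.tracks[track] = data


class PrimaryControlUnit:
    """Toy primary control unit that mirrors each update to a secondary.

    mode="sync":  the secondary is updated before the host write completes.
    mode="async": updates are queued and sent later by drain().
    """

    def __init__(self, secondary, mode="sync"):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.secondary = secondary
        self.mode = mode
        self.tracks = {}
        self.pending = deque()  # updates not yet sent (async mode only)

    def write(self, track, data):
        self.tracks[track] = data
        if self.mode == "sync":
            # Synchronous: the write completes only after the secondary has it.
            self.secondary.write(track, data)
        else:
            # Asynchronous: acknowledge now, transfer later in arrival order.
            self.pending.append((track, data))

    def drain(self):
        """Send queued updates to the secondary in the order they were written."""
        while self.pending:
            self.secondary.write(*self.pending.popleft())
```

In the asynchronous case the secondary lags the primary until `drain()` runs, which is the window in which updates can be lost if the primary site fails.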
Volumes in the primary and secondary control units are consistent when all writes have been transferred in their logical order, i.e., each write is transferred before any write that depends on it. In a banking example, this may mean that a deposit is written to the secondary volume before a withdrawal. A consistency group may be described as a collection of related volumes that are kept in a consistent state. A consistency transaction set may be described as a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. Consistency groups maintain data consistency across volumes. For instance, if a failure occurs, the deposit will be written to the secondary volume before the withdrawal. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent with the data at the primary control unit.
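The dependent-write ordering described above may be sketched as follows, assuming (hypothetically) that the primary assigns each write a global sequence number across all volumes in the consistency group, and that a dependent write always receives a higher sequence number than the writes it depends on. Under that assumption, applying any prefix of the sequence yields a consistent image. The `Write` type and the helper functions are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    seq: int            # hypothetical global order assigned at the primary
    volume: str         # volume in the consistency group
    track: int
    data: str
    depends_on: frozenset = frozenset()  # seqs of writes this write depends on


def is_consistent(applied):
    """True if every applied write's dependencies were also applied."""
    seqs = {w.seq for w in applied}
    return all(w.depends_on <= seqs for w in applied)


def apply_in_order(writes, upto):
    """Build a secondary image from writes with seq <= upto, in seq order.

    Assumes dependencies always have lower sequence numbers, so a prefix of
    the global order never includes a write without its dependencies.
    """
    image = {}
    applied = []
    for w in sorted(writes, key=lambda w: w.seq):
        if w.seq > upto:
            break
        image[(w.volume, w.track)] = w.data
        applied.append(w)
    return image, applied
```

In the banking example, a deposit on one volume (sequence 1) and a withdrawal on another volume that depends on it (sequence 2) belong to the same consistency group: a secondary holding only the withdrawal is inconsistent, whereas cutting the sequence after the deposit yields a consistent image.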
A point in time copy operation involves physically copying all the data from primary volumes to secondary volumes so that the secondary volumes have a copy of the data as of a point in time. A point in time copy may also be made by logically making a copy of the data and then only copying data over when necessary, in effect deferring the physical copying. This logical copy operation is performed to minimize the time during which the secondary and primary volumes are inaccessible.
“Instant virtual copy” operations, also referred to as “fast replicate functions,” work by modifying metadata, such as relationship tables or pointers, to treat a primary data object as both the original and the copy. In response to a host's (i.e., a server computer's) copy request, the control unit immediately reports creation of the copy without having made any physical copy of the data. Only a “virtual” copy has been created, and the absence of an additional physical copy is completely unknown to the host.
Later, when the storage system receives updates to the original or copy, the updates are stored separately and cross-referenced to the updated data object only. At this point, the original and copy data objects begin to diverge. The initial benefit is that the instant virtual copy occurs almost instantaneously, completing much faster than a normal physical copy operation. This frees the host and control unit to perform other tasks. The host or control unit may even proceed to create an actual, physical copy of the original data object during background processing, or at another time.
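A minimal sketch of an instant virtual copy, assuming tracks are modeled as a Python dictionary, is shown below. Creating the copy allocates only metadata (empty per-object update maps); subsequent updates are stored separately and referenced only by the object that was updated, so the original and copy diverge. The `VirtualCopy` class and its methods are hypothetical and are not the FlashCopy® implementation.

```python
class VirtualCopy:
    """Toy instant virtual copy: the copy shares the original's tracks.

    Only metadata (an empty update map per object) is created at copy time;
    later updates are stored separately for whichever object was written.
    """

    def __init__(self, original_tracks):
        # Shared track data, treated as read-only once the copy is established.
        self.base = original_tracks
        self.updates = {"original": {}, "copy": {}}

    def read(self, obj, track):
        # An object's own update wins; otherwise fall back to the shared data.
        return self.updates[obj].get(track, self.base.get(track))

    def write(self, obj, track, data):
        # Store the update separately, referenced only by the updated object.
        self.updates[obj][track] = data

    def materialize(self, obj):
        """Background physical copy: build a full, independent track map."""
        tracks = dict(self.base)
        tracks.update(self.updates[obj])
        return tracks
```

The copy request returns as soon as the object is constructed, which corresponds to the control unit reporting the copy before any data has been physically copied; `materialize` models the optional physical copy made during background processing.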
One such instant virtual copy operation is known as a FlashCopy® operation. A FlashCopy® operation involves establishing a logical point in time relationship between primary and secondary volumes on the same or different devices. The FlashCopy® operation guarantees that until a track in a FlashCopy® relationship has been hardened to its location on the secondary disk, the track resides on the primary disk. A relationship table is used to maintain information on all existing FlashCopy® relationships in the control unit. During the establish phase of a FlashCopy® relationship, one entry is recorded in each of the primary and secondary relationship tables for the primary and secondary volumes that participate in the FlashCopy® relationship being established. Each added entry maintains all the required information concerning the FlashCopy® relationship. Both entries for the relationship are removed from the relationship tables when all FlashCopy® tracks from the primary extent have been physically copied to the secondary extent or when a withdraw command is received. In certain cases, even though all tracks have been copied from the primary extent to the secondary extent, the relationship persists.
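The relationship-table bookkeeping described above may be approximated by the following hypothetical sketch. It records one entry for a relationship in the source (primary) volume's table and one in the target (secondary) volume's table, tracks which source tracks have not yet been physically copied, and removes both entries when the last track is copied or when the relationship is withdrawn. A `persist` flag models the case in which the relationship remains after all tracks are copied. All names and structures are assumptions for illustration, not IBM's internal data structures.

```python
class RelationshipTables:
    """Toy per-volume relationship tables for point in time copy."""

    def __init__(self):
        self.tables = {}         # volume name -> set of relationship ids
        self.relationships = {}  # relationship id -> relationship details
        self._next_id = 1

    def establish(self, source, target, tracks, persist=False):
        rid = self._next_id
        self._next_id += 1
        self.relationships[rid] = {"source": source, "target": target,
                                   "uncopied": set(tracks), "persist": persist}
        # One entry in the source's table and one in the target's table.
        self.tables.setdefault(source, set()).add(rid)
        self.tables.setdefault(target, set()).add(rid)
        return rid

    def target_reads_source(self, rid, track):
        """Until a track is hardened on the target, its data resides on the source."""
        return track in self.relationships[rid]["uncopied"]

    def track_copied(self, rid, track):
        """Record that a track has been physically copied to the target."""
        rel = self.relationships[rid]
        rel["uncopied"].discard(track)
        # Remove both entries once every track is copied, unless persistent.
        if not rel["uncopied"] and not rel["persist"]:
            self.withdraw(rid)

    def withdraw(self, rid):
        """Remove both table entries for the relationship (withdraw command)."""
        rel = self.relationships.pop(rid)
        self.tables[rel["source"]].discard(rid)
        self.tables[rel["target"]].discard(rid)
```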
Control of replication sessions, especially sessions used for disaster recovery, is error prone and time consuming because the environment is not stable. A replication session may be described as a session type and a group of copysets. A session type defines the type of replication to be performed across the copysets, for instance to perform a FlashCopy® operation or a synchronous remote copy. A copyset is a set of volumes that hold one logical copy of the data. There is one volume per copyset role, where the copyset roles are defined by the session type. For example, in a FlashCopy® session type, the copyset roles are source and target. Session types other than a FlashCopy® session type may require more complex copysets and use different roles. For a disaster recovery session type, for example, the terms source volume and target volume are ambiguous. Suppose site1 is the current production site and site2 is the current backup site; the copyset then includes source volumes on site1 and target volumes on site2. However, if site1 fails and recovery uses site2, production runs on site2. Once site1 recovers, replication runs in the opposite direction (from site2 to site1), so the replication session then runs from target to source. Therefore, for disaster recovery session types, instead of source and target roles, roles such as hostsite1 and hostsite2 are used to designate the volumes that a host at site1 or a host at site2 would mount, respectively.
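The session, session type, and copyset roles described above may be sketched as follows. The session type keys, the `production_site` attribute, and the `direction` and `pairs` methods are hypothetical; the role names source, target, hostsite1, and hostsite2 are those used in the text. Because disaster recovery copysets are keyed by site rather than by direction, reversing replication after a failover changes only the production site, not the copysets.

```python
from dataclasses import dataclass, field

# Hypothetical session type keys mapped to the copyset roles named in the text.
SESSION_ROLES = {
    "flashcopy": ("source", "target"),
    "disaster_recovery": ("hostsite1", "hostsite2"),
}


@dataclass
class ReplicationSession:
    session_type: str
    copysets: list = field(default_factory=list)
    production_site: str = "hostsite1"  # only meaningful for disaster recovery

    def add_copyset(self, volumes):
        """A copyset maps each role of the session type to exactly one volume."""
        roles = SESSION_ROLES[self.session_type]
        if set(volumes) != set(roles):
            raise ValueError(f"copyset must have exactly the roles {roles}")
        self.copysets.append(dict(volumes))

    def direction(self):
        """Return (from_role, to_role) for the current replication direction."""
        if self.session_type == "flashcopy":
            return ("source", "target")
        other = "hostsite2" if self.production_site == "hostsite1" else "hostsite1"
        return (self.production_site, other)

    def pairs(self):
        """Volume pairs to replicate, in the current direction."""
        src, dst = self.direction()
        return [(cs[src], cs[dst]) for cs in self.copysets]
```

For example, with the copyset `{"hostsite1": "S1_V1", "hostsite2": "S2_V1"}`, `pairs()` returns `[("S1_V1", "S2_V1")]` while site1 is the production site and `[("S2_V1", "S1_V1")]` after production moves to site2.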
The lack of stability of the environment is due, for example, to capacity being added for applications. As capacity is added for the applications, additional capacity needs to be added for the redundant copies at the secondary control unit, and this additional capacity needs to be configured into the replication session so that primary data is mirrored at the secondary control unit. If the additional capacity is not added and configured, then, in the event of a site disaster, not all data may have been copied to the secondary control unit.
In conventional systems it is difficult to replicate the data in the same consistency group and avoid this problem. In particular, in some conventional systems, when a user adds storage at the primary control unit, the user has to determine how much storage is to be added at the secondary control unit and where the storage may be obtained. The user also has to associate the newly added storage at the primary and secondary control units with a replication session. Such manual processing is error prone and inefficient.
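The manual check that a user performs in such conventional systems, namely finding primary volumes not yet covered by the replication session and the capacity that must be found at the secondary control unit to cover them, may be sketched as follows. The function names, the use of the hostsite1 role as the primary side, and the capacity units are assumptions for illustration only.

```python
def unprotected_volumes(primary_volumes, copysets, primary_role="hostsite1"):
    """Return primary volumes that no copyset in the session covers.

    primary_volumes: dict of volume name -> capacity (hypothetical units).
    copysets: list of dicts mapping copyset role -> volume name.
    """
    covered = {cs[primary_role] for cs in copysets}
    return {v: size for v, size in primary_volumes.items() if v not in covered}


def secondary_capacity_needed(primary_volumes, copysets, primary_role="hostsite1"):
    """Capacity that must be added at the secondary to cover the gap."""
    gap = unprotected_volumes(primary_volumes, copysets, primary_role)
    return sum(gap.values())
```

For example, if volume P3 is added at the primary for an application but no copyset is created for it, P3 is reported as unprotected and its full capacity must still be located at the secondary and associated with the session.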
Thus, there is a need in the art for improved replication management.