The present invention relates to the management and maintenance of large computer systems and particularly to automated methods and apparatus for the movement of data (migration) from one location in the system to another.
In 1983, International Business Machines Corporation of Armonk, N.Y. (IBM) described the requirements and capabilities needed to efficiently manage and maintain storage in the modern large data center and to evolve toward automated storage management through IBM's Data Facility/System Management Storage (DF/SMS) capabilities. In discussing device support and exploitation, IBM identified the requirement for the timely support and exploitation of new device technologies and the need for tools to simplify the movement of data (migration) to new subsystems.
A standard set of tools has been provided by IBM and other software developers that has allowed data to be automatically copied, archived, and restored. Evolutionary improvements have been made to these tools in the areas of performance, usability, and in some cases data availability; but a problem still exists in that the availability capabilities of these facilities have not kept pace with the availability requirements that exist in data centers. The storage administrator must be able to support the increasing demands of continuous 24-hour by 7-day data availability.
There is an explosive growth in the need to store and have on-demand access to greater and greater pools of data. As capacity requirements skyrocket, data availability demands increase. These factors coupled with the need to control costs dictate that new RAID (Redundant Array of Independent Disks) storage technology be implemented. The dilemma faced by data center management is that the implementation of new storage technology is extremely disruptive and therefore conflicts with the need to maximize availability of the data. Therefore, an additional tool is required that will allow data to be nondisruptively relocated or migrated within the data center.
Essentially, a data migration facility provides the ability to "relocate" data from one device to another device. A logical relationship is defined between a device (the source) and another device (the target). The logical relationship between a source and target volume provides the framework for a migration. The data migration facility controls multiple concurrent migrations in a single group that is called a session. A migration is the process that causes the data on the source to be copied to the target.
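The terms defined above (source, target, migration, session) can be pictured as a small data model. The following Python sketch is illustrative only: the class names, the track counter, and the rule that a volume belongs to at most one pair are assumptions made for the example and are not taken from any product described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Migration:
    """Logical relationship between one source and one target volume."""
    source: str
    target: str
    total_tracks: int
    copied_tracks: int = 0  # hypothetical progress counter

    def copy_next_track(self) -> None:
        # Copy one more track from source to target, if any remain.
        if self.copied_tracks < self.total_tracks:
            self.copied_tracks += 1

    @property
    def complete(self) -> bool:
        return self.copied_tracks >= self.total_tracks


@dataclass
class Session:
    """A group of concurrent migrations controlled as one unit."""
    name: str
    migrations: List[Migration] = field(default_factory=list)

    def add(self, source: str, target: str, total_tracks: int) -> Migration:
        # Assumption: a volume may take part in only one pair in this sketch.
        in_use = {v for m in self.migrations for v in (m.source, m.target)}
        if source in in_use or target in in_use or source == target:
            raise ValueError("volume already participates in a migration")
        migration = Migration(source, target, total_tracks)
        self.migrations.append(migration)
        return migration

    def step(self) -> None:
        # Advance every migration in the session by one track.
        for m in self.migrations:
            m.copy_next_track()

    @property
    def complete(self) -> bool:
        return all(m.complete for m in self.migrations)
```

A session created with two pairs and stepped until `complete` is true corresponds to the point at which every source has been fully copied to its target.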
The characteristics that are critical components of a transparent data migration facility include the following:
The facility is completely transparent to the end user and the application program. No application disruption is required in order to make program changes and the facility is dynamically and nondisruptively activated.
The facility provides for full data access during the data migration. The data on the source volume is available to the end user for read and write access.
The facility provides for a dynamic and nondisruptive takeover of the target volume, when the source and target volumes are synchronized.
The migration facility must ensure complete data integrity.
The migration facility should NOT be restricted to any control unit model type or device type. All devices in the data center should be able to participate in a migration and the facility should support a multiple vendor environment.
Host application response time is not impacted because updates are reflected on the secondary volume asynchronously. The application does not have to wait for the new data to be copied to the secondary volume. The SDM reads data from the IBM 3990-6 "sidefile" and records the update on log files and then writes to the remote mirror on the recovery control unit.
A common timer is required to ensure updates are processed in the correct order and therefore target data integrity is guaranteed in a multiple system environment.
No dynamic takeover is supported. Intervention is required in order to utilize the secondary copy.
XRC is transparent to the end user and the application program. No application program changes are required and the facility is dynamically activated.
The data on the source volume is available for read and write access during the XRC migration.
XRC causes a disruption because XRC does NOT support a dynamic and nondisruptive takeover of the target volume when the source and target volumes are synchronized. The impact of this disruption can be expensive. All applications with data resident on the source volume must be disabled during the takeover process.
XRC ensures complete data integrity through the use of the journaling data sets and a common timer.
XRC is a relatively "open" facility and therefore supports a multiple vendor environment. Any vendor that supports the IBM 3990-6 XRC specification can participate as the sending or receiving control unit in an XRC session. Any vendor that supports the IBM 3990-3 or basic mode IBM 3990-6 specification can participate as the receiving control unit in an XRC session.
XRC is complex to use and therefore is operationally expensive and resource intensive.
PPRC is transparent to the end user and the application program. No application program changes are required and the facility is dynamically activated.
The data on the source volume is available for read and write access during a PPRC migration.
P/DAS apparently supports a dynamic and nondisruptive takeover of the target volume when the source and target volumes are synchronized.
PPRC ensures complete data integrity because a write operation will not be signaled complete until the primary and the secondary IBM 3990 control units have acknowledged the update request. This methodology will elongate the time required to perform update operations.
PPRC requires a proprietary link between two control units manufactured by the same vendor. For example, only IBM 3990-6 control units can participate in an IBM PPRC migration. Therefore PPRC does NOT support a multiple vendor environment.
PPRC is complex to use and therefore is operationally expensive and resource intensive.
a. SMDS is not transparent to the end user and the application program. Although no application program changes are required, the facility cannot be nondisruptively activated. All applications are deactivated so that the 5000 can be installed and attached to the host in place of the source subsystem. Specialized software is loaded into the Symmetrix 5000 to allow it to emulate the source subsystem and initiate the data migration. This disruption can last as long as an hour.
The State of the Industry
Migration facilities that exist today were primarily designed for disaster recovery or the facilities were meant to address single volume failures. However, these facilities can also be used as data migration tools. These facilities differ in implementation and use a combination of host software and/or control unit firmware and hardware in order to provide the foundation for the data migration capability.
Local Mirroring
The IBM 3990 host extended feature IBM Dual Copy and the mirroring feature of the EMC Symmetrix, from EMC Corporation (EMC), are two examples of local mirrors. A source volume and a target volume are identified as a mirrored pair and, at the creation of the mirrored pair, data is transparently copied or migrated to the secondary volume. Continuous read and write access by applications is allowed during the data migration process and all data updates are reflected to the secondary volume.
In the case of the IBM 3990 host, the mirror is under the complete control of the system operator. For example, through the use of system commands a mirrored pair can be created. At create time, the data will be copied to a secondary device. At the completion of this copy, the operator can then disconnect the pair and assign the secondary device to be the primary. This is called Transient Dual Copy and is an example of Dual Copy being used as a migration facility.
The function of the EMC mirroring feature is to maximize data availability. The EMC subsystem will disconnect the mirror in the event of a failure of one of the paired devices. The mirror will be automatically reconstructed when the failing device is repaired by EMC field engineers. Unlike Dual Copy, the EMC subsystem does not provide an external control mechanism to enable the user to selectively initiate and disconnect mirrored pairs. Therefore, the EMC mirroring feature cannot be used as a migration facility.
Standard mirroring has a major restriction that prevents its universal utilization as a transparent data migration tool. The source and target volumes must be attached to the same logical control unit and therefore data can only be relocated within a single control unit. Although limited, this capability is an important tool to the storage administrator.
Remote Mirroring
IBM 3990-6 and EMC Symmetrix features support remote mirroring. A remote mirror function exists when paired devices can exist on different control units and subsystems. The primary objective of this function is to provide a disaster recovery capability. However, a remote mirroring function can also be used as a data migrator.
DF/SMS eXtended Remote Copy (XRC) is a host-assisted remote mirror method that uses components of DF/SMS and DFP (Data Facility Product). The major component is the System Data Mover (SDM). This component is also used for Concurrent Copy. An IBM 3990-6 (or compatible) host is required as the primary or sending control unit. The secondary or receiving control unit can be an IBM 3990-3 or -6 or compatible host.
Other characteristics of XRC include:
To invoke XRC as a data migration facility, the following steps are required. After identification of the source and target pair of devices, an image copy begins and the session is placed in a "duplex pending" state. All users of the source volume have total read and write access to the data. Source updates are reflected onto the target volume. When the copy is complete, the session enters the "duplex" state. The operator must query the pair in order to determine this state change. At this time, all applications using the source volume must be brought down in order to synchronize the source and target devices. Final synchronization is determined by operator command (XQUERY). This query displays a timestamp that represents the time of the last update so that the operator can be assured that all updates have been reflected on the target.
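The sequence of states in an XRC migration can be sketched as a small state model. This is a minimal illustration, not IBM's implementation: the class, the method names, the `SIMPLEX` starting state, and the integer timestamps are all assumptions. The point it shows is that takeover is safe only after the pair is in the "duplex" state, the applications are down, and the last update timestamp on the target matches the source.

```python
from enum import Enum


class PairState(Enum):
    SIMPLEX = "simplex"              # assumed name for "no pair established yet"
    DUPLEX_PENDING = "duplex pending"
    DUPLEX = "duplex"


class XrcPairModel:
    """Toy model of one source/target pair during an XRC-style migration."""

    def __init__(self) -> None:
        self.state = PairState.SIMPLEX
        self.applications_running = True
        self.last_source_update = 0   # timestamp of newest source write
        self.last_target_update = 0   # timestamp of newest write on the target

    def start_copy(self) -> None:
        # The image copy begins; the pair is not yet synchronized.
        self.state = PairState.DUPLEX_PENDING

    def finish_copy(self) -> None:
        # The image copy is complete; updates may still be in flight.
        self.state = PairState.DUPLEX

    def application_write(self, timestamp: int) -> None:
        if not self.applications_running:
            raise RuntimeError("applications are stopped")
        self.last_source_update = timestamp

    def drain_updates(self) -> None:
        # Reflect all outstanding source updates onto the target.
        self.last_target_update = self.last_source_update

    def stop_applications(self) -> None:
        self.applications_running = False

    def safe_to_take_over(self) -> bool:
        # Mirrors the operator's check of state and last-update timestamp.
        return (self.state is PairState.DUPLEX
                and not self.applications_running
                and self.last_target_update == self.last_source_update)
```

In this model, `safe_to_take_over()` stays false while applications are still running, which is the disruption the text attributes to XRC.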
Although XRC does have some of the requirements for transparent migration, XRC does not have all of them.
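The role of the common timer in XRC can be illustrated with a short sketch. It assumes, hypothetically, that each system produces a list of `(timestamp, volume, data)` updates already ordered by its own clock readings, and that all systems share one time base. Merging by timestamp then yields a single order in which to apply updates to the target. The function names and record layout are invented for the example.

```python
import heapq


def merge_updates(*per_system_updates):
    """Merge per-system update streams into one timestamp-ordered list.

    Each stream must already be sorted by timestamp (element 0).
    """
    return list(heapq.merge(*per_system_updates, key=lambda u: u[0]))


def apply_updates(updates):
    """Apply updates in order; the last write to a volume wins."""
    target = {}
    for _timestamp, volume, data in updates:
        target[volume] = data
    return target


system_a = [(1, "VOL1", "a1"), (4, "VOL1", "a2")]
system_b = [(2, "VOL1", "b1"), (3, "VOL2", "b2")]
ordered = merge_updates(system_a, system_b)
target_image = apply_updates(ordered)
```

Without a shared time base the two streams could not be interleaved reliably, and the target could end up holding an older value for `VOL1` than the source.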
The IBM 3990-6 host also supports a feature that is called Peer-to-Peer Remote Copy (PPRC). PPRC is host independent and therefore differs from XRC in several ways. First, there is a direct ESCON (Enterprise Systems Connection) fiber link from one IBM 3990-6 host to another IBM 3990-6 host. With this fiber link connection, the primary IBM 3990 host can directly initiate an I/O (Input/Output) operation to the secondary IBM 3990 host. Secondly, PPRC operates as a synchronous process, which means that the MVS (Multiple Virtual Storage) host is not informed of I/O completion of a write operation until both the primary and secondary IBM 3990 host control units have acknowledged that the write has been processed. Although this operation is a cache-to-cache transfer, there is a performance impact which represents a major differentiator of PPRC over XRC. The service time to the user on write operations for PPRC is elongated by the time required to send and acknowledge the I/O to the secondary IBM 3990 host.
The link between the IBM 3990 host controllers utilizes standard ESCON fiber but does require an IBM proprietary protocol for this cross controller communication. This proprietary link restricts the use of PPRC to real IBM 3990-6 host controllers only and therefore does not support a multiple vendor environment.
As suggested above, PPRC can also be used as a migration facility. PPRC requires a series of commands to be issued to initiate and control the migration and is therefore resource intensive. IBM has a marketing tool called the PPRC Migration Manager that is used to streamline a migration process with the use of ISPF (Interactive System Productivity Facility) panels and REXX (Restructured Extended Executor) execs.
A migration using PPRC (Release 1) does not support an automatic takeover to the secondary device. In March of 1996, IBM announced an enhancement to PPRC called P/DAS (PPRC Dynamic Address Switch), which, when available, apparently eliminates the requirement to bring down the applications in order to perform the takeover of the target device. Therefore, P/DAS may allow I/O to be dynamically redirected to the target volume when all source data has been copied to that device.
Use of P/DAS is restricted to IBM 3990-6 controllers and is supported only in an MVS/ESA (Multiple Virtual Storage/Enterprise Systems Architecture) 5.1 and DFSMS/MVS 1.2 environment. Therefore the enhancement offered by P/DAS is achieved at the cost of prerequisite software. Furthermore, the dynamic switch capability is based on the PPRC platform and therefore supports only an IBM 3990-6 environment.
Although PPRC does have some of the requirements for transparent migration, PPRC does not have all of them.
EMC Corporation's remote mirroring facility is called Symmetrix Remote Data Facility (SRDF). The SRDF link is proprietary and therefore can only be used to connect dual Symmetrix 5000 subsystems.
SRDF has two modes of operation. The first is a PPRC-like synchronous mode of operation and the second is a "semi-synchronous" mode of operation. The semi-synchronous mode is meant to address the performance impact of the synchronous process. In the synchronous mode, the host is signaled that the operation is complete only when both the primary and the secondary controllers have acknowledged a successful I/O operation. In the semi-synchronous mode, the host is signaled that the operation is complete when the primary controller has successfully completed the I/O operation. The secondary controller will be sent the update asynchronously by the primary controller. No additional requests will be accepted by the primary controller for this volume until the secondary controller has acknowledged a successful I/O operation. Therefore in the SRDF semi-synchronous mode, there may be one outstanding request for each volume pair in the subsystem.
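The difference between the two modes can be made concrete with a small timing model for a single volume pair. This is a sketch under simplifying assumptions, not a description of EMC's implementation: the function name, the fixed service times, and the rule that a write starts as soon as the volume is free are all hypothetical. In both modes the volume accepts the next write only after the secondary has acknowledged the previous one; the modes differ only in when the host is told the write is complete.

```python
def host_completion_times(arrivals, primary_ms, remote_ms, mode):
    """Return the time at which the host sees each write complete.

    arrivals   -- write arrival times for ONE volume pair, ascending
    primary_ms -- time for the primary controller to complete the write
    remote_ms  -- time to send the update to the secondary and get its ack
    mode       -- "synchronous" or "semi-synchronous"
    """
    if mode not in ("synchronous", "semi-synchronous"):
        raise ValueError("unknown mode: " + mode)
    completions = []
    volume_free_at = 0
    for arrival in arrivals:
        # A new request is accepted only after the previous ack arrived.
        start = max(arrival, volume_free_at)
        primary_done = start + primary_ms
        secondary_acked = primary_done + remote_ms
        if mode == "synchronous":
            completions.append(secondary_acked)
        else:
            completions.append(primary_done)
        volume_free_at = secondary_acked
    return completions
```

With two back-to-back writes, `primary_ms=1`, and `remote_ms=3`, the synchronous mode reports completions at 4 and 8, while the semi-synchronous mode reports 1 and 5: the first write is no longer delayed by the remote acknowledgment, but the second still waits for it.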
EMC personnel must be involved in all SRDF activities unless the user has installed specialized host software that is supplied by EMC. The proprietary nature of SRDF restricts its use as a data migration facility. The primary function of SRDF is to provide data protection in the event of a disaster.
Late in 1995, EMC announced a migration capability that is based on the SRDF platform. This facility allows a Symmetrix 5000 to directly connect to another vendor's subsystem. The objective of the Symmetrix Migration Data Service (SMDS) is to ease the implementation of an EMC subsystem and is not meant to be a general purpose facility. SMDS has been called the "data sucker" because it directly reads data off another control unit. The data migration must include all of the volumes on a source subsystem and the target is restricted to a Symmetrix 5000.
An EMC Series 5000 subsystem is configured so that it can emulate the address and control unit type and device types of an existing subsystem (the source). This source subsystem is then disconnected from the host and attached directly to the 5000. The 5000 is then attached to the host processor. This setup is disruptive and therefore does cause an application outage.
The migration begins when a background copy of the source data is initiated by the 5000 subsystem. Applications are enabled and users have read and write access to data. When the target subsystem (the 5000) receives a read request from the host, the data is directly returned if it has already been migrated. If the requested data has not been migrated, the 5000 will immediately retrieve the data from the source device. When the target subsystem receives a write request, the update is placed only on the 5000 and is not reflected onto the source subsystem. This operation means that updates will cause the source and target volumes to be out of synchronization. This operation is a potential data integrity exposure because a catastrophic interruption in the migration process will cause a loss of data for any volume in the source subsystem that has been updated.
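The read and write handling described above can be sketched as follows. The class and method names are hypothetical, tracks are modeled as dictionary entries, and the sketch assumes that a track fetched on demand is also kept on the target, which the description does not state. The sketch makes the integrity exposure visible: after a write, the only current copy of the updated track is on the target.

```python
class MigratingTarget:
    """Toy model of a target subsystem that pulls data from a source."""

    def __init__(self, source_tracks):
        self.source = source_tracks   # the old subsystem; never written here
        self.local = {}               # tracks now held on the target

    def background_copy(self, track):
        # Copy a track unless the target already holds it (possibly updated).
        if track not in self.local:
            self.local[track] = self.source[track]

    def read(self, track):
        # Serve migrated data directly; otherwise fetch from the source.
        if track not in self.local:
            self.local[track] = self.source[track]  # assumption: fetched data is kept
        return self.local[track]

    def write(self, track, data):
        # Updates go to the target only and are NOT reflected onto the source.
        self.local[track] = data

    def out_of_sync_tracks(self):
        # Tracks whose target contents differ from the source.
        return sorted(t for t, d in self.local.items()
                      if self.source.get(t) != d)


source = {0: "a", 1: "b"}
target = MigratingTarget(source)
target.write(0, "A")
target.background_copy(0)
```

After the write, `target.out_of_sync_tracks()` lists track 0 and `source[0]` still holds the old value, so losing the target mid-migration would lose that update.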
Although the Symmetrix Migration Data Service does have some of the requirements for transparent migration, SMDS does not have all of them.
1. The data on the source volume is available for read and write access during an SMDS migration.
2. SMDS may support a dynamic and nondisruptive takeover of the target volume when the source and target volumes are synchronized. At the end of the migration, the source subsystem must be disconnected and the migration software must be disabled, and it is unknown whether this is disruptive and requires an outage.
3. SMDS can link to control units manufactured by other vendors. However, the purpose of SMDS is to ease the disruption and simplify the installation of an EMC 5000 subsystem. Data can only be migrated to an EMC subsystem. Therefore SMDS does NOT support a multiple vendor environment.
SMDS does NOT ensure complete data integrity. During the migration, data is updated on the target subsystem and is not reflected on the source subsystem. A catastrophic error during the migration can cause the loss of all application updates.
The State of the Art in Summary
The products that are available on the market today do not meet all of the data migration requirements.
The maintenance of continuous data availability is a fundamental mission of data centers. In order to support this goal, the migration facility must be initiated transparently to all applications and provide a means for the nondisruptive takeover of the target device.
The value of data and information is critical to the operation and competitiveness of the enterprise and therefore any exposure to possible data loss is unacceptable.
The control of the costs of the ever expanding storage assets is a large component in financing an information technology infrastructure. A competitive multiple vendor environment provides a mechanism to support effective cost management. Therefore, the migration facility should be vendor independent. Accordingly, there is a need for improved data migration methods and apparatus that satisfy all of the data migration requirements.