The present invention is related to data storage systems and in particular to failover processing and data migration.
A multitude of storage system configurations exist to provide solutions to the various storage requirements of modern businesses.
FIG. 7 shows a traditional multipath system in which multipath software is used to increase accessibility to a storage system. A host 0701 provides the hardware and underlying system software to support user applications 070101. Data communication paths 070301, 070303 provide an input-output (I/O) path to a storage facility 0705 (storage system). Multipath software 070105 is provided to increase accessibility to the storage system 0705. The software provides typical features including failover handling for failures on the I/O paths 070301, 070303 between the host 0701 and the storage system 0705.
In a multipath configuration, the host 0701 has two or more Fibre Channel (FC) host bus adapters 070107. The storage system 0705, likewise, includes multiple Fibre Channel interfaces 070501, where each interface is associated with a volume. In the example shown in FIG. 7, a single volume 070505 is shown. A disk controller 070503 handles I/O requests received from the host 0701 via the FC interfaces 070501. As noted above, the host has multiple physically independent paths 070301, 070303 to the volume(s) in the storage system 0705. Fibre Channel switches, which are not shown in the figure, can be used to connect the host and the storage system. It can be appreciated of course that other suitable communication networks can be used; e.g., Ethernet and InfiniBand.
In a typical operation, user applications 070101 and system software (e.g., the OS file system, volume manager, etc.) issue I/O requests to the volume(s) 070505 in the storage system 0705 via SCSI (small computer system interface) 070103. The multipath software 070105 intercepts the requests and determines a path 070301, 070303 over which the request will be sent. The request is sent to the disk controller 070503 over the selected path.
Path selection depends on various criteria including, for example, whether or not all the paths are available. If multiple paths are available, the least loaded path can be selected. If one or more paths are unavailable, the multipath software selects one of the available paths. A path may be unavailable because of a failure of a physical cable that connects a host's HBA (host bus adapter) and a storage system's FC interface, a failure of an HBA, a failure of an FC interface, and so on. By providing the host 0701 with multiple physically independent paths to volumes in the storage system 0705, multipath software can increase the availability of the storage system from the perspective of the I/O path.
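The path selection criteria described above can be illustrated with a minimal sketch. It is not taken from any of the commercial products; the `Path` class, the `outstanding_ios` load metric, and the `select_path` function are hypothetical names introduced only for illustration, and "least loaded" is assumed to mean the fewest outstanding I/O requests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """Hypothetical model of one I/O path between host and storage system."""
    name: str
    available: bool = True
    outstanding_ios: int = 0  # assumed load metric: requests currently in flight


def select_path(paths: List[Path]) -> Path:
    """Pick the least loaded path among those that are currently available."""
    candidates = [p for p in paths if p.available]
    if not candidates:
        raise RuntimeError("no available path to the storage system")
    return min(candidates, key=lambda p: p.outstanding_ios)


# Two paths, labeled after the reference numerals of FIG. 7.
paths = [Path("070301", outstanding_ios=4), Path("070303", outstanding_ios=1)]
first = select_path(paths)      # both paths up: the less loaded one is chosen

paths[1].available = False      # simulate a cable, HBA, or FC interface failure
fallback = select_path(paths)   # failover: only the surviving path is eligible
```

In this sketch a failed path is simply excluded from the candidate set, so failover falls out of the same selection rule rather than needing a separate code path.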
Typical commercial systems include Hitachi Dynamic Link Manager™ by Hitachi Data Systems; VERITAS Volume Manager™ by VERITAS Software Corporation; and EMC PowerPath by EMC Corporation.
FIG. 8 shows a storage system configured for data migration. Consider the situation where a user on the host machine 1301 has been accessing and storing data in a storage system A 1305; e.g., Volume X-P 130505. Suppose the user now wants to use the volume designated as Volume X-S 130705 on storage system B 1307. The host machine 1301 therefore needs to subsequently access Volume X-S.
To switch the host machine 1301 over to storage system B 1307, the data stored in Volume X-P needs to be migrated to Volume X-S (the assumption is that Volume X-S does not have a copy of the data on Volume X-P). In addition, a communication channel from the host machine 1301 to storage system B 1307 must be provided. For example, physical cabling 130301 that connects the host machine 1301 to storage system A 1305 needs to be reconnected to storage system B 1307. The reconnected cable is shown in dashed lines 130303.
Data migration from storage system A 1305 to storage system B 1307 is accomplished by the following steps. It is noted here that some or all of the data in storage system A can be migrated to storage system B. The amount of data that is migrated will depend on the particular situation. First, the user must stop all I/O activity with the storage system A 1305. This might involve stopping the user's applications 130101, or otherwise signaling the applications to suspend I/O operations to storage system A. Depending on the host machine, the host machine itself may have to be shut down. Next, the physical cabling 130301 must be reconfigured to connect the host machine 1301 to storage system B 1307. For example, in a Fibre Channel (FC) installation, a physical cable is disconnected from the FC interface 130501 of storage system A and connected to the FC interface 130701 of storage system B. Next, the host machine 1301 must be reconfigured to use Volume X-S in storage system B instead of Volume X-P in storage system A.
On the storage system side, the data in Volume X-P must be migrated to Volume X-S. To do this, the disk controller 130703 of storage system B initiates a copy operation to copy data from Volume X-P to Volume X-S. The data migration is performed over the FC network 130505. Once the data migration is under way, the user applications 130101 can once again resume their I/O activity, now with storage system B, where the migration operation continues as a background process. Depending on the host machine, this may involve restarting (rebooting) the host machine.
If the host machine 1301 makes a read access of a data block on Volume X-S that has not yet been updated by the migration operation, the disk controller B 130703 accesses the data of the requested data block from storage system A. Typically, the migration takes place on a block-by-block basis in sequential order. However, a read operation will likely access a block that is out of sequence with respect to the sequence of migration of the data blocks. The disk controller B can use a bitmap (or some other suitable mechanism) to keep track of which blocks have been updated by the migration operation and by the write operations. The bitmap can also be used to prevent a newly written block location from being over-written with data from Storage System A during the data migration process.
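The bitmap mechanism described above can be sketched as follows. This is an illustrative model only, assuming in-memory lists stand in for Volume X-P and Volume X-S; the class `MigratingVolume` and its method names are hypothetical and do not describe any particular disk controller.

```python
class MigratingVolume:
    """Hypothetical model of disk controller B during a background migration."""

    def __init__(self, source_blocks):
        self.source = source_blocks                 # stands in for Volume X-P
        self.target = [None] * len(source_blocks)   # stands in for Volume X-S
        self.done = [False] * len(source_blocks)    # bitmap: True = block is current on X-S

    def read(self, block):
        # A block not yet migrated is fetched from storage system A instead.
        if not self.done[block]:
            return self.source[block]
        return self.target[block]

    def write(self, block, data):
        # A host write marks the block as done so migration will not overwrite it.
        self.target[block] = data
        self.done[block] = True

    def migrate_next(self):
        # Copy the next pending block in sequential order; skip blocks already done.
        for i, copied in enumerate(self.done):
            if not copied:
                self.target[i] = self.source[i]
                self.done[i] = True
                return i
        return None  # migration complete


vol = MigratingVolume(["a", "b", "c"])
vol.write(2, "new")              # host writes before block 2 has been migrated
early_read = vol.read(1)         # block 1 not migrated yet: served from the source
while vol.migrate_next() is not None:
    pass                         # background migration runs to completion
```

After the loop, block 2 still holds the host's newer data because the bitmap caused the migration to skip it.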
Typical commercial systems include Hitachi On-Line Data Migration by Hitachi Data Systems and Peer-to-peer Remote Copy (PPRC) Dynamic Address Switching (DAS) by IBM, Inc.
FIG. 9 shows a conventional server clustering system. Clustering is a technique for increasing system availability. Thus, host systems 0901 and 0909 each can be configured respectively with suitable clustering software 090103 and 090903, to provide failover capability among the hosts.
In a server cluster configuration, there are two or more physically independent host systems and two or more physically independent storage systems. FIG. 9, for example, shows that Host 1 is connected to storage system A 0905 over an FC network 090301. Similarly, Host 2 is connected to storage system B 0907 over an FC network 090309. Storage system A and storage system B are in data communication with each other over yet another FC network 090305. Although it is not shown, it can be appreciated that the network can pass through a wide area network (WAN), meaning that Host 2 and storage system B can be located at a remote data center that is far from Host 1 and storage system A.
Under normal operations, Host 1 accesses (read, write) Volume X-P in storage system A. The disk controller A 090503 replicates data that is written to Volume X-P by Host 1 to Volume X-S in storage system B. The replication is performed over the FC network 090305. The replication can occur synchronously, in which case the storage system A does not acknowledge a write request from the Host 1 until it is determined that the data associated with the write request has been replicated to storage system B. Alternatively, the replication can occur asynchronously, in which case storage system A acknowledges the write request from Host 1 independently of when the data associated with the write request is replicated to the storage system B.
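The difference between the two replication modes lies in when the write is acknowledged relative to when the data reaches storage system B. The following sketch models that ordering only; the `ReplicatingController` class, its dictionaries, and the `drain` method are hypothetical, and network transfer is reduced to an in-memory copy.

```python
from collections import deque


class ReplicatingController:
    """Hypothetical model of disk controller A replicating Volume X-P to X-S."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}        # stands in for Volume X-P
        self.secondary = {}      # stands in for Volume X-S
        self.pending = deque()   # writes not yet replicated (asynchronous mode)

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Replicate first; the acknowledgment below is sent only afterwards.
            self.secondary[block] = data
        else:
            # Queue for later replication; acknowledge without waiting.
            self.pending.append((block, data))
        return "ack"

    def drain(self):
        # Background transfer of queued writes to storage system B.
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data


sync_ctl = ReplicatingController(synchronous=True)
sync_ctl.write(0, "x")                   # X-S already holds the data at ack time

async_ctl = ReplicatingController(synchronous=False)
async_ctl.write(0, "x")
lag = 0 not in async_ctl.secondary       # acknowledged, but not yet on X-S
async_ctl.drain()                        # replication catches up later
```

The `lag` flag shows the window in which an asynchronously replicated write exists only on the primary volume.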
When a failure in either Host 1 or in storage system A occurs, failover processing takes place so that Host 2 can take over the tasks of Host 1. Host 2 can detect a failure in Host 1 by using a heartbeat message, where Host 1 periodically transmits a message (“heartbeat”) to the Host 2. A failure in Host 1 is indicated if Host 2 fails to receive the heartbeat message within a span of time. If the failure occurs in the storage system A, the Host 1 can detect such failure; e.g., by receiving a failure response from the storage system, by timing out waiting for a response, etc. The clustering software 090103 in the Host 1 can signal the Host 2 of the occurrence.
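The heartbeat check performed by Host 2 can be sketched as a simple timeout test. The `HeartbeatMonitor` class and the five-second timeout are assumptions made for illustration; timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class HeartbeatMonitor:
    """Hypothetical failure detector run by Host 2 to watch Host 1."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed span of time allowed between heartbeats
        self.last_seen = None        # time of the most recent heartbeat, if any

    def record_heartbeat(self, now):
        # Called whenever a heartbeat message arrives from the peer host.
        self.last_seen = now

    def peer_failed(self, now):
        # No failure is reported before the first heartbeat has been seen.
        if self.last_seen is None:
            return False
        return now - self.last_seen > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record_heartbeat(now=100.0)
healthy = monitor.peer_failed(now=103.0)   # within the timeout: no failure
failed = monitor.peer_failed(now=106.0)    # timeout exceeded: failure indicated
```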
When Host 2 detects the occurrence of a failure, it performs a split pair operation (in the case where remote copy technology is being used) between Volume X-P and Volume X-S. When the split pair operation is complete, Host 2 can mount Volume X-S and start the applications 090901 to resume operations in Host 2. The split pair operation causes the data replication between Volume X-P and Volume X-S to complete without interruption; Host 1 cannot update Volume X-P during a split pair operation. This ensures that Volume X-S is a true copy of Volume X-P when Host 2 takes over for Host 1. The foregoing is referred to as active-sleep failover: Host 2 is not active (i.e., it is in a sleep or standby mode) from a user application perspective until a failure is detected in Host 1 or in storage system A.
Typical commercial systems include VERITAS Volume Manager™ by VERITAS Software Corporation and Oracle Real Application Clusters (RAC) 10g by Oracle Corp.
FIG. 10 shows a conventional remote data replication configuration (remote copy). This configuration is similar to the configuration shown in FIG. 9 except that the host 1101 in FIG. 10 is not clustered. Data written by applications 110101 to the Volume X-P is replicated by the disk controller A 110503 in the storage system A 1105. The data is replicated to Volume X-S in storage system B 1107 over an FC network 110305. Although it is not shown, the storage system B can be a remote system accessed over a WAN.
Typical commercial systems include Hitachi TrueCopy™ Remote Replication Software by Hitachi Data Systems and VERITAS Storage Replicator and VERITAS Volume Replicator, both by VERITAS Software Corporation.