1. Technical Field
This invention relates to a distributed multi-processing computer system in communication with a plurality of storage devices. More specifically, the invention relates to employing a Hyperswap operation and to addressing failure of one or more nodes during the Hyperswap operation.
2. Background of the Related Art
In computer science, a cluster is a group of computers, also known as nodes, that work together to form a single computer system. There are different categories of clusters, including, but not limited to, high availability clusters and grid clusters. A high availability cluster is employed to improve the availability of the services of the cluster, wherein each member of the cluster is in communication with a plurality of storage devices. Because storage devices may fail from time to time, it is desirable in a high availability cluster to provide application resilience across storage failures. Failure of a storage device may disrupt the system if critical data is maintained on the failed or failing storage device. Even if an up-to-date replica of the data is maintained in another storage system using synchronous storage replication, applications have to be stopped and restarted before they can use the replica data, and such an application outage can be unacceptable in some enterprise environments.
Hyperswap is a continuous availability solution wherein a set of nodes accessing a synchronously replicated storage system, containing a group of storage volumes, switch from a primary storage system to a secondary (replica) storage system, and must do so without any application outage in any node in the cluster. The Hyperswap operation may take place because of a storage system failure, known as an unplanned Hyperswap, or under administrative control, known as a planned Hyperswap. Furthermore, the Hyperswap operation may involve both boot volumes and non-boot volumes in the storage system.
FIG. 1 is a flow chart (100) illustrating an example of a prior art Hyperswap operation in a clustered environment that can lead to a system error. The example cluster consists of at least two nodes in communication with a storage subsystem having one or more storage volumes on a primary storage system, and corresponding replica volumes on a secondary storage system (102). Each node in the cluster boots from a boot volume in the primary storage system that has a replica in the secondary storage system (104). A third node in the cluster is in a temporary off-line state (106). A Hyperswap operation is invoked (108), wherein the boot volumes residing in the primary storage system are no longer valid for access and, instead, the replicas of those volumes in the secondary storage system are the preferred volumes. The Hyperswap operation occurs while the third node is off-line and, as a result, the third node is not aware of the Hyperswap operation. At some point in time after the Hyperswap operation has completed, the third node comes on-line and tries to boot from its boot volume in the primary storage system (110). Even if that boot volume on the primary storage system is accessible after the Hyperswap operation, it is not valid for access. More specifically, after the Hyperswap operation, all updates made by an administrator to boot images on boot volumes for nodes in the cluster are made on the secondary storage system. Since general purpose computing systems without specialized architectures have no central shared memory in the cluster to reference the location of the boot volume, the third node will boot from the wrong boot volume, namely the one in the primary storage system (110). Accordingly, there is no element in the cluster computing environment to communicate boot volume relocation to a node that was either off-line or in the process of coming on-line during the Hyperswap operation.
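The sequence of FIG. 1 may be illustrated by the following minimal sketch, which is offered only as a simplified model and not as an implementation of any actual Hyperswap product. The names Node, Cluster, hyperswap, boot, and simulate are hypothetical and are introduced solely for illustration. The sketch assumes that each node keeps only a local record of the storage system holding its boot volume, and that only nodes that are on-line at the time of the Hyperswap operation have that record updated.

```python
from dataclasses import dataclass, field

PRIMARY = "primary"
SECONDARY = "secondary"


@dataclass
class Node:
    """Hypothetical cluster node with a purely local boot-volume record."""
    name: str
    online: bool = True
    boot_site: str = PRIMARY  # storage system this node believes holds its boot volume


@dataclass
class Cluster:
    """Hypothetical cluster; valid_site is the ground truth after any swap."""
    nodes: list = field(default_factory=list)
    valid_site: str = PRIMARY

    def hyperswap(self) -> None:
        # Step 108: the replicas on the secondary system become the valid volumes.
        self.valid_site = SECONDARY
        for node in self.nodes:
            # Only on-line nodes learn of the swap; there is no shared
            # memory through which an off-line node could be told later.
            if node.online:
                node.boot_site = SECONDARY

    def boot(self, node: Node) -> bool:
        # Step 110: the node boots from whatever its local record says.
        # Returns True only if that record points at the valid storage system.
        node.online = True
        return node.boot_site == self.valid_site


def simulate() -> dict:
    # Steps 102-106: two nodes on-line, a third node temporarily off-line.
    nodes = [Node("node1"), Node("node2"), Node("node3", online=False)]
    cluster = Cluster(nodes=nodes)
    cluster.hyperswap()
    # Every node (re)boots after the swap has completed.
    return {node.name: cluster.boot(node) for node in nodes}


if __name__ == "__main__":
    print(simulate())
```

In this model, the two nodes that were on-line during the Hyperswap operation boot from the valid volume, whereas the third node boots from the stale volume on the primary storage system, which corresponds to the error condition described above.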
One solution to the problem presented in FIG. 1 is to provide a centralized shared memory facility that always identifies the correct boot volume to be used by each node on reboot. However, such a centralized memory is not available in a clustered environment. Accordingly, there is a need for a solution that supports the Hyperswap operation in a clustered environment and that communicates the correct boot volume to any node that was off-line or in the process of coming on-line, i.e., rejoining the cluster, during a Hyperswap operation, to ensure that each affected node boots from the correct boot volume as it comes on-line and joins the cluster.