A cluster is a group of individual servers managed through software as a single system. One example of a cluster is a group of two physical servers coupled to one another through a shared cable for internode communication. Managing several servers as a single unit improves the availability, manageability, and scalability of the group. A cluster improves the availability of services and applications by providing a failure recovery mechanism that ensures that applications and services continue to operate despite a hardware, operating system, service, or application failure.
A cluster may be used to serve data storage resources. In one configuration, each server of a two-node cluster is coupled to the storage resources of the cluster system. The storage resources of the cluster system hold the data of the applications running on the cluster and the metadata of the cluster. Metadata is data concerning, describing, or defining other data. The combination of the data and the metadata of the cluster defines the state, or global state, of the cluster. The metadata of the cluster includes information describing the logical relationship of the servers in the cluster and their association with the services provided by the cluster. Each node of the cluster uses the saved metadata to manage the operation and resources of the cluster system. If the cluster system does not include a storage resource shared between the nodes of the cluster, and each server instead maintains a separate copy of the metadata on a storage resource accessible to that server, the cluster may be susceptible to a partition-in-time discontinuity. A partition-in-time discontinuity is a temporal discontinuity in the metadata that occurs following a particular sequence of failures in the nodes of the cluster.
Assuming that the cluster system does not include a shared external storage resource between the servers of the cluster, a temporal discontinuity in the metadata occurs when the servers of the cluster fail in the following sequence. At time T1, server A fails. According to the failover protocol of the cluster system, server B immediately assumes the operations of server A, including the task of serving data from storage and updating the metadata stored in the storage resources available to server B. At time T2, before server A is restored, server B fails. At a later time T3, server A is restored while server B is still down. Server A then assumes the operations of server B without the benefit of the metadata changes that server B made between time T1 and time T2. As a result, server A operates on the basis of metadata that was last updated immediately before time T1. The global state of the cluster system may therefore be inconsistent, because the only operating server of the cluster system is relying on metadata that is outdated and likely inaccurate.
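The failure sequence above can be traced with a small simulation. The following Python sketch is illustrative only: the `Node` class, the version counter, and the `update` and `replicate` helpers are hypothetical names, and the sketch assumes that each server keeps a private metadata copy that reaches its peer only while the peer is up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server holding a private copy of the cluster metadata (hypothetical model)."""
    name: str
    up: bool = True
    version: int = 0                            # counts metadata updates seen
    owners: dict = field(default_factory=dict)  # service name -> owning server


def replicate(src: Node, dst: Node) -> None:
    """Copy metadata to the peer; a peer that is down misses the update."""
    if dst.up:
        dst.version = src.version
        dst.owners = dict(src.owners)


def update(node: Node, peer: Node, service: str, owner: str) -> None:
    """Record a metadata change locally, then try to replicate it."""
    node.owners[service] = owner
    node.version += 1
    replicate(node, peer)


def run_failure_sequence():
    a, b = Node("A"), Node("B")
    update(a, b, "storage", "A")   # normal operation: both copies agree
    a.up = False                   # T1: server A fails
    update(b, a, "storage", "B")   # failover: only B's copy records the change
    b.up = False                   # T2: server B fails before A is restored
    a.up = True                    # T3: server A restarts with its old copy
    return a, b


a, b = run_failure_sequence()
print(a.version, a.owners)  # A's copy predates T1
print(b.version, b.owners)  # B's copy holds the change A never saw
```

After time T3 the only running server holds the lower version number, which is the partition-in-time discontinuity described above.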
Some cluster systems attempt to solve this temporal discontinuity by including at least one shared storage resource between the servers of the cluster. In this configuration, the servers use the shared storage resource to store the metadata of the cluster. If the servers of the cluster fail in the sequence outlined above, the metadata of the cluster remains current because both servers have access, through the shared storage resource, to the same metadata of the cluster system, which eliminates the possibility of a temporal discontinuity. One difficulty of the shared storage approach is the distance limit that it imposes. Each server must be located within the maximum effective distance of the shared storage resource, which limits how far apart the nodes of the cluster can be. As such, a shared storage approach is not possible for remotely located clusters, or stretch clusters, in which the server nodes are geographically separate.
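As a contrast, the same failure sequence can be replayed against a single shared metadata copy. This is a minimal sketch under stated assumptions: `SharedStore` and `SharedNode` are hypothetical names, and the shared storage resource is assumed to stay available when either server fails.

```python
class SharedStore:
    """Single metadata copy reachable by every server (hypothetical model)."""

    def __init__(self):
        self.version = 0
        self.owners = {}

    def update(self, service, owner):
        self.owners[service] = owner
        self.version += 1


class SharedNode:
    """Server that reads and writes metadata only through the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store   # reference to shared storage, not a private copy
        self.up = True

    def write(self, service, owner):
        if not self.up:
            raise RuntimeError(f"server {self.name} is down")
        self.store.update(service, owner)

    def read_version(self):
        if not self.up:
            raise RuntimeError(f"server {self.name} is down")
        return self.store.version


def run_shared_sequence():
    store = SharedStore()
    a, b = SharedNode("A", store), SharedNode("B", store)
    a.write("storage", "A")   # normal operation
    a.up = False              # T1: server A fails
    b.write("storage", "B")   # failover: B updates the shared copy
    b.up = False              # T2: server B fails
    a.up = True               # T3: server A is restored
    return a.read_version(), store.version


seen_by_a, latest = run_shared_sequence()
print(seen_by_a == latest)  # True
```

Because server A reads the copy that server B last wrote, no discontinuity arises in this model. The sketch does not model the distance limit that rules this approach out for geographically separate nodes.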
Another approach to avoiding a temporal discontinuity in the metadata of the cluster system is to keep the operational server out of service while another server of the cluster is down. According to this approach, after the failure of server B and the restoration of server A, server A does not provide any services until server B is also restored and the metadata of the servers is synchronized. The weakness of this approach is that service to clients of the cluster system is not restored until server B is restored, even though server A is running and could serve the clients that depend on the cluster system. This approach diminishes or prevents the high availability of services for clients of the cluster system, which is one of the primary reasons for clustering.
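The availability cost of this wait-for-peer approach can be made concrete with a short sketch. The function names and the time T4, at which server B is restored and the metadata is synchronized, are assumptions introduced here for illustration and do not appear in the description above.

```python
def may_serve(node_up: bool, peer_up: bool, synchronized: bool) -> bool:
    """Wait-for-peer rule: serve only once the peer is back and the copies match."""
    return node_up and peer_up and synchronized


def outage_durations(t2: float, t3: float, t4: float) -> tuple:
    """Return (outage if A served from T3, outage under the wait-for-peer rule).

    t2: server B fails and service stops
    t3: server A is restored
    t4: server B is restored and metadata is synchronized (hypothetical)
    """
    if not t2 <= t3 <= t4:
        raise ValueError("expected t2 <= t3 <= t4")
    return t3 - t2, t4 - t2


print(may_serve(node_up=True, peer_up=False, synchronized=False))  # False
print(outage_durations(t2=10.0, t3=15.0, t4=40.0))                 # (5.0, 30.0)
```

In this example, clients wait 30 time units instead of 5, even though server A has been running since T3.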