A cluster is a group of resources, consisting of individual servers, storage, and network components, that provides highly available and scalable computing services to clients and is managed through software as a single system. One example of a cluster is a group of two physical servers coupled to one another through a shared cable for internode communication. Managing several servers as a single unit improves the availability, manageability, and scalability of the group. In terms of availability, implementing servers as a cluster provides a failover mechanism to ensure that applications and services continue to operate despite a hardware, operating system, service, or application failure.
Many clustering solutions rely on a “shared” storage model for storing the data and meta-data for the server cluster. The shared storage approach requires that the cluster servers and their storage be co-located. There is a single copy of the data and meta-data in a centralized location, accessible to all member servers. A shared storage cluster is therefore susceptible to failures resulting from natural disasters, power outages, and similar events that affect a single geographic site. In contrast to the shared storage model, the “stretched” cluster model allows the member nodes of a cluster to be geographically separated. In a stretched cluster, there are multiple copies of the data and meta-data, one for each site. Accordingly, each server has its own replicating or mirroring storage system. Because the cluster servers may be dispersed to geographically distant locations, the stretched cluster model provides a disaster-tolerant cluster configuration.
The two most common methods for replicating or mirroring data between the nodes of a stretched server cluster are synchronous data replication and asynchronous data replication. In synchronous data replication, when an application performs a write to the storage at its local site, the operation is applied to the copies of the data at all sites at the same time, or not at all. Therefore, the data remains consistent across the cluster from one write operation to the next. Generally, synchronous data replication introduces a significant performance overhead, but maintains data integrity. In asynchronous data replication, when an application performs a write to the storage at its local site, the operation is written at the local site first and is eventually applied to the copies of the data at the other sites. Therefore, while the data might be inconsistent across sites from one write operation to the next, the local site has the most up-to-date copy of the data at all times. Asynchronous data replication has better performance characteristics than synchronous data replication, but exposes the cluster to the possibility of data loss.
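
The difference between the two replication methods can be illustrated with a minimal Python sketch. It is not taken from any particular clustering product: the names Site, SyncReplicator, and AsyncReplicator are hypothetical, each site's storage is modeled as an in-memory dictionary, and the "all sites or none" rule is approximated by a simple availability check rather than a real commit protocol.

```python
from collections import deque


class Site:
    """Hypothetical in-memory stand-in for the storage at one site."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}


class SyncReplicator:
    """Apply each write to every site, or to none of them."""

    def __init__(self, sites):
        self.sites = sites

    def write(self, key, value):
        # Refuse the write unless every site can accept it (all or nothing).
        if not all(site.available for site in self.sites):
            return False
        for site in self.sites:
            site.data[key] = value
        return True


class AsyncReplicator:
    """Apply each write locally first and queue it for the remote sites."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.pending = deque()

    def write(self, key, value):
        # The local copy is updated immediately, so it is always the newest.
        self.local.data[key] = value
        self.pending.append((key, value))
        return True

    def flush(self):
        # Ship queued writes to the remote sites in their original order.
        while self.pending:
            key, value = self.pending.popleft()
            for site in self.remotes:
                site.data[key] = value
```

In this sketch, the writes that sit in the pending queue between a call to write and the next call to flush are the ones that would be lost if the local site failed, which corresponds to the data-loss exposure of asynchronous replication. The synchronous variant avoids that window at the cost of making every write wait on all sites.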