The activities of enterprises are highly intertwined with computers, and for many enterprises the unavailability of a computer system can be disabling. The ability to maintain availability is therefore an important capability of computer systems.
Computer systems used by enterprises store and retrieve large amounts of data, and they typically rely on data storage systems to perform this function. A data storage system has one or more data storage servers that govern and facilitate access to data storage by processing client requests to access that storage. Data storage servers may also be referred to as data storage instances.
Replication is one technique used to maintain the availability of data storage systems. Replication is the process of copying data from a "primary" data storage system onto another data storage system, referred to herein as a standby. As changes are made to data on the primary data storage system, the changes are replicated on one or more standby data storage systems. If the primary data storage system becomes unavailable, a standby can be made the primary data storage system.
To help maintain the availability and scalability of data storage systems, many customers deploy a data storage cluster. A data storage cluster has a group of one or more data storage servers that provide and manage access to one or more data storages. A data storage cluster is associated with a pool of storage (e.g., a set of disk drives) that is accessible to the data storage servers in the cluster. The pool may be accessed over a storage area network (SAN) or over IP, as in network-attached storage (NAS). This configuration is useful because the cluster helps to guard against failures of individual data storage servers. In certain clusters, all of the servers can process requests, which increases throughput.
In such a cluster configuration, it is important for the standby data storage system to detect when the primary data storage system, or a data storage server within it, has failed so that the standby can take the appropriate action. The standby must detect data storage server failures on the primary rapidly if it is to keep its copy of the data closely in sync with the changes made at the primary data storage system.
Standby data storage systems keep up with changes on the primary data storage system by receiving redo logs, which describe the changes, from the primary data storage system and applying them to the copy maintained by the standby. When the primary data storage system is a cluster, each data storage server maintains and sends its own logs in its own log stream. The standby data storage system receives these log streams, merges them, and then applies them to the standby data storage. The merge is necessary because the log streams contain changes to the same set of data blocks, and those changes must be applied at the standby in the correct order.
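The merge described above can be sketched as a k-way merge. This is a minimal illustration, not the method of any particular product: it assumes each redo record carries a hypothetical cluster-wide ordering key `seq`, and that each server's stream is already sorted by that key. The names `RedoRecord`, `merge_streams`, and `apply` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    seq: int      # hypothetical cluster-wide ordering key
    server: str   # data storage server that generated the change
    block: int    # data block that was changed
    change: str   # description of the change


def merge_streams(streams):
    """Merge per-server redo streams (each sorted by seq) into one apply order."""
    return list(heapq.merge(*streams, key=lambda r: r.seq))


def apply(records):
    """Apply merged records to a toy block store; later changes win."""
    blocks = {}
    for r in records:
        blocks[r.block] = r.change
    return blocks


# Servers A and B both change block 7, so their streams must be interleaved.
s1 = [RedoRecord(1, "A", 7, "x=1"), RedoRecord(4, "A", 7, "x=3")]
s2 = [RedoRecord(2, "B", 7, "x=2"), RedoRecord(5, "B", 9, "y=1")]
merged = merge_streams([s1, s2])
print([r.seq for r in merged])  # [1, 2, 4, 5]
print(apply(merged))            # {7: 'x=3', 9: 'y=1'}
```

Applying either stream on its own, or concatenating them, could leave block 7 with a stale value; ordering by the shared key is what makes the final state match the primary.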
A problem is that if the standby data storage system is unaware that a data storage server in the primary data storage system has failed, the standby still expects to receive and merge the log stream from the failed data storage server. The standby therefore stalls and cannot merge and apply the logs from the surviving data storage servers. This delay is not acceptable because it prevents the standby data storage system from applying, in real time, the changes performed on the primary data storage. Therefore, a need exists for rapidly determining when primary data storage servers are no longer an enabled part of the cluster, so that a standby data storage system does not stall waiting for logs from dead data storage servers.
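Why the stall occurs can be sketched as follows. In this minimal illustration (an assumption, not the approach of any particular system), the standby applies a record only once every server it believes is enabled has sent a record with an equal or higher `seq`, because otherwise a lower-ordered change might still arrive. A failed server's "high-water mark" stops advancing, which pins the whole merge. `StandbyMerger` and its `mark_failed` hook are hypothetical names; the sketch also assumes each live server keeps its stream advancing, which a real system would have to ensure.

```python
import heapq


class StandbyMerger:
    """Apply redo records in seq order, but only once it is safe to do so."""

    def __init__(self, servers):
        self.enabled = set(servers)                 # servers believed to be alive
        self.high_water = {s: 0 for s in servers}  # last seq received per server
        self.pending = []                           # min-heap of (seq, server, change)
        self.applied = []                           # records applied, in order

    def receive(self, server, seq, change):
        if server not in self.enabled:
            return  # in this sketch, logs from a server declared failed are ignored
        self.high_water[server] = seq
        heapq.heappush(self.pending, (seq, server, change))
        self._drain()

    def mark_failed(self, server):
        # Hypothetical hook, invoked once the standby learns the server is gone.
        self.enabled.discard(server)
        self._drain()

    def _drain(self):
        if not self.enabled:
            return
        # No enabled stream can still deliver anything at or below this point.
        safe = min(self.high_water[s] for s in self.enabled)
        while self.pending and self.pending[0][0] <= safe:
            self.applied.append(heapq.heappop(self.pending))


m = StandbyMerger(["A", "B"])
m.receive("A", 1, "a1")
m.receive("B", 2, "b2")
m.receive("A", 3, "a3")
m.receive("A", 5, "a5")                 # server B has crashed; A keeps going
print([seq for seq, _, _ in m.applied])  # [1, 2]  (stalled waiting for B)
m.mark_failed("B")
print([seq for seq, _, _ in m.applied])  # [1, 2, 3, 5]
```

Until `mark_failed` is called, the records from the surviving server A sit in `pending`; the sooner the standby learns of the failure, the shorter the stall.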
Another problem that can occur is a split-brain condition, in which the data storage servers in a cluster lose connectivity with each other and re-form themselves into two (or more) clusters. Each resulting cluster believes that it has exclusive access to the primary data storage, so their uncoordinated updates can corrupt the primary data storage. Hardware and software mechanisms may be added to clusters to reduce the possibility of a split-brain condition; however, these mechanisms may fail. Therefore, a need exists to detect a split-brain condition of the primary data storage system in the event that such hardware and software mechanisms fail or are not in use.