Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with useable data.
Companies have come to rely upon high-availability clusters to provide their most critical services and to store their most critical data. There are several types of clusters, including compute clusters, storage clusters, and scalable clusters. High-availability clusters (also known as HA clusters or failover clusters) are computer clusters implemented primarily to provide high availability of the services the cluster offers. They operate by having redundant computers, or nodes, that provide service when system components fail. Normally, if a server running a particular application crashes, the application is unavailable until someone fixes the crashed server. HA clustering remedies this situation by detecting hardware and software faults and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.
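The failover sequence described above can be illustrated with a minimal Python sketch. It is not taken from any particular clustering product: the `Node` and `Application` classes, the `prepare_node` and `failover` functions, and all field names and sample values are hypothetical, and node health is reduced to a simple flag where real fault detection would be.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; 'healthy' stands in for real fault detection."""
    name: str
    healthy: bool = True
    mounted: list = field(default_factory=list)      # file systems mounted
    interfaces: list = field(default_factory=list)   # network interfaces configured
    running: list = field(default_factory=list)      # applications running


@dataclass
class Application:
    """Hypothetical application plus what it needs on a node before it starts."""
    name: str
    file_systems: list
    interfaces: list
    dependencies: list


def prepare_node(node, app):
    """Configure the node before the application is started on it."""
    for fs in app.file_systems:        # import and mount file systems
        if fs not in node.mounted:
            node.mounted.append(fs)
    for nic in app.interfaces:         # configure network hardware
        if nic not in node.interfaces:
            node.interfaces.append(nic)
    for dep in app.dependencies:       # start supporting applications
        if dep not in node.running:
            node.running.append(dep)


def failover(app, active, standbys):
    """Return the node that should run app, restarting it elsewhere if needed."""
    if active.healthy:
        return active                  # no fault detected, nothing to do
    for candidate in standbys:
        if candidate.healthy:
            prepare_node(candidate, app)
            candidate.running.append(app.name)
            return candidate
    raise RuntimeError("no healthy standby node available")


# Example with made-up names: node-a has failed, node-b takes over.
db = Application("orders-db", ["/data/orders"], ["eth1"], ["log-shipper"])
primary = Node("node-a", healthy=False)
standby = Node("node-b")
new_active = failover(db, primary, [standby])
print(new_active.name, new_active.mounted, new_active.running)
```

The sketch runs the preparation step before the application is started, mirroring the order given in the text; no administrator action appears anywhere in the path from fault to restart.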
HA clusters are often used for critical databases, file sharing on a network, business applications, and customer services such as electronic commerce websites. HA cluster implementations attempt to build redundancy into a cluster to eliminate single points of failure, including multiple network connections and data storage which is multiply connected via storage area networks or Internet protocol-based storage. Additionally, HA clusters are often augmented by connecting them to multiple redundant HA clusters to provide disaster recovery options.
High availability and disaster recovery solutions strive to decrease application downtime and application data loss. In case of a disaster such as a flood, earthquake, or hurricane, the applications running in the impacted cluster should be failed over to another cluster as soon as possible to ensure that business continuity is maintained. To facilitate fast failover of the applications, cluster failures should be detected in a timely manner. Conventional high availability and disaster recovery solutions rely on an inquiry mechanism to determine the health of a particular cluster. This reactive approach has the drawback that it increases the delay in cluster failure detection and requires many message exchanges between the clusters at the time of disaster. Therefore, a proactive approach is needed to decrease the cluster failure detection time.
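To make the cost of the reactive approach concrete, the following Python sketch models one possible inquiry mechanism in simulated time. The text does not specify how conventional inquiries work, so everything here is an assumption made for illustration: the `inquiry_detection` function, a fixed polling interval, a reply timeout, and the rule that a cluster is declared failed after a number of consecutive unanswered inquiries.

```python
def inquiry_detection(failure_time, poll_interval, reply_timeout, misses_needed=1):
    """Simulate a peer that polls a remote cluster for its health.

    Assumed model (hypothetical, for illustration only):
      - an inquiry is sent at t = 0, poll_interval, 2 * poll_interval, ...
      - an inquiry sent before failure_time is answered with one reply
      - an inquiry sent at or after failure_time is never answered, and the
        peer gives up waiting after reply_timeout
      - the cluster is declared failed after misses_needed consecutive misses

    Returns (detection_time, messages_exchanged).
    """
    t = 0
    messages = 0
    misses = 0
    while True:
        messages += 1                  # the inquiry itself
        if t < failure_time:
            messages += 1              # the reply from the healthy cluster
            misses = 0
        else:
            misses += 1
            if misses >= misses_needed:
                return t + reply_timeout, messages
        t += poll_interval


# Made-up numbers: failure at t=12, poll every 10, wait 3 for a reply,
# and require 2 consecutive unanswered inquiries.
detected_at, messages = inquiry_detection(12, 10, 3, misses_needed=2)
print("detection delay:", detected_at - 12, "messages:", messages)
```

Under these assumptions the detection delay is bounded below by the reply timeout and grows with the polling interval and the number of misses required. Shortening the interval reduces the delay only by increasing the message traffic, which is the trade-off the paragraph above describes.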