Increasingly, businesses expect products and services to be available 24 hours a day, 365 days a year. To help ensure maximum availability, cluster technology can be used to reduce or minimize the downtime of databases and other applications. In a common distributed system, the applications run on Microsoft clusters and are configured with Microsoft Cluster Server (MSCS) software.
A cluster is a configuration of two or more independent computing systems, called cluster nodes, that are connected to the same disk subsystem. The cluster nodes are joined together through a shared storage interconnect, as well as an internode network connection. In a cluster, each server (node) typically has exclusive access to a subset of the cluster disks during normal operations. As a distributed system, a cluster can be far more effective than independent stand-alone systems, because each node can perform useful work yet can still take over the workload and disk resources of a failed cluster node. When a cluster node fails, the cluster software moves its workload to a surviving node based on parameters that are configured by a user. This operation is called a failover.
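The failover operation described above can be illustrated with a minimal sketch. It is not MSCS code and does not use any MSCS interface; the names Node and fail_over, and the disk and workload labels, are hypothetical and chosen only for illustration. The user-configured failover parameters are omitted, so the survivor is simply passed in by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one cluster node."""
    name: str
    alive: bool = True
    disks: set = field(default_factory=set)      # cluster disks this node owns
    workloads: set = field(default_factory=set)  # applications running on this node


def fail_over(failed: Node, survivor: Node) -> None:
    """Move the failed node's workloads and disks to the survivor."""
    if not survivor.alive:
        raise RuntimeError("no surviving node available")
    failed.alive = False
    survivor.disks |= failed.disks          # stands in for remounting the disks
    survivor.workloads |= failed.workloads  # stands in for restarting the workloads
    failed.disks = set()
    failed.workloads = set()


# During normal operation each node owns its own subset of the disks.
node_a = Node("A", disks={"disk1"}, workloads={"database"})
node_b = Node("B", disks={"disk2"}, workloads={"mail"})

# Node A fails; node B takes over its disk and workload.
fail_over(node_a, node_b)
print(sorted(node_b.disks), sorted(node_b.workloads))
```

After the call, node B owns both disks and runs both workloads, while still having done its own useful work beforehand.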
The internode network connection, sometimes referred to as a heartbeat connection, allows one node to detect the availability or unavailability of another node. If one node fails, the cluster software fails over the workload of the unavailable node to the available node, and remounts any cluster disks that were owned by the failed node. Clients continue to access cluster resources without any changes.
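As a rough illustration of how a heartbeat connection lets one node judge the availability of another, the following sketch treats a peer as unavailable once no heartbeat has arrived within a timeout. The class name HeartbeatMonitor, the five-second timeout, and the explicit time arguments are assumptions made for this example, not details of any particular cluster product.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat received from each peer."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node name -> time of the last heartbeat

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def is_available(self, node: str, now: float) -> bool:
        """A node is available if it sent a heartbeat within the timeout."""
        last = self.last_seen.get(node)
        return last is not None and (now - last) <= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record("node_b", now=0.0)

print(monitor.is_available("node_b", now=3.0))  # heartbeat is recent
print(monitor.is_available("node_b", now=9.0))  # heartbeat is overdue
```

Times are passed in explicitly so the behavior is deterministic; a real monitor would read a clock and, on detecting an unavailable peer, would trigger the failover of that peer's workload and disks.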
In a cluster environment, the user typically interacts with a specific node, while user processes may be running on another node. Complicated techniques have been used to relay error information back to the user. That information, however, tends to be minimal.