A server cluster is a group of independent servers connected by a network and managed as a single system. The clustering of the servers provides a number of benefits over independent servers. One such benefit is that cluster software, which is run on the servers in a cluster, may be configured to automatically detect application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications may be restarted on a surviving server without a substantial reduction in service. The server cluster may also be configured such that clients of the server cluster view the cluster as a single physical system, even though the system may include services provided by one or more of several servers. A client, for instance, may create a TCP/IP session with a service in the cluster using a known IP address. This address appears to the cluster software as a resource in the same group (i.e., a collection of resources managed as a single unit) as the application providing the service. In the event of a failure, the cluster “moves” the entire group to another system.
Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are useful in providing high availability for critical database management, file and intranet data sharing, messaging, general business applications, and the like.
A cluster works with a number of basic system components, known as “resources”, which provide some service to clients in a client/server environment or to other components within the system. Resources may correspond to physical devices, such as disks, or to purely software constructs, such as processes, databases, and IP addresses. A resource may be implemented as a resource DLL and hosted by a resource monitor host process running in a cluster node. The resource DLL for a resource is responsible for the control and health monitoring of the underlying component. For instance, a resource DLL for a disk resource contains code that brings the disk online, takes it offline, and monitors its health.
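The responsibilities of a resource DLL described above can be pictured with a minimal sketch. The class and method names here (`Resource`, `DiskResource`, `is_healthy`, `poll`) are hypothetical and chosen only for illustration; they are not the entry points of any actual cluster API, and the disk is simulated with a simple flag.

```python
from abc import ABC, abstractmethod


class Resource(ABC):
    """Hypothetical stand-in for the control surface a resource DLL exposes."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def online(self):
        """Bring the underlying component online."""

    @abstractmethod
    def offline(self):
        """Take the underlying component offline."""

    @abstractmethod
    def is_healthy(self):
        """Report whether the underlying component is currently healthy."""


class DiskResource(Resource):
    """Simulated disk: a flag stands in for the real device state (assumption)."""

    def __init__(self, name):
        super().__init__(name)
        self.mounted = False

    def online(self):
        self.mounted = True

    def offline(self):
        self.mounted = False

    def is_healthy(self):
        return self.mounted


def poll(resources):
    """Return the names of resources that report unhealthy, in the way a
    monitor host process might check the resources it hosts."""
    return [r.name for r in resources if not r.is_healthy()]
```

In this sketch the monitoring process knows nothing about disks; it only calls the common control and health methods, which is the separation the resource DLL model provides.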
An application is typically represented as a collection of resource groups in the cluster. A resource group in a cluster is a containment unit for resources and is the basic unit of failover. A group could contain one or more directed acyclic graphs of resources where the directed links define dependencies between resources. A dependency between two resources defines an order in which those resources are brought online and offline. For example, a Structured Query Language (SQL) database resource can specify a dependency on a disk resource and a network name resource. The network name will be used by clients to connect to the SQL service. These dependencies allow the cluster runtime to instantiate and shut down the various resource objects that form an application in a well-defined manner. Thus, in the above example, when the group that contains the three resources is taken offline, the dependent SQL resource is taken offline first, followed by the provider disk resource and the provider network name resource, the latter two in no particular order. In addition to defining a start and stop order for resources, the dependency of resources also defines the order in which resources are “terminated”. Terminate refers to a notification delivered to a resource DLL in response to a failure event. When such a notification is delivered, the resource that receives it typically takes the underlying application offline. Thus, in the above example, if the disk fails, the SQL resource is terminated first, followed by the disk resource. The order of terminate processing may also be identical to the offline order of those resources. In this model, however, the failure of a provider resource always causes the termination of a dependent resource. Therefore, dependent resources are not currently provided with a choice of not getting a terminate notification or of specifying a redundant dependency on multiple provider resources.
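The ordering rules in the SQL example can be illustrated with a short sketch that derives online, offline, and terminate orders from a dependency graph. The resource names and the functions `online_order`, `offline_order`, and `terminate_order` are assumptions made for illustration only; the sketch uses Python's standard `graphlib` module (Python 3.9 or later) and does not represent any particular cluster runtime.

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each resource maps to the providers it depends on.
dependencies = {
    "sql": {"disk", "network_name"},
    "disk": set(),
    "network_name": set(),
}


def online_order(deps):
    """Providers come online before the resources that depend on them."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Offline is the reverse of online: dependents go offline first."""
    return list(reversed(online_order(deps)))


def terminate_order(deps, failed):
    """Resources terminated when `failed` fails: every direct or transitive
    dependent first, then the failed resource itself."""
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for resource, providers in deps.items():
            if resource not in affected and providers & affected:
                affected.add(resource)
                changed = True
    # Reuse the offline order so dependents always precede their providers.
    return [r for r in offline_order(deps) if r in affected]
```

With this graph, `terminate_order(dependencies, "disk")` yields the SQL resource followed by the disk resource, matching the example above. The relative order of the disk and network name resources in the online and offline lists is not significant, since neither depends on the other.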
Continuing with the SQL example, there is no support for the SQL resource to express a dependency on two disk resources (e.g., D1 and D2) such that the SQL resource can come online if one or both of the disks are online.
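The limitation can be made concrete with a small sketch of the conventional rule, in which every listed provider must be online before a dependent may come online. The function `can_come_online` and the data layout are hypothetical and serve only to illustrate the behavior described above.

```python
def can_come_online(resource, deps, online):
    """Conventional model: a resource may come online only when every
    provider it depends on is already online."""
    return all(provider in online for provider in deps[resource])


# Hypothetical dependency of the SQL resource on two disks, D1 and D2.
deps = {"sql": {"D1", "D2"}}

both_disks = can_come_online("sql", deps, {"D1", "D2"})  # both providers online
one_disk = can_come_online("sql", deps, {"D1"})          # D2 is missing
```

Here `both_disks` is true and `one_disk` is false: with only D1 online the SQL resource is blocked, because this model has no way to state that either disk alone would suffice.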
Accordingly, there is a continuing need to improve the way dependencies between resource objects are expressed and handled in a server cluster.