A computer network typically includes a set of devices connected in a way that allows the devices to communicate with each other. Such devices, which can include workstations with memory and one or more processors, are often referred to as nodes. A cluster is a group of nodes that work together as a single system. One software application that allows groups of nodes to operate as a single system is NT Enterprise, which is generally available from Microsoft Corporation.
Clusters can be either "shared data" or "shared nothing" clusters. In a shared data cluster, all nodes have access to one or more shared storage devices. In a shared nothing cluster, storage devices are "owned" by nodes, and nodes only have access to the storage devices that they own.
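The distinction between the two cluster types can be illustrated with a minimal sketch. The `Cluster` class, its `owners` mapping, and the `can_access` method are hypothetical names introduced only for illustration; they do not correspond to any particular clustering product.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical model of how a cluster grants access to storage devices."""
    shared_data: bool  # True: shared data cluster; False: shared nothing cluster
    owners: Dict[str, str] = field(default_factory=dict)  # storage device -> owning node

    def can_access(self, node: str, device: str) -> bool:
        if device not in self.owners:
            return False  # unknown device
        if self.shared_data:
            return True   # every node may access shared storage
        return self.owners[device] == node  # only the owner may access the device


shared = Cluster(shared_data=True, owners={"disk1": "A"})
nothing = Cluster(shared_data=False, owners={"disk1": "A"})
```

Under these assumptions, node "B" can access "disk1" in the shared data cluster but not in the shared nothing cluster, where only the owning node "A" has access.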
In general, clustering technology is designed to minimize downtime for client/server network computing applications. Downtime may be minimized, for example, by shifting the responsibilities of a first node in the cluster to a second node in the cluster if the first node in the cluster fails. Shifting responsibilities in this manner is referred to as fail over. A node that assumes the responsibilities of another node in response to a fail over is referred to herein as a fail over node.
The responsibilities that a node is able to handle are determined in part by the software that is executing on the node. For example, a node may be able to process database requests because it is executing a database server. If the node fails, the responsibility for processing database requests can only be shifted to a fail over node that is able to execute the database server. Since the fail over node is not currently executing the database server, the database server must be started on the fail over node in response to the fail over. Techniques for performing automatic fail over in a client/server system are described in U.S. patent application Ser. No. 08/866,842 entitled "Automatic Failover for Clients Accessing a Resource Through a Server", filed on May 30, 1997, the contents of which are incorporated herein by reference.
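The shifting of responsibilities described above may be sketched as follows. This is an illustrative model only, not the technique of the referenced application; the `Node` class, its `can_run` and `running` fields, and the `fail_over` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    can_run: Set[str] = field(default_factory=set)  # programs this node is able to execute
    running: Set[str] = field(default_factory=set)  # programs currently executing
    alive: bool = True


def fail_over(failed: Node, candidates: List[Node]) -> Dict[str, Optional[str]]:
    """Shift each program of the failed node to the first live node able to execute it."""
    failed.alive = False
    placement: Dict[str, Optional[str]] = {}
    for program in sorted(failed.running):
        target = next(
            (n for n in candidates if n.alive and program in n.can_run), None
        )
        if target is not None:
            # The program is not yet executing on the fail over node, so start it there.
            target.running.add(program)
        placement[program] = target.name if target is not None else None
    failed.running.clear()
    return placement


a = Node("A", can_run={"db"}, running={"db"})
b = Node("B")                  # not able to execute the database server
c = Node("C", can_run={"db"})  # able to execute the database server
result = fail_over(a, [b, c])
```

In this sketch node "B" is skipped because it is not able to execute the database server, so the responsibility shifts to node "C". A program for which no capable node exists is mapped to `None`.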
Many software programs must be specifically configured for a node before they can be safely executed on the node. Configuring a software program may involve, for example, (1) configuring the network required to run the client/server based application, (2) configuring the application itself, and (3) configuring any other software that may be required for the application to run. The process of configuring a software program for a node can be complex and time consuming. It typically requires the user to manually perform a series of steps specified by the software provider. For sophisticated software programs, the steps can be both numerous and complex. Further, if one step in the configuration process fails, the entire configuration operation may have to be restarted.
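The cost of a failed step can be illustrated with a short sketch in which any failure restarts the configuration operation from the first step. The `configure` function, its `max_attempts` parameter, and the sample steps are assumptions made for illustration; actual configuration procedures are defined by the software provider.

```python
from typing import Callable, List


def configure(steps: List[Callable[[], bool]], max_attempts: int = 3) -> int:
    """Run all steps in order; if any step fails, restart from the first step.

    Returns the total number of step executions that were needed.
    """
    executed = 0
    for _ in range(max_attempts):
        for step in steps:
            executed += 1
            if not step():
                break  # abandon this attempt and start over
        else:
            return executed  # every step succeeded
    raise RuntimeError("configuration did not complete")


# Hypothetical three-step process whose second step fails on its first try.
calls = {"n": 0}


def flaky_step() -> bool:
    calls["n"] += 1
    return calls["n"] > 1


total = configure([lambda: True, flaky_step, lambda: True])
```

Here a single failure in the second step causes five step executions for a three-step process, since the first step must be repeated after the restart.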
Applications designed to run on a single node are generally referred to as stand alone applications. An application that runs in a cluster environment and is capable of fail over to another node in the cluster when the primary node fails is referred to as a fail safe application.
Before a stand alone application is configured for fail safe operation, the application can only run on one of the clustered nodes. This node is referred to as the owner node. Fail safe operation requires the application to be configured both on the owner node and on other nodes in the cluster so that the application can run on multiple nodes in the cluster to provide fail over capability.
In fail over systems, software programs must be configured on both (1) nodes that will initially execute the programs, and (2) nodes that may have to execute the programs if fail over occurs. Thus, depending on the fail over policies employed within a cluster, a given software program may have to be configured on all of the nodes in a cluster even though it is intended to execute on only one of those nodes at a time.
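The set of nodes on which a program must be configured can be derived from the fail over policy, as the following sketch suggests. The `nodes_to_configure` function and the dictionary representation of owners and policies are hypothetical and are used only to make the relationship concrete.

```python
from typing import Dict, List, Set


def nodes_to_configure(
    owner: Dict[str, str],
    fail_over_policy: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Map each program to every node on which it must be configured.

    owner: program -> node that initially executes the program
    fail_over_policy: program -> nodes that may execute it after a fail over
    """
    return {
        program: {node} | set(fail_over_policy.get(program, []))
        for program, node in owner.items()
    }


required = nodes_to_configure(
    owner={"db": "A", "web": "B"},
    fail_over_policy={"db": ["B", "C"]},
)
```

In this example the "db" program executes only on node "A" at a given time, yet it must be configured on nodes "A", "B", and "C". The "web" program has no fail over nodes and is configured only on node "B".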
A configuration operation becomes significantly more complex and time consuming as the number of nodes for which the program must be configured increases. Consequently, configuring applications for use on clusters that employ fail over can be prohibitively burdensome. For example, one software program has a forty-step configuration process. Configuring such a program on a relatively small cluster of nodes has taken an expert engineer approximately nineteen hours.
Based on the foregoing, it is clearly desirable to reduce the complexity of configuring software in clusters that employ fail over policies.