The present invention relates generally to computer systems, and particularly to computer systems operating in a cluster environment.
Over the last decade, companies within virtually every industry have relied ever more heavily on computerized systems to maintain competitive positions within their respective industries. Meanwhile, businesses have attempted to move from large mainframes toward more distributed computing systems, often in an effort to minimize the costs associated with their information infrastructures. At the same time, however, companies continue to demand increased reliability, scalability and responsiveness from their computer systems.
One strategy frequently employed to harmonize the move to smaller systems with the continued need for the scalability, reliability and responsiveness traditionally associated with large systems is to “cluster” several Intel™ and/or UNIX™ based servers together. Those skilled in the art will appreciate that cluster computing requires relatively low initial capital investment (especially when compared with a mainframe-based solution) but still provides the desired reliability and responsiveness. Moreover, cluster solutions allow for a high degree of scalability, as servers may be upgraded and/or added to the cluster as needs grow. Server clusters have proved particularly advantageous in transaction-intensive applications, such as web servers, database servers, application servers and the like.
In the past, however, server clusters have been relatively difficult to administer. In a typical cluster, each server will have its own operating system, as well as “clusterware” that allows the servers to interoperate as a cluster, and a cluster-enabled application, such as a database management system, web server application and/or the like. Merely by way of example, a typical cluster node may be running a variant of UNIX or Linux, such as Sun Microsystems' Solaris™ operating system or the free Red Hat™ Linux operating system, clusterware such as Oracle Corporation's Cluster Ready Services™, and a database application, such as Oracle's 10g™ relational database management system (“RDBMS”).
When adding, deleting or reconfiguring a node in the cluster, the node to be modified (and, in many cases, the other nodes in the cluster as well) generally will have to be configured. This configuration may include changes to the operating system, clusterware and/or applications. Such configuration often must be performed manually on each node, resulting in increased labor costs and frustration for administrators. In addition, manual configuration presents many opportunities for misconfiguration, such as setting parameters on one node that render that node incompatible with other nodes in the cluster. Such misconfigurations may not be readily ascertainable from the behavior of the cluster, and they only increase the cost and frustration already inherent in the management of clusters.
Further, in many cases, modification of a cluster (or a node therein) is performed in response to some system failure. For example, if a node fails, the cluster may need to be reconfigured to allow the removal of that node, the reconfiguration of the failed node and/or the addition of a replacement node. As a result, the entire cluster is often unavailable, or at the very least severely impaired, until the reconfiguration is complete.
Consequently, there is a need for solutions that can ease the administrative burdens of clustered computing.