The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
A server cluster ordinarily is a group of at least two independent servers connected by a network and utilized as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server.
Other benefits of clusters include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline for the duration of the maintenance activity. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
Because clusters often deal with critical applications, clusters need to be highly reliable and available. As a result, the data that is needed to operate a cluster, referred to herein as the cluster operational data, also must be highly available, persistent, and consistent. Such cluster operational data includes information about the servers in the cluster (cluster membership), network topology information, the applications installed on the cluster and how to handle failures (failover policies).
Prior clustering technology required that each node (system) of a cluster possess its own replica of the cluster operational data, and that a majority of the possible nodes of a cluster be functional in order to have a valid cluster. This ensured that any given set of nodes forming a cluster had at least one node in common with any previous cluster, and thus overcame partitioning problems (wherein nodes of a cluster become separated, or a new cluster is formed later in time). More particularly, requiring that a majority of nodes be operational to have a cluster ensures that only one cluster can operate at a time, and that any new cluster that is formed has at least one node in common with the previous cluster, and thus a copy of the correct cluster operational data.
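The majority rule described above can be illustrated with a short Python sketch. It assumes a fixed, known set of configured nodes; the function names `has_majority` and `majorities_intersect` are hypothetical and chosen only for illustration. The second function exhaustively checks, for a small cluster, the property the argument relies on: any two majorities of the same node set share at least one node.

```python
from itertools import combinations


def has_majority(active_nodes: set[str], configured_nodes: set[str]) -> bool:
    """True if the active nodes form a strict majority of the configured nodes."""
    return len(active_nodes & configured_nodes) > len(configured_nodes) // 2


def majorities_intersect(configured_nodes: set[str]) -> bool:
    """Exhaustively check that every pair of majority subsets shares a node."""
    nodes = sorted(configured_nodes)
    n = len(nodes)
    # All subsets whose size is a strict majority of n.
    majorities = [
        set(subset)
        for size in range(n // 2 + 1, n + 1)
        for subset in combinations(nodes, size)
    ]
    return all(a & b for a in majorities for b in majorities)


nodes = {"A", "B", "C", "D", "E"}
print(has_majority({"A", "B", "C"}, nodes))  # partition of three may form a cluster
print(has_majority({"D", "E"}, nodes))       # partition of two may not
print(majorities_intersect(nodes))
```

With five configured nodes, a partition holding three nodes may form a cluster while the remaining two may not, so at most one of the two partitions operates, and any later majority necessarily overlaps the earlier one.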
A recent improvement, described in U.S. patent application Ser. No. 08/963,050 (U.S. Pat. No. 6,279,032), assigned to the assignee of the present invention and hereby incorporated by reference in its entirety, provides the cluster operational data on a single quorum device (e.g., a disk) for whose ownership the cluster nodes arbitrate. Because the correct data needed to operate a cluster is on the single quorum device, partitioning problems are solved, as a cluster may be formed as long as a node of that cluster has ownership of the quorum device. This further increases cluster availability, since at a minimum only one node and the quorum device are needed to have a working cluster. While this is a significant improvement over requiring a majority of nodes to have a cluster, a single quorum device is not inherently reliable, and thus, to increase cluster availability, expensive hardware-based solutions are presently employed to provide a highly-reliable single quorum device for storage of the operational data. The cost of the highly-reliable storage device is a major portion of the cluster expense.
Yet another significant improvement directed to quorum storage is described in the copending U.S. patent application Ser. No. 09/277,450, entitled "Method and System for Consistent Cluster Operational Data in a Server Cluster Using a Quorum of Replicas," assigned to the assignee of the present invention, filed concurrently herewith, and hereby incorporated by reference in its entirety. In the solution described therein, the cluster operational data is replicated to a plurality (replica set) of storage devices that are independent of any given node of a cluster. To form and operate as a cluster, a node arbitrates for and gains exclusive possession of a quorum (majority) of the replica set. This solves partitioning problems by ensuring that only one cluster can operate at a time, and that the quorum held by any new cluster includes at least one replica storage device that was common to the previous cluster, whereby the new cluster has at least one copy of the correct cluster operational data. Use of the replica set method and system to store the cluster operational data is generally preferable to the other methods and systems because it requires only a small number of relatively inexpensive components to form a cluster, thereby increasing availability relative to the quorum-of-nodes solution while lowering cost relative to the single-quorum-device solution.
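A minimal Python sketch of arbitration for a quorum of replicas follows. The `Replica` type, the in-memory `owner` field standing in for exclusive device reservation, and the `sequence` counter used to pick the most recent copy are all assumptions of this sketch, not details given here; a real system would need an atomic reservation mechanism on each storage device. A node that cannot gain possession of a majority releases whatever it newly reserved and does not form a cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """A hypothetical replica storage device holding cluster operational data."""
    name: str
    owner: Optional[str] = None  # node holding exclusive possession, if any
    sequence: int = 0            # assumed update counter identifying the newest copy
    data: str = ""


def try_reserve(replica: Replica, node: str) -> bool:
    """Reserve the replica for `node` if it is free or already owned by `node`.

    Not atomic: this stands in for a real device reservation mechanism.
    """
    if replica.owner in (None, node):
        replica.owner = node
        return True
    return False


def arbitrate(node: str, replica_set: list[Replica]) -> Optional[Replica]:
    """Try to gain a majority of the replica set; return the newest held copy, or None."""
    held: list[Replica] = []
    newly_acquired: list[Replica] = []
    for r in replica_set:
        was_free = r.owner is None
        if try_reserve(r, node):
            held.append(r)
            if was_free:
                newly_acquired.append(r)
    if len(held) <= len(replica_set) // 2:
        # No quorum: give back only what this attempt reserved.
        for r in newly_acquired:
            r.owner = None
        return None
    return max(held, key=lambda r: r.sequence)
```

Because any two majorities of the replica set overlap, a node that succeeds holds at least one replica from the previous cluster's majority; the sketch assumes the highest `sequence` identifies the correct copy among those held.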
Regardless of the type of storage used for the cluster operational data, the performance and size of a cluster are limited by the rate at which the operational data can be updated, in part because such updates are relatively slow, as they involve careful transactional logging of changes. As a result, the existing solutions for storing the cluster operational data do not scale well to large clusters, since the larger the cluster, the more updates need to be made to that data. Moreover, the cost of updating the operational data is higher when it is replicated to multiple devices, as in the preferred replica set solution. In sum, there are tradeoffs and limitations that result from having to store the cluster operational data in a highly-reliable manner, with the complete integrity required in typical clustering applications, while also providing performance and scalability.
Briefly, the present invention provides a method and system for distributing various types of cluster operational data among various storage devices of the cluster, permitting the use of relatively higher-performance and/or less costly techniques for storing some of the cluster operational data, thereby facilitating larger clusters. To this end, one type of cluster operational data, referred to herein as the cluster's core boot data, which includes information needed to get the cluster up and running (e.g., cluster membership information and network topology), is maintained on some type of quorum storage mechanism. The quorum storage may be implemented via a quorum of nodes, a single highly-reliable quorum disk, or a quorum of replica (disk) members. The remaining cluster operational data, referred to herein as the cluster configuration data, which may include information about the applications installed on the cluster and failover policies, is maintained on one or more separate storage elements, generally cheaper and/or higher-performance storage elements such as a two-element mirror set. The cluster configuration data comprises the majority of the cluster operational data, and is updated relatively frequently compared to the core boot data. As a result, storing the cluster configuration data on the higher-performance and/or less costly storage provides significant benefits.
In addition, the state of those separate storage elements may be maintained with the core boot data on the quorum storage, providing high reliability while ensuring the integrity of the cluster configuration data. Because it resides on the quorum storage, the state information is available to any cluster node and is not subject to partitioning problems. State information for other cluster data, such as application data, and/or the state of its storage mechanisms may be similarly maintained, enabling the use of mirror sets or the like for this other data such that the other data is consistent, reliable and available.
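The split between quorum storage and a mirror set can be sketched in Python as follows. `QuorumStore`, `MirrorMember`, `is_current`, `write_config`, `read_config` and `resync` are hypothetical names, and the "current"/"stale" state values are an illustrative assumption. The point shown is only that the mirror members' state lives on quorum storage next to the core boot data, so a node will not serve configuration data from a member that missed updates while it was offline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuorumStore:
    """Stand-in for quorum storage (quorum of nodes, quorum disk, or quorum of replicas)."""
    core_boot: dict = field(default_factory=dict)     # e.g. membership, network topology
    mirror_state: dict = field(default_factory=dict)  # member name -> "current" or "stale"


@dataclass
class MirrorMember:
    """One element of a hypothetical mirror set holding cluster configuration data."""
    name: str
    online: bool = True
    config: dict = field(default_factory=dict)


def is_current(quorum: QuorumStore, member: MirrorMember) -> bool:
    # Members with no recorded state are treated as current (a fresh mirror set).
    return quorum.mirror_state.get(member.name, "current") == "current"


def write_config(quorum: QuorumStore, mirror: list[MirrorMember], key: str, value: str) -> None:
    """Apply an update to every current, online member; mark the rest stale on quorum storage."""
    written = 0
    for m in mirror:
        if m.online and is_current(quorum, m):
            m.config[key] = value
            written += 1
        else:
            quorum.mirror_state[m.name] = "stale"  # missed this update; needs resync
    if written == 0:
        raise RuntimeError("no current mirror member could accept the update")


def read_config(quorum: QuorumStore, mirror: list[MirrorMember], key: str) -> Optional[str]:
    """Read only from a member that quorum storage records as current."""
    for m in mirror:
        if m.online and is_current(quorum, m):
            return m.config.get(key)
    raise RuntimeError("no current mirror member available")


def resync(quorum: QuorumStore, mirror: list[MirrorMember], member: MirrorMember) -> None:
    """Copy configuration from a current member, then record `member` as current."""
    source = next(
        (m for m in mirror if m is not member and m.online and is_current(quorum, m)),
        None,
    )
    if source is None:
        raise RuntimeError("no current mirror member to copy from")
    member.config = dict(source.config)
    quorum.mirror_state[member.name] = "current"
```

In this sketch, if one mirror member goes offline, misses an update, and later returns while the other member is down, `read_config` refuses to serve its stale contents rather than silently returning outdated failover policies; the member becomes readable again only after `resync`.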
Significant flexibility in how a cluster may be configured is thus achieved, along with improved performance and better scalability. For example, when mirror sets are used for the cluster configuration data, they provide simpler, higher-performance, lower-cost storage with high reliability. Mirror sets thus may be used for the cluster configuration data, while a quorum solution is used to provide the storage for the core boot data and for the status information of the mirror set.