1. Technical Field
The present invention relates in general to data processing, and in particular, to cluster data processing systems.
2. Description of the Related Art
A cluster system, also referred to as a cluster multiprocessor system (CMP) or simply as a “cluster,” is a set of networked data processing systems (or “nodes”) with hardware and software shared among those data processing systems, typically (but not necessarily) configured to provide highly available and highly scalable application services. Cluster systems are frequently implemented to achieve high availability as an alternative to fault tolerance for mission-critical applications such as data centers, aircraft control, and the like. Fault tolerant data processing systems rely on specialized hardware to detect hardware faults and to switch to a redundant hardware component, regardless of whether the component is a processor, memory board, hard disk drive, adapter, power supply, etc. While providing seamless cutover and uninterrupted performance, fault tolerant systems are expensive due to the requirement of redundant hardware, and fail to address software errors, a more common source of data processing system failure.
High availability can be achieved in a cluster implemented with standard hardware through the use of software that permits resources to be shared system-wide. When a node, component, or application fails, the software quickly establishes an alternative path to the desired resource. The brief interruption required to reestablish availability of the desired resource is acceptable in many situations. The hardware costs are significantly lower than those of fault tolerant systems, and backup facilities may be utilized during normal operation.
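The failover behavior described above can be illustrated with a minimal sketch. The names used here (`find_provider`, the `providers` map, and the `healthy` set) are hypothetical and chosen only for illustration; they are not drawn from any particular cluster product.

```python
from typing import Dict, List, Optional, Set


def find_provider(resource: str,
                  providers: Dict[str, List[str]],
                  healthy: Set[str]) -> Optional[str]:
    """Return the first healthy node that can serve the resource, or None."""
    for node in providers.get(resource, []):
        if node in healthy:
            return node
    return None


# Hypothetical cluster state: two nodes can both reach the same shared disk.
providers = {"shared_disk": ["node_a", "node_b"]}
healthy = {"node_a", "node_b"}

# During normal operation the resource is served by the first listed node.
primary = find_provider("shared_disk", providers, healthy)

# When node_a fails, the same lookup yields an alternative path.
healthy.discard("node_a")
fallback = find_provider("shared_disk", providers, healthy)
```

In this sketch the interruption seen by a client corresponds to the time needed to detect the failure and repeat the lookup, which is omitted here.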
Cluster system management is a special class of general system management, with additional resource dependency and management policy constraints. In particular, the maintenance of cluster configuration information required for cluster system management poses a special problem. The cluster configuration information required for system management is typically stored in a database, which is either centralized or replicated to more than one data processing system for high availability. If centralized, the data processing system that manages the centralized cluster configuration database becomes a potential bottleneck and a single point of failure.
To avoid the problems of a centralized cluster configuration database, the cluster configuration database may be replicated and maintained on a number of data processing systems within the cluster. In a small cluster, the system configuration and status information may be readily replicated to all data processing systems in the cluster for use by each data processing system in performing system management functions such as failure recovery and load balancing. Full replication provides a highly available cluster configuration database and performs adequately as long as the cluster size remains small. In a very large cluster, however, the overhead associated with full replication of the cluster configuration database can be prohibitively high.
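The growth of replication overhead with cluster size can be estimated with a simple model. The sketch below assumes, purely for illustration, that every configuration update made on one node is sent individually to every other node; the function name and parameters are hypothetical, and real replication protocols may batch or multicast updates.

```python
def full_replication_messages(nodes: int, updates_per_node: int) -> int:
    """Messages needed when each update is sent to every other node."""
    # Each of the `nodes` systems originates `updates_per_node` updates,
    # and each update must reach the remaining (nodes - 1) replicas.
    return nodes * (nodes - 1) * updates_per_node


small = full_replication_messages(nodes=4, updates_per_node=10)
large = full_replication_messages(nodes=1000, updates_per_node=10)
```

Under this assumed model the message count grows roughly with the square of the number of nodes, which is consistent with full replication being adequate for small clusters and costly for very large ones.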
Another central issue in cluster system management is the handling of cluster partitions. Cluster partitions occur if nodes that can nominally be configured to operate in a cluster are partitioned into two or more sets of nodes that are not currently configured to share system resources. When a cluster partition occurs, for example, at system startup or in response to the return of one or more downed nodes, errors can result if multiple copies of the same application, especially a database application such as the cluster configuration database, are run from these (temporarily) independent nodes of the cluster. A conventional way of managing cluster partitions is to require that a cluster remain offline until it reaches quorum. While the definition of quorum varies between implementations, in many implementations a majority quorum is employed, and a cluster is said to have reached quorum when the number of active nodes is at least N/2+1 (using integer division), where N is the total number of nodes in the cluster.
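The majority quorum rule can be sketched as follows. The function names are hypothetical, and N/2 is taken as integer division, which is an assumption about how the expression is conventionally evaluated.

```python
def majority_quorum(total_nodes: int) -> int:
    """Smallest number of active nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(active_nodes: int, total_nodes: int) -> bool:
    """True when enough nodes are active for the cluster to come online."""
    return active_nodes >= majority_quorum(total_nodes)


# A 5-node cluster split into partitions of 3 and 2 nodes:
# only the 3-node partition may operate.
larger_side = has_quorum(3, 5)
smaller_side = has_quorum(2, 5)

# A 4-node cluster split evenly: neither side reaches quorum.
even_split = has_quorum(2, 4)
```

Because two disjoint partitions cannot both hold a strict majority, at most one partition can satisfy this test at any time.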
As nodes from a cluster partition become members of a cluster, each node must be assigned an identifier so that the node's software and hardware resources can be made accessible to the cluster. In a conventional cluster implementation, the identifiers are assigned by a central naming authority so that the identifiers can be guaranteed to be universally unique in the cluster. However, the use of a central naming authority can undesirably lead to a single point of failure, as well as the need for a node to modify its preexisting identifier upon joining the cluster.
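The conventional approach can be illustrated with a minimal sketch of a central naming authority. The class and method names are hypothetical, and the use of a simple counter is an assumption made for illustration rather than a description of any specific implementation.

```python
import itertools
from typing import Dict


class CentralNamingAuthority:
    """Hypothetical single authority that issues cluster-unique identifiers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._members: Dict[int, str] = {}

    def assign(self, preexisting_id: str) -> int:
        # The node's own identifier is not trusted to be unique,
        # so the authority always issues a fresh cluster identifier.
        cluster_id = next(self._counter)
        self._members[cluster_id] = preexisting_id
        return cluster_id


authority = CentralNamingAuthority()

# Two nodes that happen to share the same preexisting identifier
# receive distinct cluster identifiers.
first = authority.assign("node1")
second = authority.assign("node1")
```

The sketch shows both drawbacks noted above: every join must pass through the single `authority` object, and each node ends up with an identifier different from the one it had before joining.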