1. Field of the Invention
This invention relates to the field of distributed computing systems including clustered systems, and more particularly to the dynamic modification of distributed system configurations while retaining system functionality.
2. Description of the Related Art
As computer networks are increasingly used to link computer systems together, distributed operating systems have been developed to control interactions between computer systems across a computer network. Some distributed operating systems allow client computer systems to access resources on server computer systems. For example, a client computer system may be able to access information contained in a database on a server computer system. When the server fails, it is desirable for the distributed operating system to automatically recover from this failure. Distributed computer systems with distributed operating systems possessing an ability to recover from such server failures are referred to as “highly available” systems. High availability is provided by a number of commercially available products including Sun(trademark) Clusters versions 2.X and 3.X from Sun(trademark) Microsystems, Palo Alto, Calif.
Distributed computing systems, such as clusters, may include two or more nodes, which may be employed to perform a computing task. Generally speaking, a node is a group of circuitry designed to perform one or more computing tasks. A node may include one or more processors, a memory and interface circuitry. A cluster may be defined as a group of two or more nodes that have the capability of exchanging data between nodes. A particular computing task may be performed upon one node, while other nodes perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among the nodes to decrease the time required to perform the computing task as a whole. A processor is a device configured to perform an operation upon one or more operands to produce a result. The operations may be performed in response to instructions executed by the processor.
Clustering software is often implemented atop an operating system, for instance Solaris(trademark), again from Sun(trademark) Microsystems. Such clustering software enables two or more nodes to operate together within a cluster. In more recent versions of clustering software, if one node is reset, shuts down, or loses connectivity with the other nodes, applications running on that node automatically transfer operation to other nodes in the cluster.
Some operating system/clustering software implementations further enable one or more nodes within the cluster to be partitioned into domains. A domain may be defined as a machine running a single instance or copy of an operating system. Domain partitioning is enabled by Sun(trademark) Cluster implemented on the Solaris(trademark) operating system. While this partitioning into domains provides features and benefits beyond the scope of the present invention, the terms “node” and “domain” may be used synonymously herein.
Nodes within a cluster may have one or more storage devices coupled to the nodes. Generally speaking, a storage device is a persistent device capable of storing large amounts of data. For example, a storage device may be a magnetic storage device such as a disk device, or an optical storage device such as a compact disc device. Although a disk drive is only one example of a storage device, the term “disk” may be used interchangeably with “storage device” throughout this specification. Nodes physically connected to a storage device may access the storage device directly. A storage device may be physically connected to one or more nodes of a cluster, but the storage device need not necessarily be physically connected to all the nodes of a cluster. The nodes that are not physically connected to a storage device may not access that storage device directly. In some clusters, a node not physically connected to a storage device may indirectly access the storage device via a data communication link connecting the nodes. Accordingly, a node may have access to one or more local, global, and/or shared storage devices.
From the foregoing it will be appreciated that the storage options capable of implementation in the various clustering methodologies currently available are highly variable. There are, however, a few guidelines that can generally be said to be applicable to most storage options implemented in clustering solutions. A first general guideline is that the storage option implemented within the cluster should enable a minimum of two paths per disk to ensure data redundancy. A second guideline is that the clustering methodology should enable access, by each node in the cluster, to global storage devices implemented throughout the cluster.
Disk access may be had either through direct access from the node to its respective disk, or through a global storage system. Global storage may be defined to be a disk or device which is connected to some or all of the nodes of a cluster, but which is accessible by all the nodes or domains in the cluster. Examples of file systems include the Unix(trademark) file system or UFS, the Veritas(trademark) file system or VFS, and the network file system or NFS. One or more of these file systems may be used to implement a global storage methodology.
One of the aims of a highly available (HA) system is to minimize the impact of casualties to any of the individual components of the system. One such casualty would be the failure of a node or disk on the server side. In high availability systems it is advantageous that, where a first server fails, a second server seamlessly continues operation for all the clients in the cluster. Accordingly, failure of the first server is invisible to the client. Indeed, the client is only aware of a disk failure where all physical access to the disk is lost. Accordingly, one of the goals of high availability systems may be said to be the retention of overall system functionality in the face of changes to the distributed system configuration.
The various nodes within a cluster are typically connected by means of IP addresses, and one cluster can host substantially any number of IP addresses. Sun(trademark) Cluster 2.2 enables failover IP addressing. In this scheme, each IP address is hosted on one single node, and each node has one adapter. In the event that one adapter or node fails, or is reset to the same IP address, the system reestablishes itself on a different server. In this implementation a logical host may be said to comprise an IP address and a disk system, which are inseparable.
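By way of illustration only, the failover scheme described above might be sketched as follows. The names LogicalHost, FailoverCluster and disk_group, the node names, and the address shown are hypothetical and are not drawn from any Sun(trademark) Cluster interface. The sketch merely shows a logical host, treated as an inseparable pairing of an IP address and a disk system, being rehosted on a surviving node when its current node fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalHost:
    """Hypothetical logical host: an IP address and disk system that move together."""
    ip_address: str
    disk_group: str


class FailoverCluster:
    """Hypothetical cluster model that tracks which node hosts each logical host."""

    def __init__(self, node_names):
        self.healthy = {name: True for name in node_names}
        self.placement = {}  # LogicalHost -> name of the node currently hosting it

    def host(self, logical_host, node_name):
        self.placement[logical_host] = node_name

    def fail(self, node_name):
        """Mark a node as failed and rehost its logical hosts on a healthy node."""
        self.healthy[node_name] = False
        for logical_host, owner in list(self.placement.items()):
            if owner != node_name:
                continue
            survivor = next((n for n, ok in self.healthy.items() if ok), None)
            if survivor is None:
                raise RuntimeError("no healthy node left to host " + logical_host.ip_address)
            # The IP address and the disk system are moved as a single unit.
            self.placement[logical_host] = survivor


cluster = FailoverCluster(["node-a", "node-b"])
db_host = LogicalHost(ip_address="192.0.2.10", disk_group="dg1")
cluster.host(db_host, "node-a")
cluster.fail("node-a")
print(cluster.placement[db_host])
```

In this sketch the client continues to address the same IP address after the failure; only the hosting node changes.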
Previous clustering products generally require that, in order to change the cluster configuration, the system be taken down, the new cluster configuration made current, and the system then restarted. A truly dynamic cluster configuration management system would enable changes in cluster configuration while retaining system functionality. Indeed, it would be particularly advantageous for a clustering system to enable cluster reconfiguration during system operation, with the reconfiguration being totally transparent to system users.
In addition to failover IP addressing, Sun(trademark) Cluster 3.0 implements scalable IP addressing. Scalable IP addressing enables servers running on each domain to bind to the same IP address and port. This enables multiple instances of the server process to be started and bound or listening to the same IP address and port. The impact of scalable IP addresses is that when a request is received, there are multiple locales to which the request can be sent.
From the foregoing it becomes evident that managing the many interconnections, features, designs, and topologies of clustering software is a non-trivial exercise. Moreover, managing the several views of a cluster, or distributed, system in such a manner as to minimize the requirement for active user input is an even more difficult challenge. What is necessary is a methodology that enables the dynamic modification of cluster configurations while retaining the system's ability to function. What is even more desirable is a methodology that enables the modification of the cluster configuration from any node in the cluster dynamically, whereby the system continues with the computing tasks being performed on the system despite the change in system configuration.
In order to implement such a dynamic cluster modification, it would be needful to store cluster configuration data in such a manner that it is accessible from all nodes in the cluster, for instance as a file. This cluster configuration information includes, but is specifically not limited to: the number of domains or nodes within the cluster or distributed system; individual domain details; information on the several adapters or network interface cards of the system including the number of adapters per node; cluster topology information; black box switch information; cabling information; and information on quorum devices.
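As a minimal sketch of how such configuration information might be represented, and not as a description of any actual file format, the categories listed above could be grouped into a single record that serializes to a file. All class names, field names and sample values below are assumptions introduced for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Adapter:
    """Hypothetical network adapter (network interface card) entry."""
    name: str
    ip_address: str


@dataclass
class Node:
    """Hypothetical node (domain) entry with its adapters."""
    name: str
    adapters: list = field(default_factory=list)


@dataclass
class ClusterConfig:
    """Hypothetical cluster-wide configuration record."""
    nodes: list = field(default_factory=list)
    switches: list = field(default_factory=list)        # black box switch names
    cables: list = field(default_factory=list)          # [endpoint, endpoint] pairs
    quorum_devices: list = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested dataclasses held in the lists.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


config = ClusterConfig(
    nodes=[
        Node("node-a", [Adapter("hme0", "192.0.2.1")]),
        Node("node-b", [Adapter("hme0", "192.0.2.2")]),
    ],
    switches=["switch-1"],
    cables=[["node-a:hme0", "switch-1"], ["node-b:hme0", "switch-1"]],
    quorum_devices=["d4"],
)
print(config.to_json())
```

The number of nodes and the number of adapters per node are derived from the lists rather than stored separately, which avoids the counts drifting out of step with the entries.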
When a reconfiguration command is given, a truly utile system would then change the configuration file in parallel on all nodes in the cluster. Accordingly, all nodes in the cluster could then receive notification of the changed cluster configuration in parallel, and the cluster could be configured dynamically as specified by the command.
The present invention enables the dynamic modification of cluster configurations while retaining overall functionality of the system. In other words, a change to the system configuration may be made without disruption of the computing task or tasks being performed by the nodes of the cluster. To enable this dynamic modification, cluster configuration data is stored as a table in a cluster configuration repository that is accessible from all nodes in the cluster. Accordingly, the present invention enables the modification of the cluster configuration from any node in the cluster dynamically. When the reconfiguration command is given, the configuration table is changed and all the nodes in the cluster are notified of the changed configuration in parallel. Following notification of the nodes of the changed cluster configuration, the changes to the cluster are implemented dynamically as specified by the command.
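The sequence summarized above (change the configuration table, then notify all nodes in parallel) can be sketched as follows. This is an illustrative, single-process model under stated assumptions and not the implementation of the invention: the names ConfigRepository, NodeAgent and on_config_changed are hypothetical, and a thread pool stands in for whatever inter-node communication a real cluster would use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NodeAgent:
    """Hypothetical per-node listener that applies a notified configuration."""

    def __init__(self, name):
        self.name = name
        self.seen_version = 0
        self.view = {}

    def on_config_changed(self, version, table):
        self.seen_version = version
        self.view = dict(table)


class ConfigRepository:
    """Hypothetical repository holding the configuration table for all nodes."""

    def __init__(self, nodes):
        self._lock = threading.Lock()
        self._table = {}
        self._version = 0
        self._nodes = list(nodes)

    def update(self, key, value):
        """Change one table entry, then notify every node in parallel."""
        # The lock is held for the whole update so that changes are serialized
        # and every node observes them in the same order.
        with self._lock:
            self._table[key] = value
            self._version += 1
            snapshot = dict(self._table)
            workers = max(1, len(self._nodes))
            with ThreadPoolExecutor(max_workers=workers) as pool:
                futures = [
                    pool.submit(node.on_config_changed, self._version, snapshot)
                    for node in self._nodes
                ]
                for future in futures:
                    future.result()  # surface any error raised by a node
            return self._version


agents = [NodeAgent("node-a"), NodeAgent("node-b"), NodeAgent("node-c")]
repository = ConfigRepository(agents)
new_version = repository.update("quorum_device", "d4")
print(new_version, [agent.seen_version for agent in agents])
```

Because the update may be issued through the repository by any caller, the sketch mirrors the stated goal of modifying the configuration from any node; a node callback that itself called update would deadlock in this simplified model.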
These and other advantages of the present invention will become apparent upon reading the following detailed descriptions and studying the various figures of the Drawing.