1. Technical Field
The present invention relates in general to a method and system for data processing and in particular to a method and system for performing an operation on multiple computer systems within a cluster. Still more particularly, the present invention relates to a method and system for automatically performing an operation on multiple computer systems within a cluster through the use of command construct pairs.
2. Description of the Related Art
Data processing systems are frequently utilized to process data and to monitor and manage events in contexts in which reliability and guaranteed access to system resources are of paramount importance. For example, data processing systems are utilized to manage critical databases, automate assembly and production lines, and implement complex control systems. Because of the demands of such mission-critical computing environments, fault-tolerant data processing systems were developed. Fault-tolerant data processing systems rely on specialized hardware to detect a hardware fault and rapidly incorporate a redundant hardware component into the data processing system in place of the failed hardware component. Although fault-tolerant data processing systems can transparently provide reliable and nearly instantaneous recovery from hardware failures, a high premium is often paid in both hardware cost and performance because the redundant components do no processing. Furthermore, conventional fault-tolerant data processing systems only provide protection from system hardware failures and do not address software failures, a far more common source of system downtime.
In response to the need for a data processing system that provides both high availability of system resources and protection from software failures, cluster architecture was developed. A cluster can be defined as multiple loosely coupled server machines that cooperate to provide clients with reliable and highly available access to a set of system services or resources. Cluster resources can include both hardware and software, such as disks, volume groups, file systems, network addresses, and applications. High availability of the cluster resources is ensured by defining takeover relationships that specify which of the server machines (or nodes) in a cluster assumes control over a group of resources after the server machine that originally owned the group of resources relinquishes control due to reconfiguration of the cluster or failure.
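As an illustration only, a takeover relationship of the kind described above can be sketched as an ordered list of nodes for each resource group, where the first available node in the list owns the group. The group names, node names, and the priority-list representation itself are assumptions made for this sketch and are not drawn from any particular cluster product.

```python
from typing import Dict, List, Optional, Set

# Hypothetical takeover relationships: for each resource group, the nodes
# that may own it, listed from original owner to last-resort backup.
TAKEOVER_ORDER: Dict[str, List[str]] = {
    "db_group": ["nodeA", "nodeB", "nodeC"],
    "web_group": ["nodeB", "nodeA"],
}


def current_owner(group: str, unavailable: Set[str]) -> Optional[str]:
    """Return the first node in the group's takeover order that is available.

    `unavailable` holds nodes that have relinquished control, whether through
    failure or through reconfiguration of the cluster.
    """
    for node in TAKEOVER_ORDER[group]:
        if node not in unavailable:
            return node
    # No node in the takeover order is available, so the group has no owner.
    return None
```

Under these assumptions, `current_owner("db_group", {"nodeA"})` yields `"nodeB"`: once the original owner relinquishes control, the next node in the takeover order assumes the resource group.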
During the course of managing a cluster, a cluster administrator frequently needs to perform administrative and other operations on multiple nodes within the cluster. However, conventional shell script languages utilized in cluster management do not provide a convenient mechanism that permits a cluster administrator to specify operations to be performed on multiple nodes within a cluster. Therefore, in order to perform desired operations on multiple nodes within a cluster, cluster administrators are currently required either to perform the operations on each of the cluster nodes individually or to write a looping shell script that performs the operations on a single node during each iteration of a loop. Both of these approaches have significant drawbacks. In particular, performing the desired operations on each node within the cluster individually is inefficient and impractical for clusters with a large number of member nodes. Furthermore, utilizing a looping shell script to perform the operations does not permit concurrent execution of the operations on multiple nodes and does not readily enable the execution of the operations to react to exception conditions or individual node failures.
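The looping approach described above can be sketched as follows. This is a minimal illustration, written in Python rather than in a shell script language, and it assumes a hypothetical stand-in function `fake_operation` in place of a real remote command; the node names are likewise invented for the example.

```python
from typing import Callable, Dict, List


def run_sequentially(nodes: List[str],
                     operation: Callable[[str], int]) -> Dict[str, int]:
    """Run `operation` on one node per loop iteration and collect exit codes."""
    exit_codes: Dict[str, int] = {}
    for node in nodes:
        # Each iteration blocks until the current node finishes, so no two
        # nodes are ever processed at the same time.
        exit_codes[node] = operation(node)
    return exit_codes


def fake_operation(node: str) -> int:
    """Hypothetical stand-in for a remote command; pretends node2 has failed."""
    return 1 if node == "node2" else 0


results = run_sequentially(["node1", "node2", "node3"], fake_operation)
```

In this sketch the loop merely records the nonzero exit code for `node2` and moves on. Any reaction to that failure, such as retrying the operation or skipping dependent steps, would have to be written by hand inside the loop, and the total running time is the sum of the per-node times.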
As should thus be apparent, a need exists for a shell script extension which permits specified operations to be automatically performed on multiple nodes within a cluster. In particular, a need exists for a shell script extension that permits operations to be performed concurrently on multiple nodes and enables execution of the operations to react to exception conditions and node failures.