In modern computing, computers are frequently arranged into “clusters” of one or more computers, with each computer comprising one or more “nodes”. For example, a computer cluster may comprise one or more individual computers each hosting a single node, or a computer cluster may comprise one or more individual computers each running software that enables a single processor in that computer to function as two or more nodes. Different computer programs may run on different nodes within a cluster, and the nodes typically communicate with one another. For example, two or more nodes may share data, such as by accessing a common storage
Clusters of two or more computers can be used to provide redundancy in data processing applications. Where only a single computer is used, if that single computer were to fail for any reason, then the computer programs running on that computer would not be available until the computer was successfully restarted. However, if an instance of a certain computer program were running on a particular node within a cluster of two or more computers, and that node were to fail (e.g. because of a computer crash) such that the instance of that computer program that was running on that node was no longer available, another instance of that computer program could be made available, or continue to be made available, on another node. As a result, the data processing functions provided by that computer program would continue to be available to the computer cluster. Typically, critical computer programs would run concurrently, with multiple instances running across multiple nodes, so that a failure would result only in a reduction in capacity, rather than a total loss of the service provided by that computer program.
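The effect of redundancy described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each computer program maps to the set of nodes currently running an instance of it; the names `fail_node`, `is_available`, `billing`, `reports`, `node_a` and `node_b` are illustrative only and do not come from any particular cluster product.

```python
def fail_node(placement, failed_node):
    """Return a new placement with every instance on the failed node removed."""
    return {program: nodes - {failed_node} for program, nodes in placement.items()}


def is_available(placement, program):
    """A program remains available while at least one instance is still running."""
    return len(placement.get(program, set())) > 0


# Hypothetical workload: "billing" runs on two nodes, "reports" on only one.
placement = {
    "billing": {"node_a", "node_b"},
    "reports": {"node_a"},
}

after_failure = fail_node(placement, "node_a")

# "billing" loses capacity but stays available; "reports" is lost entirely.
billing_ok = is_available(after_failure, "billing")
reports_ok = is_available(after_failure, "reports")
```

In this sketch the program with instances on multiple nodes suffers only a reduction in capacity when one node fails, whereas the program with a single instance becomes unavailable.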
In practice, there are often dependencies among computer programs that run on nodes within a computer cluster. For example, there are frequently circumstances in which a first computer program requires the services of a second computer program, and will not operate correctly unless the second computer program is already running at the time the first computer program is activated. Such dependencies can be very complex, as a computer program may depend on multiple other computer programs, and some or all of those other computer programs may have their own dependencies, either among each other or with still further computer programs. Fortunately, some solutions for automating the management of these dependencies are available. Such solutions include IBM® Tivoli® System Automation for z/OS®, described in an IBM document having the same title, and IBM Tivoli System Automation for Multiplatforms, also described in an IBM document having the same title. IBM, Tivoli and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. One technique which can be used to facilitate management of computer clusters is described in U.S. Pat. No. 6,789,101.
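As a simple illustration of the kind of dependency described above, the following sketch computes an order in which computer programs could be activated so that every program starts only after the programs it depends on. It is not the method used by the automation products named above; it merely assumes a hypothetical dependency table (the program names are invented for illustration) and uses the `graphlib` module from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def start_order(dependencies):
    """Return a list of programs ordered so that prerequisites come first.

    `dependencies` maps each program to the set of programs that must
    already be running when it is activated. A circular dependency makes
    graphlib raise CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency table, for illustration only.
dependencies = {
    "web_frontend": {"app_server"},
    "app_server": {"database", "message_queue"},
    "database": set(),
    "message_queue": set(),
}

order = start_order(dependencies)
```

Any order returned here places `database` and `message_queue` before `app_server`, and `app_server` before `web_frontend`; the relative order of the two independent programs is not significant.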
In addition to the issue of dependencies, the personnel who maintain a computer cluster often need to make changes to the workload profile of that computer cluster (i.e. which computer programs are active on which nodes within the cluster). Even in the presence of automated dependency management, making such changes can be difficult, time-consuming and error-prone, especially where the changes are complex or extensive.