1. Field of the Invention
The present invention relates to computer systems and networks of computer systems, and, more particularly, to efficient, parallel execution of computer processes over multiple computer systems.
2. Description of the Related Art
Computer systems have attained widespread use in providing computing power to businesses and institutions. Some important fields in which computer systems have become common include manufacturing monitoring and control; electrical, mechanical, and chemical system design and simulation; and network management and service. Productive processing time is of great importance to the users of computer systems in these and other fields because computer system down time can result in significant costs.
Computer systems are designed to provide the degree of system availability required by the applications for which they are intended. High availability (HA) computer systems, as their name implies, are required to be available, or "on," virtually all of the time. Such computer systems (or networks of such computer systems) should be substantially error free or should be capable of recovering from errors. Also, consumption of resources and down time in such systems due to maintenance tasks should be minimized.
For example, in a 24×7 manufacturing system, time is extremely critical, so performing management and maintenance operations as quickly and efficiently as possible is extremely important. In such a manufacturing system, downtime for such operations can be very limited; in one such system, scheduled downtime may be as little as one hour per quarter. Therefore, the operations performed during these downtime periods must be optimized to run as quickly and efficiently as possible.
In an environment where scheduled downtime is extremely limited, it is imperative that operations be performed as quickly and accurately as possible. Also, in a client/server environment where hundreds of systems may be involved, the ability to execute the same command across all systems with minimal effort and in the shortest possible time is extremely important. When a common operation, such as a management operation, is to be performed across a group of systems in a client/server environment, one of two methods is typically used: the operation is performed either manually on each system, one by one, or sequentially using a script. Manual execution involves logging into each system and performing the command by hand. This is time consuming and can result in errors and/or omissions. Sequential script execution typically involves bundling the work to be done into a script and then executing the scripted commands on each system in turn. If hundreds of systems are involved, this sequential method can also be very time consuming.