The present invention is directed to managing a large distributed computer enterprise environment. More particularly, it relates to retrying a failed operation in a distributed computing environment.
It is well known to couple computer systems together by means of a network, such as a local area network (LAN) or wide area network (WAN), to obtain access to computing resources located on a remote computer system. It is generally not economically feasible to provide a printer and an expensive direct access storage device (DASD) at each user workstation. By connecting the resources of the entire network together and making them selectively available to users, a much greater and more efficient collection of resources can be mustered than would be possible if all resources were provided at each desktop.
However, managing a computer network comprising hundreds or even thousands of nodes to provide such computing resources can present serious difficulties for system administrators. Management tasks, such as distribution of system-wide changes, must be carried out quickly and dependably in order to reduce the probability of catastrophic failure. Typically, a system operation is initiated at a central location, e.g., an administrator's workstation, and invoked on one or more remote machines in the network. Preferably, system operations are invoked on a group or subnet of machines in a single operation. Yet distributed computing environments that are known in the art do not scale easily to large size.
There are many reasons why an operation invoked on a remote machine may fail, including network failure and command syntax that is incompatible with the remote machine. Another reason is that the target machine is down, either because the user has turned the machine off or because some program, such as a power management program, has powered it down. When distributing software, it is important to be able to complete the distribution over the entire network as quickly as possible.
The present invention addresses and solves these problems.