Administration of large, multi-server computing environments is a field of growing interest as the number and size of such environments grow. The field of multi-server system administration and management focuses on maintaining the physical operation of a multitude of computer systems, often referred to as nodes, connected in a network. Management tasks include adding, modifying, and removing nodes, users, tools, and roles; defining groups of nodes; authorizing users to perform operations on nodes; installing, maintaining, and configuring hardware; installing and upgrading operating system and application software; and applying software patches, among other functions.
Several powerful software applications that assist and centralize the management of large, multi-server computing environments have been developed in the field. Generally, these applications have consisted of a single, large multi-server management application running on a single, centrally located management server operated by one or more system administrators and, in only a few implementations, separate management agent applications running on each of the nodes in the multi-server computing environment.
In such a configuration, the central multi-server management application running on the management server is generally responsible for communicating with the separate management agent applications running on each of the nodes in order to determine the status of any management tasks being performed on those nodes. The central multi-server management application is thus required to constantly query the management agent application on each node. As the number of nodes to be queried increases, so does the demand that these queries place on network bandwidth.
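The polling arrangement described above can be sketched in a few lines to show how the number of status queries scales with the number of nodes. This is a minimal illustration only; the names `Agent`, `CentralManager`, `poll_once`, and `queries_for` are hypothetical and do not correspond to any particular management product, and a method call stands in for a network request.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for a management agent running on one node."""
    node: str
    status: str = "running"

    def report_status(self) -> str:
        # In a real deployment this would be a network round trip.
        return self.status


@dataclass
class CentralManager:
    """Hypothetical central application that polls every agent itself."""
    agents: list
    queries_sent: int = 0

    def poll_once(self) -> dict:
        # One query per node per polling round.
        statuses = {}
        for agent in self.agents:
            self.queries_sent += 1
            statuses[agent.node] = agent.report_status()
        return statuses


def queries_for(node_count: int, rounds: int) -> int:
    """Count the queries issued when polling node_count nodes for rounds rounds."""
    manager = CentralManager([Agent(f"node{i}") for i in range(node_count)])
    for _ in range(rounds):
        manager.poll_once()
    return manager.queries_sent
```

Under these assumptions the query count is the product of the node count and the number of polling rounds, so a tenfold increase in nodes produces a tenfold increase in status traffic at the same polling rate.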
Another result of this arrangement is increasing wait times, because the central multi-server management application must wait for a response from each of the nodes before proceeding with other tasks. In addition, the failure of any management agent, or the sudden failure of a node on which a management agent is performing a task, may leave the central multi-server management application waiting indefinitely for a response from an inactive agent. Furthermore, the central multi-server management application may be interrupted by the routine removal of a node from service for a hardware or operating system software upgrade, and may not be made aware of the occurrence or nature of the upgrade when the node returns to service.
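The stalled wait described above can be illustrated with a small simulation. The sketch assumes each node delivers its response through an in-memory queue, and the function name `wait_for_responses` is hypothetical. The timeout is used here only so that the example terminates; it is not part of the arrangement described above, in which the equivalent wait has no bound.

```python
import queue


def wait_for_responses(response_queues, timeout_s):
    """Wait for each node's response in turn; return (responded, stalled)."""
    responded, stalled = [], []
    for node, q in response_queues.items():
        try:
            # Without a timeout, this call would block forever on a dead agent.
            q.get(timeout=timeout_s)
            responded.append(node)
        except queue.Empty:
            stalled.append(node)
    return responded, stalled


# Three nodes; the agent on node1 has failed and never reports.
queues = {name: queue.Queue() for name in ("node0", "node1", "node2")}
queues["node0"].put("done")
queues["node2"].put("done")

responded, stalled = wait_for_responses(queues, timeout_s=0.05)
```

Because the nodes are serviced in order, the response already available from `node2` is not read until the wait on `node1` ends, which is the source of the growing wait times noted above.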