As computers and computer systems have evolved over the years, the processes they implement have grown in complexity. One approach to implementing computer processes that solve more complex problems is to assign a number of computers to handle different parts of a process. Each part or task may be handled by a different computer, computer object, application, or server, hereafter referred to collectively as servers. These servers make up a distributed network. Within the network, different servers may handle functions such as management, database maintenance, accessibility, server boot-up, shut-down, and so forth.
Servers within a distributed network perform transactions with other servers and use resources within the system. Because servers depend on other servers and resources, the operability and reliability of each server become more important. If a server fails while performing a task, the failure may affect other servers and resources that were tied up in transactions with that server at the time. Whether a server has failed completely or its condition has merely degraded is important information to the network. Thus, it is important to know the status of a server in order to maintain the health of both the server and the network in which it operates. A maintenance system should be able to require a server to provide health information and should be able to maintain or correct servers that are not operating properly.
What is needed is a system for monitoring and inquiring into the health of a server and for taking corrective action if deemed appropriate.