1. Field of the Invention
The present invention relates to computer science. More particularly, the present invention relates to fault tolerant client-server environments.
2. Background
Many organizations have a substantial number of computers in operation, often located far apart. For example, a company with many factories may have a computer at each location to keep track of inventories, monitor productivity and do the local payroll. Connecting these computers via a network enables resource sharing by making all programs, equipment and especially data available to anyone on the network without regard to the physical location of the resource and the user.
Reducing the cost of computing is important. Small computers have a much better price/performance ratio than large ones. Mainframes are much faster than personal computers, but they cost significantly more. This imbalance has caused many systems designers to build systems consisting of personal computers, one per user, with data kept on one or more shared file server machines. In this case, the users are the clients, and this type of arrangement is referred to as a client-server architecture.
Turning now to FIG. 1, a block diagram that illustrates a typical client-server architecture is presented. Client 10 is connected to a server 12 via bus 14. Communication typically takes the form of a request message 16 from the client 10 to the server 12 asking for some work to be done. The server 12 then does the work and sends back a reply message 18. Typically, there are relatively many clients using a relatively small number of servers.
Reliability and availability are important features in client-server computing environments. Computer networks increase reliability by having alternate sources of supply. For example, all files may be replicated on multiple machines, so if one of them is unavailable (due to hardware failure or communication failure), the other copies may be used. In addition, the presence of multiple processors means that if the performance of a particular processor degrades sufficiently, the other processors may be able to take over at least a portion of its work.
Reliability and availability are especially important for applications that perform critical transactions. Such applications include military, banking, air traffic control, nuclear reactor safety and many other applications. In these cases, the ability to continue operating in the face of hardware or communication problems is of utmost importance. Servers in these systems typically must be fault tolerant. For instance, if the primary server is functioning poorly or not at all due to a heavy workload or network problems, a backup or secondary server may be invoked to assume the server workload, thus allowing critical transactions to continue without undesirable interruption.
Typically, the client in a fault tolerant system detects an improperly functioning server by monitoring communications between the client and the server. One typical fault tolerant algorithm requires that the client record each request it sends to the server. The client stores a specific number of recent requests in a buffer and matches each reply received to its corresponding request. This method requires a mechanism to uniquely identify each request and each reply. Typically, a separate task is activated periodically to check the reply delays and the reply-request ratio, which is the number of replies received from the server divided by the number of requests sent to the server. If replies are received with large delays, or if the reply-request ratio is too small, deteriorating server performance is indicated.
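The request-logging approach described above can be illustrated with a minimal sketch. The class name `RequestLog`, its method names, the buffer capacity and the caller-supplied timestamps are all hypothetical and chosen only for illustration; actual prior-art systems may differ in how they identify requests and measure delays.

```python
import collections


class RequestLog:
    """Hypothetical bounded log of recent requests, keyed by a unique request id."""

    def __init__(self, capacity=1000):
        self.capacity = capacity                  # assumed buffer size
        self.pending = collections.OrderedDict()  # request id -> send time
        self.delays = []                          # observed reply delays
        self.requests_sent = 0
        self.replies_received = 0

    def record_request(self, request_id, now):
        # Keep only the most recent `capacity` unanswered requests.
        if len(self.pending) >= self.capacity:
            self.pending.popitem(last=False)      # evict the oldest entry
        self.pending[request_id] = now
        self.requests_sent += 1

    def record_reply(self, request_id, now):
        # Match the reply to its request by id; ignore unknown or evicted ids.
        sent_at = self.pending.pop(request_id, None)
        if sent_at is None:
            return
        self.delays.append(now - sent_at)
        self.replies_received += 1

    def check(self, max_delay, min_ratio):
        """Periodic check: True if delays are large or the ratio is too small."""
        if self.requests_sent == 0:
            return False                          # assumption: no traffic, no fault
        ratio = self.replies_received / self.requests_sent
        slow = any(delay > max_delay for delay in self.delays)
        return slow or ratio < min_ratio
```

Even in this reduced form, the client must hold one buffer entry per outstanding request and perform a lookup for every reply, which is the source of the complexity and memory cost discussed next.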
This method of logging messages and associating each reply with a specific request increases the complexity and memory requirements of fault tolerant systems. This problem is exacerbated in modern client-server systems in which a single client is connected to many servers, requiring separate fault tolerant checks for each client-server connection.
Accordingly, a need exists in the prior art for a method and apparatus for a robust fault tolerant client-server system that requires relatively little processor and memory overhead.
A method for determining the performance of a first processor in a computer network in which the first processor is connected to a second processor includes incrementing a request count when the second processor requests data from the first processor, incrementing a reply count when the second processor receives data from the first processor, dividing the reply count by the request count to create a ratio, and indicating that the performance of the first processor is less than expected when the ratio is less than a threshold. An apparatus for determining the performance of a first processor includes at least one memory having program instructions and at least one processor coupled to the first processor. The at least one processor is configured to increment a request count when the at least one processor requests data from the first processor, increment a reply count when the at least one processor receives data from the first processor, and determine the performance of the first processor based upon the reply count and the request count.
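The counting method summarized above can be sketched as follows. This is an illustrative sketch only, not the claimed implementation: the class name `PerformanceMonitor`, its method names and the handling of a zero request count are assumptions that do not come from the text.

```python
class PerformanceMonitor:
    """Hypothetical monitor kept by the second processor (the client)."""

    def __init__(self, threshold):
        self.threshold = threshold  # minimum acceptable reply-request ratio
        self.request_count = 0
        self.reply_count = 0

    def on_request(self):
        # Called when the second processor requests data from the first.
        self.request_count += 1

    def on_reply(self):
        # Called when the second processor receives data from the first.
        self.reply_count += 1

    def is_degraded(self):
        # Assumption: with no requests sent there is nothing to judge.
        if self.request_count == 0:
            return False
        ratio = self.reply_count / self.request_count
        return ratio < self.threshold
```

For example, a monitor created with a threshold of 0.5 that has seen four requests and one reply yields a ratio of 0.25 and indicates performance that is less than expected. Only two integers are kept per connection, and no reply is matched to a specific request.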