This invention relates to data processing systems of the type which include a cluster of at least two computers that execute application programs in a "failover" mode of operation; and more particularly, this invention relates to methods for estimating the "availability" of the application programs in the above type of data processing systems.
To explain the failover mode of operation as that term is used herein, consider the case where the cluster includes only two computers. Initially, the cluster operates in a first state wherein both of the computers are available to run the application programs. But in that first state, only one of the computers (computer #1) is servicing requests to use the application programs. The cluster remains in the first state until a stoppage occurs in computer #1.
Then, a transition is made to a second state wherein the other computer (computer #2) assumes responsibility for handling all requests to use the application programs but does not yet run those programs. This second state lasts only temporarily, and it is herein called the failover state. Then a transition is made to a third state.
In the third state, computer #2 services requests to use the application programs; and at the same time, repair work is performed on computer #1 to try to fix the cause of the stoppage. If computer #1 is made operable before computer #2 stops, then a transition is made back to the first state. Otherwise, if computer #1 is not made operable before computer #2 stops, then a transition is made to a fourth state wherein no requests to use the application programs are serviced.
The cluster remains in the fourth state until one of the computers is made operable. When that occurs, a transition is made back to the third state. There, the one computer which is operable services all requests to use the application programs; and at the same time, repair work is performed on the stopped computer.
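For illustration only, the four states and the transitions described above can be sketched as a small lookup table. This is a minimal sketch and not part of the described method; the names State, Event, TRANSITIONS, next_state and run are hypothetical labels introduced here, and the three event names are assumed shorthand for "a stoppage occurs in the servicing computer", "the failover completes" and "a stopped computer is made operable".

```python
from enum import Enum


class State(Enum):
    BOTH_OPERABLE = 1   # first state: both computers operable, one servicing requests
    FAILOVER = 2        # second state: responsibility is being switched to the other computer
    ONE_OPERABLE = 3    # third state: one computer servicing requests, the other under repair
    NONE_OPERABLE = 4   # fourth state: no requests are serviced


class Event(Enum):
    STOPPAGE = "stoppage"            # the computer that is servicing requests stops
    FAILOVER_DONE = "failover_done"  # the temporary failover state ends
    REPAIR_DONE = "repair_done"      # a stopped computer is made operable


# Transition table taken directly from the prose description of the two-computer cluster.
TRANSITIONS = {
    (State.BOTH_OPERABLE, Event.STOPPAGE): State.FAILOVER,
    (State.FAILOVER, Event.FAILOVER_DONE): State.ONE_OPERABLE,
    (State.ONE_OPERABLE, Event.REPAIR_DONE): State.BOTH_OPERABLE,
    (State.ONE_OPERABLE, Event.STOPPAGE): State.NONE_OPERABLE,
    (State.NONE_OPERABLE, Event.REPAIR_DONE): State.ONE_OPERABLE,
}


def next_state(state, event):
    """Return the state reached from `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event.name}") from None


def run(events, start=State.BOTH_OPERABLE):
    """Apply a sequence of events and return the list of states visited."""
    path = [start]
    for event in events:
        path.append(next_state(path[-1], event))
    return path
```

For example, run([Event.STOPPAGE, Event.FAILOVER_DONE, Event.STOPPAGE, Event.REPAIR_DONE]) walks the cluster from the first state through the failover state and the third state into the fourth state, and then back to the third state.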
As used herein, the availability of an application program means the probability that, at any particular time instant, at least one of the computers will actually be servicing the requests to use the application programs. In the above described cluster of two computers, the application programs are not available for use in either the second state or the fourth state.
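As a sketch of what this definition implies for the two-computer cluster, let P_i denote the probability that the cluster is in the i-th state at a particular time instant, and let A denote the availability. Both symbols are notation assumed here for illustration and do not appear in the original text. Because the cluster is always in exactly one of the four states, and requests are serviced only in the first and third states, the availability can be written as:

```latex
\[
A \;=\; P_1 + P_3 \;=\; 1 - \left(P_2 + P_4\right),
\qquad\text{where}\qquad
P_1 + P_2 + P_3 + P_4 = 1 .
\]
```

The term P_2 is the contribution of the failover state to unavailability, and the term P_4 is the contribution of the state in which neither computer is operable.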
In the prior art, methods which are somewhat related to the availability of an application program in a cluster of computers are described in a book which is entitled "Reliable Computer Systems" (second edition) by Daniel P. Siewiorek and Robert S. Swarz, copyrighted 1992 by Digital Equipment Corporation and published by Digital Press (hereinafter Siewiorek). There, in FIG. 5.19c on page 314, a three state Markov model is shown to describe how a cluster of two computers operates. Also, an equation 32 on page 316 expresses the operability of a computer in the cluster whose state diagram corresponds to FIG. 5.19c.
However, one problem with Siewiorek is that the above state diagram and equation do not account for any time which it takes to switch the responsibility for handling requests to use the application programs from one computer to another. In particular in FIG. 5.19c, there is no failover state. Thus Siewiorek only addresses when a computer in a cluster is operable, and does not address when an application program is available for use.
Another problem with Siewiorek is that it only accounts for a single source of stoppage that occurs at a single rate "λ" and has a single repair rate "μ". This, however, is unrealistic because in an actual cluster of several computers, each application program can become unavailable due to a hardware stoppage, a software stoppage, or a system administrator stoppage, each of which occurs at a different frequency and has a different repair time.
Still another problem with Siewiorek is that the expression for the operability of a cluster of two computers, as given by equation 32, is quite complex. In addition, other expressions for the operability of a cluster of more than two computers, as provided by Siewiorek, are even more complex. This is evident from page 839 of Siewiorek, wherein the third formula from the top of the page applies to a cluster of N computers and is extremely complex.
Accordingly, a primary object of the present invention is to provide a method for estimating the availability of application programs in a cluster of computers by which the above problems are overcome.