This invention relates to data processing systems of the type which include a server farm that executes application programs for multiple clients (users); more particularly, it relates to methods for estimating the "server farm size" for a given redundancy factor and estimating the "availability" of the server farm in such data processing systems.
The co-pending application described above involves an estimator program that performs method steps for estimating the availability of an application program that runs on any "server" in a cluster of at least two servers. By "availability of an application program" is meant the probability that, at any particular time instant, at least one of the servers in a cluster (farm) will actually be servicing requests from external workstations able to use the application program.
In one embodiment, the estimator program begins by receiving input parameters which include (i) multiple downtime periods for each computer in the cluster (farm) that occur at respective frequencies due to various downtime sources, and (ii) an application "failover" time period for switching the running of the application program from any one computer to another operable computer. From these input parameters, the estimator program estimates first and second annual stoppage times, then determines the availability of the application program on the cluster of computers from the sum of the first and second annual stoppage times.
Thus, as discussed, the estimator program of the previously described invention estimated a first annual stoppage time for the application program, due solely to the concurrent stoppage of all of the computers, as a function of the ratio of a single computer virtual downtime period to the single computer virtual time between stops. It then estimated a second annual stoppage time for the application program, due solely to switching the running of the application program from one computer to another, as a function of the single computer virtual stoppage rate and the application failover time period. Finally, the estimator program determined the availability of the application program on the cluster of computers from the sum of the first and second annual stoppage times.
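The exact functions used by the earlier estimator are not given in this section. As a minimal sketch only, assuming (hypothetically) that the second annual stoppage time equals the annual number of stops multiplied by the failover time, and that availability is one minus the combined annual stoppage time divided by the hours in a year, the final combination step could look like this. The function names are illustrative, not taken from the co-pending application.

```python
HOURS_PER_YEAR = 8760.0


def second_stoppage_hours(stops_per_year: float, failover_minutes: float) -> float:
    """Annual time lost to switching the application to another computer.

    Assumption (hypothetical): each stop costs exactly one failover period.
    """
    return stops_per_year * failover_minutes / 60.0


def application_availability(first_stoppage_h: float, second_stoppage_h: float) -> float:
    """Availability derived from the sum of the two annual stoppage times (assumed form)."""
    return 1.0 - (first_stoppage_h + second_stoppage_h) / HOURS_PER_YEAR


t2 = second_stoppage_hours(stops_per_year=6, failover_minutes=5)  # 0.5 h per year
print(application_availability(first_stoppage_h=0.376, second_stoppage_h=t2))
```

The first stoppage time is taken as an input here because its dependence on the downtime-to-time-between-stops ratio is not specified in this text.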
The estimator program method assumed that "application availability" is determined by four factors:
(i) single-server hardware reliability;
(ii) maintenance, support, and service strategies;
(iii) user application and environment;
(iv) failover or system reconnection mechanism and application recovery mechanism.
The prior estimation parameters described in the co-pending U.S. Ser. No. 08/550,603 did not take into consideration the total number of operating server farm clients or the normal workload of users on each single server. Further, the earlier work did not provide a recommendation or estimate of the number of servers required in the server farm (or cluster) to meet the customer's performance and redundancy level requirements.
This new method of calculating server farm size and availability is based on (1) single server parameters, namely (a) the mean time to failure (MTTF), (b) the mean time to repair (MTTR), and (c) the single server application performance benchmarks, and (2) individual customer requirements, namely (a) the total number of server farm application users and (b) a desired redundancy level.
This new method uses the following definition of server farm availability: the probability that the server farm provides access to applications and data for at least a particular minimum number of users. As soon as the server farm cannot serve this minimum number of users, it is considered failed. If some users lose their connections but can reconnect to other servers and continue working, while the majority of users experience no interruption, the farm is not considered failed as long as it can still serve this minimum number of users.
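This definition can be illustrated with a simplified steady-state sketch. It is not the Markov-based method described later; it assumes, hypothetically, that servers fail and are repaired independently, that each server's availability is MTTF / (MTTF + MTTR), and that every server can serve the same number of users. The farm then counts as available whenever enough servers are up to serve the required minimum number of users. The function name is illustrative.

```python
from math import ceil, comb


def steady_state_farm_availability(size: int, clients: int, users_per_server: int,
                                   mttf_h: float, mttr_h: float) -> float:
    """P(enough servers are up to serve `clients`), assuming independent servers."""
    a = mttf_h / (mttf_h + mttr_h)             # single-server availability
    needed = ceil(clients / users_per_server)  # minimum servers that must be up
    # Binomial probability that at least `needed` of `size` servers are up.
    return sum(comb(size, k) * a**k * (1 - a)**(size - k)
               for k in range(needed, size + 1))


# Four servers, 200 users, 100 users per server: any two servers suffice.
print(steady_state_farm_availability(4, 200, 100, mttf_h=999.0, mttr_h=1.0))
```

If the farm is too small to serve the required users even with every server up, the sum is empty and the availability is zero.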
A widely used approach to improving a system's availability beyond that of a single system is to use server farms with redundant servers. In this case, if one of the farm's servers fails, the "unlucky" users connected to that server lose their connections but have an opportunity to reconnect to other servers in the farm and regain access to their applications and data. If all of the "unlucky" users regain access to their applications and data, the farm is considered "available." If at least one of the "unlucky" users fails to regain access, the server farm's redundancy has been exhausted and the server farm is considered failed.
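The failover criterion above can be sketched as a simple capacity check. This assumes, hypothetically, that the workload balancing mechanism can spread the "unlucky" users across any surviving servers and that every server has the same maximum user capacity; `survives_failure` is an illustrative name.

```python
def survives_failure(loads: list[int], capacity: int, failed: int) -> bool:
    """True if every user of server `failed` can reconnect to another server.

    loads: current number of users on each server.
    capacity: assumed uniform maximum number of users per server.
    """
    unlucky = loads[failed]
    # Free user slots on all surviving servers.
    spare = sum(max(0, capacity - load)
                for i, load in enumerate(loads) if i != failed)
    return unlucky <= spare


print(survives_failure([60, 60, 60], capacity=100, failed=0))  # True: 60 users, 80 free slots
print(survives_failure([90, 90, 90], capacity=100, failed=0))  # False: 90 users, 20 free slots
```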
The MTTF and MTTR parameters can be estimated as described in the cited U.S. Ser. No. 08/550,603, namely as the single computer virtual time between failures and the single computer virtual downtime period, respectively, for a particular application and user environment.
While the present invention may be shown in a preferred embodiment for a server farm that uses any workload balancing mechanism, it is not limited thereto and can be used in any other data processing environment where the definition of "server farm availability" can be applied.
Thus, the object of the present invention is to provide a method for estimating the "server farm size" and the "availability" of the server farm based on individual server parameters and the redundancy factor. The method generates recommendations for a customer who requires a reliable operating environment with enough redundancy to continue serving a particular number of clients (application users) even during single server failures or stops for planned maintenance.
In accordance with the present invention, a novel estimator program performs method steps for estimating the "server farm size" and the "availability" of the server farm for a given redundancy factor and a given number of clients. By the availability of a server farm is herein meant the probability that the server farm will service not less than a given number of clients.
In one particular embodiment, the estimator program begins by receiving input parameters which include a redundancy factor (or a normal single server workload of users), a particular number of clients "n" that will use the server farm, and parameters of the single servers that form the server farm.
The first single server parameter is the server performance benchmark. A benchmark value is based on a simulation of the user workload for a particular type of server. The second single server parameter is the server mean time to failure (MTTF), which is a measure of single server reliability. The third single server parameter is the mean time to repair (MTTR), which is a measure of server maintainability and/or serviceability.
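As a minimal sketch of how these inputs could yield a farm size, assume (hypothetically) that the performance benchmark gives the maximum number of users one server can serve, that the redundancy factor is the ratio of that benchmark to the normal single server workload, and that all servers are identical. The farm size is then the number of servers needed at the normal workload, and the reserved servers are those beyond the minimum needed at full benchmark load. `FarmInputs` and `farm_size` are illustrative names, not part of the described method.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class FarmInputs:
    clients: int              # n: users the farm must keep serving
    benchmark_users: int      # users one server can serve (performance benchmark)
    redundancy_factor: float  # assumption: benchmark / normal single-server workload, >= 1
    mttf_hours: float         # single-server mean time to failure
    mttr_hours: float         # single-server mean time to repair


def farm_size(p: FarmInputs) -> tuple[int, int]:
    """Return (number of servers, number of reserved servers)."""
    normal_workload = p.benchmark_users / p.redundancy_factor  # users per server in normal operation
    size = ceil(p.clients / normal_workload)         # servers needed at the normal workload
    minimum = ceil(p.clients / p.benchmark_users)    # servers needed at full benchmark load
    return size, size - minimum


print(farm_size(FarmInputs(1000, 100, 1.25, 5000.0, 8.0)))  # (13, 3)
```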
Next, the estimator program uses the input parameters to estimate the size of a farm that comprises servers with the given parameters and can serve at least the given number of clients "n". The estimator program then establishes the number of reserved servers and estimates the server farm MTTF using a Markov algorithm. Finally, it estimates the availability and the corresponding downtime of the server farm from the server farm MTTF and the server farm MTTR, which is assumed equal to the single server MTTR.
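The Markov algorithm itself is not spelled out here. One possible sketch, assuming exponentially distributed failures and repairs, independent repair of each failed server, and a farm that fails as soon as more than the reserved number of servers are down, treats the number of failed servers as a birth-death chain and computes the farm MTTF as the mean time to reach the failed state. Availability and annual downtime then follow with the farm MTTR set equal to the single server MTTR, as stated above. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def farm_mttf(size: int, reserve: int, mttf_h: float, mttr_h: float) -> float:
    """Mean time until more than `reserve` of `size` servers are down at once.

    Birth-death chain on the number of failed servers i = 0..reserve;
    state reserve + 1 is absorbing (the farm has failed).
    """
    if not 0 <= reserve < size:
        raise ValueError("need 0 <= reserve < size")
    lam, mu = 1.0 / mttf_h, 1.0 / mttr_h
    total, d_prev = 0.0, 0.0
    for i in range(reserve + 1):
        fail_rate = (size - i) * lam  # any of the size - i working servers may fail
        repair_rate = i * mu          # assumption: each failed server repaired independently
        # Expected time to go from i failed servers to i + 1 failed servers.
        d_i = (1.0 + repair_rate * d_prev) / fail_rate
        total += d_i
        d_prev = d_i
    return total


def farm_availability(size: int, reserve: int, mttf_h: float,
                      mttr_h: float) -> tuple[float, float]:
    """Return (availability, expected downtime in hours per year)."""
    m = farm_mttf(size, reserve, mttf_h, mttr_h)
    a = m / (m + mttr_h)  # farm MTTR taken equal to the single-server MTTR
    return a, (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical farm needing at least 10 servers; add 0..3 reserved servers.
    for reserve in range(4):
        a, down = farm_availability(10 + reserve, reserve, mttf_h=5000.0, mttr_h=8.0)
        print(f"reserve={reserve}  availability={a:.8f}  downtime={down:.4f} h/yr")
```

For two servers with one in reserve, this reduces to the textbook result (3λ + μ) / (2λ²) for a parallel pair with independent repair. The sweep in the usage block shows annual downtime falling as reserved servers are added, which is the trade-off between farm cost and downtime cost.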
As a result, the estimator program provides estimates of two key parameters that drive the cost analysis of an investment in highly available server farms. Applying the estimator program shows that server farms with a higher redundancy factor (more servers and, accordingly, a higher server farm cost) deliver less server farm downtime (and, accordingly, lower downtime-related business losses).