The co-pending application “Estimator Program for Estimating the Availability of an Application Program That Runs in a Cluster of at Least Two Computers” referenced above involves an estimator program to perform method steps for estimating the availability of an application program that runs on any “server” in a cluster of at least two servers. By “availability of an application program” is meant the probability that at any particular time instance, at least one of the servers in a cluster (farm) will actually be servicing requests from external workstations able to use the application program.
In one embodiment, the so-called estimator program begins by receiving input parameters which include (i) multiple downtime periods for each computer in the cluster (farm) that occur at respective frequencies due to various downtime sources, and (ii) an application “failover” time period for switching the running of the application program from any one computer to another computer which is operable. From these input parameters, the described estimator program estimates first and second annual stoppage times, then determines the availability of the application program on the cluster of computers which is derived from the sum of the first and second annual stoppage times.
Thus, as discussed, the estimator program of the previously described invention estimated a first annual stoppage time for the application program, due solely to the concurrent stoppage of all of the computers, as a function of the ratio of a single computer virtual downtime period to the single computer virtual time between stops. The estimator program then estimated a second annual stoppage time for the application program, due solely to switching the running of the application program from one computer to another computer, as a function of the single virtual stoppage rate and the application failover time period. From these, the estimator program determined the availability of the application program on the cluster of computers from the sum of the first and second annual stoppage times.
The estimator program method was based on the assumption that “application availability” was to be determined from four factors:
(i) single-server hardware reliability;
(ii) maintenance, support, and service strategies;
(iii) user application and environment;
(iv) failover or system reconnection mechanism and application recovery mechanism.
The prior estimation parameters which were described in the co-pending application U.S. Ser. No. 08/550,603 did not take into consideration the total number of operating Server Farm clients and the normal single server workload of users involved with each single server. Further, this earlier application did not provide a recommendation or estimate regarding the number of servers required in the Server Farm (or cluster) which would meet the customers' performance and redundancy level requirements, nor did it establish an optimum farm configuration.
The method of the co-pending application U.S. Ser. No. 09/443,926, entitled “Method for Estimating the Availability of an operating Server Farm,” extended the original method to Server Farms designed to serve user communities with a particular required number of customers. This method, involving the Server Farm size and availability calculations, is based on (1) the single server parameters such as (a) the mean time to failure (MTTF), (b) the mean time to repair (MTTR), and (c) the single server application performance benchmarks, and (2) individual customer preferential requirements, involving (a) the total number of Server Farm application users and (b) a desirable redundancy level.
This estimation method defines Server Farm availability as the probability that a Server Farm provides access to applications and data for a particular minimum number of users. As soon as the Server Farm cannot serve this particular minimum number of users, it is considered failed. When some of the users have lost connections but can reconnect to other servers and continue to work, and the majority of users do not experience any interruptions in their work, the farm is not considered failed, provided that it can still serve this particular number of users.
A widely used approach to improving a system's availability beyond the availability of a single system is to use Server Farms with redundant servers. In this case, if one of the farm's servers fails, the “unlucky” users connected to this server will lose their connections, but will have an opportunity to reconnect to other servers in the farm and get access to their applications and data. If all of the “unlucky” users get access to their applications and data, the farm is considered “available.” If at least one of the “unlucky” users fails to get access to his/her applications and data, it means that the Server Farm's redundancy was exhausted and the Server Farm is considered failed.
The parameters for MTTF and MTTR can be estimated as indicated in the cited prior U.S. Ser. No. 08/550,603 as a single computer virtual time between failures and a single computer virtual downtime period, respectively, for a particular application and user environment.
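By way of illustration only, the relationship between the single server parameters and the Server Farm availability defined above can be sketched as follows. The sketch is not taken from the cited applications. It assumes that servers fail independently, that the steady-state availability of a single server is MTTF/(MTTF+MTTR), and that the farm is available whenever at least the required number of servers is operating. The function names and the users-per-server benchmark figure are hypothetical.

```python
from math import ceil, comb


def server_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one server (standard MTTF/MTTR ratio)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def required_servers(total_users: int, users_per_server: int) -> int:
    """Minimum number of servers needed to carry the stated user load.

    users_per_server stands in for the single server performance benchmark
    (a hypothetical figure supplied by the caller).
    """
    return ceil(total_users / users_per_server)


def farm_availability(total_servers: int, needed: int, a: float) -> float:
    """Probability that at least `needed` of `total_servers` are up.

    Assumption: servers fail independently, each with availability `a`,
    so the number of operating servers is binomially distributed.
    """
    return sum(
        comb(total_servers, k) * a**k * (1 - a) ** (total_servers - k)
        for k in range(needed, total_servers + 1)
    )


if __name__ == "__main__":
    a = server_availability(mttf_hours=999.0, mttr_hours=1.0)
    n = required_servers(total_users=1000, users_per_server=30)
    # One redundant server on top of the minimum required count.
    print(n, farm_availability(n + 1, n, a))
```

Under these assumptions, each additional redundant server raises the probability that the farm can still serve its minimum number of users after a single server failure.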
Therefore, the availability estimation method of the prior application U.S. Ser. No. 09/443,926 allows one to estimate such parameters of the Server Farm as the number of servers, the Server Farm availability, and the Server Farm downtime, based on a set of input data. At the same time, however, this method did not provide any recommendations about optimum combinations of the Server Farm parameters that can be chosen at the Server Farm planning or design stage.
The method of the co-pending application U.S. Ser. No. 09/474,706, entitled “Method for Server Farm Configuration Optimization,” involving the Server Farm size optimization, is based on input data that includes single server parameters similar to those of the prior application U.S. Ser. No. 09/443,926 and at least two additional parameters: the single server cost and the downtime cost. Additionally, this method included newly added steps of selecting an optimization parameter, selecting an optimization criterion, and using an optimization technique procedure to find the optimum value of the optimization parameter. Therefore, the optimization method of the co-pending application U.S. Ser. No. 09/474,706 allows one to find the optimum configuration of a single Server Farm. At the same time, however, this method had some technical limitations. The original assumption, that the reliability of the workload balancing mechanism is much higher than the reliability of a single server running applications, becomes questionable if the Server Farm size exceeds 100-120 servers. This particular problem arises where a very large number of users L, say 10,000, 50,000 or 100,000 users, is involved.
The presently described new method improves Server Farm availability by using so-called Metafarms divided into several Server Farms, with a workload balancing mechanism that distributes workload as between the Server Farms and as between the servers that make up the Server Farms. The Metafarm availability definition is similar to the Server Farm availability definition. The Metafarm availability is the probability that a Metafarm provides access to applications and data for a particular minimum number of users. Therefore, this is the probability that all Metafarm Server Farms are available.
The Metafarm design, unlike the Server Farm design, may involve a new parameter: the number of Server Farms of equal size that make up a Server Metafarm. The number of Server Farms can be used as one of the Server Metafarm optimization parameters.
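As a minimal sketch of this definition, and assuming (the text does not state this) that the Server Farms fail independently of one another and have equal availability, the probability that all Server Farms are available is the single-farm availability raised to the power of the number of farms. The function names and the 8,760-hour year are illustrative assumptions.

```python
HOURS_PER_YEAR = 8760.0  # assumption: non-leap year, continuous operation


def metafarm_availability(farm_availability: float, num_farms: int) -> float:
    """Probability that every Server Farm in the Metafarm is available.

    Assumption: the farms are of equal availability and fail independently,
    so the joint probability is a simple product.
    """
    return farm_availability**num_farms


def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year during which the system is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    a_meta = metafarm_availability(farm_availability=0.9999, num_farms=4)
    print(a_meta, annual_downtime_hours(a_meta))
```

This sketch leaves out the workload balancing mechanism; any model of its reliability would need to be added as a further factor.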
While the present invention may be shown in a preferential embodiment for a Metafarm that uses any workload balancing mechanism, it is not limited thereto, and can be used for any other data processing environment where the definition of the “Metafarm and Server Farm availability” can be applied.
Thus, the object of the present invention is to provide a method for optimizing the Server Metafarm for delivery of service to this large number of users by balancing the number of Server Farms against their availability requirements. The method uses the fact that the decomposition of the Metafarm into several Server Farms can increase the Server Metafarm availability. The method will generate an optimum recommendation for the selected set of input data. For example, the input data can include parameters of the particular server that is used as a Metafarm building block, and the Server Metafarm availability can be selected as the optimization criterion. In this case, the optimum Server Metafarm configuration can include the values of the number of Server Farms of equal size that make up a Server Metafarm and the number of servers in each Server Farm.
The method of generating such optimum configurations for this type of network, involving a large number of users “L”, is described and developed hereinbelow.
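Purely as an illustration of the kind of search described above, the following sketch enumerates candidate numbers of equal-size Server Farms and keeps the configuration with the highest Metafarm availability. It is a brute-force enumeration under stated assumptions, not the optimization procedure of the invention: servers and farms are assumed to fail independently, each farm carries a fixed number of redundant servers, and the workload balancing mechanism is treated as perfectly reliable. All names and parameters are hypothetical.

```python
from math import ceil, comb
from typing import NamedTuple


class MetafarmConfig(NamedTuple):
    num_farms: int
    servers_per_farm: int
    availability: float


def at_least_up(total: int, needed: int, a: float) -> float:
    """Probability that at least `needed` of `total` independent servers are up."""
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(needed, total + 1)
    )


def best_configuration(
    total_users: int,
    users_per_server: int,
    server_availability: float,
    redundant_per_farm: int,
    max_farms: int,
) -> MetafarmConfig:
    """Try 1..max_farms equal-size farms and keep the most available Metafarm."""
    best = None
    for num_farms in range(1, max_farms + 1):
        # Split the user population evenly, rounding up per farm.
        users_per_farm = ceil(total_users / num_farms)
        needed = ceil(users_per_farm / users_per_server)
        servers = needed + redundant_per_farm
        farm_a = at_least_up(servers, needed, server_availability)
        # Metafarm is available only if every farm is available.
        candidate = MetafarmConfig(num_farms, servers, farm_a**num_farms)
        if best is None or candidate.availability > best.availability:
            best = candidate
    return best


if __name__ == "__main__":
    print(best_configuration(10_000, 50, 0.999, 1, 20))
```

Other optimization criteria, such as a cost function combining server cost and downtime cost, could replace the availability comparison in the loop without changing the structure of the search.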