This invention relates to data processing systems of the type which include a Server Farm that executes application programs for multiple clients (users); and more particularly, this invention relates to methods for optimizing the "Server Farm size" by balancing Server Farm performance and availability requirements in the above type of data processing systems.
U.S. Pat. No. 6,334,195, entitled "Estimator Program for Estimating the Availability of an Application Program That Runs in a Cluster of at Least Two Computers" and referenced above, involves an estimator program that performs method steps for estimating the availability of an application program that runs on any "server" in a cluster of at least two servers. By "availability of an application program" is meant the probability that, at any particular time instance, at least one of the servers in a cluster (farm) will actually be servicing requests from external workstations able to use the application program.
In one embodiment, the so-called estimator program begins by receiving input parameters which include (i) multiple downtime periods for each computer in the cluster (farm) that occur at respective frequencies due to various downtime sources, and (ii) an application "failover" time period for switching the running of the application program from any one computer to another computer which is operable. From these input parameters, the described estimator program estimates first and second annual stoppage times, then determines the availability of the application program on the cluster of computers which is derived from the sum of the first and second annual stoppage times.
Thus, as discussed, the estimator program of the previously-described invention estimated a first annual stoppage time for the application program due solely to the concurrent stoppage of all of the computers, as a function of the ratio of a single computer virtual downtime period over the single computer virtual time between stops. Subsequently, the estimator program estimated a second annual stoppage time for the application program, due solely to switching the running of the application program from one computer to another computer, as a function of the single virtual stoppage rate and the application failover time period. From this, the estimator program determined the availability of the application program on the cluster of computers from the sum of the first and second annual stoppage times.
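The final step described above, deriving availability from the sum of the two annual stoppage times, can be sketched as follows. This is only an illustration: the function name, the use of hours as the unit, and the 8,760-hour year are assumptions, and the sketch does not reproduce how the cited patent estimates the two stoppage times themselves.

```python
HOURS_PER_YEAR = 8760.0  # assumption: non-leap year of continuous operation


def availability_from_stoppage(first_stoppage_hours, second_stoppage_hours):
    """Return availability as 1 minus the fraction of the year spent stopped.

    first_stoppage_hours:  annual stoppage due to all computers being down at once
    second_stoppage_hours: annual stoppage due to failover switching
    """
    total = first_stoppage_hours + second_stoppage_hours
    if total < 0 or total > HOURS_PER_YEAR:
        raise ValueError("total annual stoppage must lie between 0 and one year")
    return 1.0 - total / HOURS_PER_YEAR
```

For example, two stoppage times of 4.38 hours each total 8.76 hours per year, which corresponds to an availability of about 0.999.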
The estimator program method was based on the assumption that "application availability" was to be determined from four factors which were:
(i) single-server hardware reliability;
(ii) maintenance, support, and service strategies;
(iii) user application and environment;
(iv) failover or system reconnection mechanism and application recovery mechanism.
The prior estimation parameters which were described in the co-pending application U.S. Ser. No. 08/550,603 did not take into consideration the total number of operating Server Farm clients and the normal single server workload of users involved with each single server. Further, this earlier application did not provide a recommendation or estimate regarding the number of servers required in the Server Farm (or cluster) which would meet the customers' performance and redundancy level requirements, nor did it establish an optimum farm configuration.
The method of the co-pending application U.S. Ser. No. 09/443,926, filed Nov. 19, 1999, now Allowed, entitled "Method for Estimating the Availability of an Operating Server Farm", extended the area of application of the original method to Server Farms designed to serve user communities with a particular required number of customers "n". This method, involving the Server Farm size and availability calculations, is based on (1) the single server parameters such as (a) the mean time to failure (MTTF), (b) the mean time to repair (MTTR), and (c) the single server application performance benchmarks, and (2) individual customer preferential requirements, involving (a) the total number of Server Farm application users and (b) a desirable redundancy level.
This estimation method for availability uses the following definition of Server Farm availability: the probability that a Server Farm provides access to applications and data for a particular minimum number of users. As soon as the Server Farm cannot serve this particular minimum number of users, it is considered failed. When some of the users have lost connections but can reconnect to other servers and continue to work, and the majority of users do not experience any interruptions in their work, the farm is not considered failed, provided it can still serve this particular number of users.
A widely used approach to improve a system's availability beyond the availability of a single system is by using Server Farms with redundant servers. In this case, if one of the farm's servers fails, the "unlucky" users connected to this server will lose their connections, but will have an opportunity to reconnect to other servers in the farm and get access to their applications and data. If all of the "unlucky" users get access to their applications and data, the farm is considered "available." If at least one of the "unlucky" users fails to get access to his/her applications and data, it means that the Server Farm's redundancy was exhausted and the Server Farm is considered failed.
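The definition of Server Farm availability given above can be illustrated with a simple probabilistic sketch. It assumes, purely for illustration, that servers are identical and fail independently, that single-server availability is MTTF / (MTTF + MTTR), and that the farm is "available" whenever at least the required number of servers is running. These modelling choices and all names are assumptions, not the formulas of the cited applications.

```python
from math import comb


def single_server_availability(mttf_hours, mttr_hours):
    """Steady-state availability of one server (assumed model)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def farm_availability(total_servers, required_servers, server_availability):
    """Probability that at least `required_servers` of `total_servers` are up.

    Assumes identical servers that fail independently (binomial model).
    """
    a = server_availability
    return sum(
        comb(total_servers, up) * a**up * (1.0 - a) ** (total_servers - up)
        for up in range(required_servers, total_servers + 1)
    )
```

Under these assumptions, a farm that needs one server and has two servers of availability 0.9 is available with probability 0.99, while the same farm without the redundant server is available with probability 0.9.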
The parameters for MTTF and MTTR can be estimated as indicated in the cited prior U.S. Pat. No. 6,334,196 as a single computer virtual time between failures and a single computer virtual downtime period, respectively, for a particular application and user environment.
Therefore, the availability estimation method of the prior application U.S. Ser. No. 09/443,926 allows one to estimate such parameters of the Server Farm as number of servers, Server Farm availability, and Server Farm downtime, based on a set of input data. At the same time, however, this method does not provide any recommendations about optimum combinations of the Server Farm parameters that can be chosen at the Server Farm planning or design stage.
The presently described new method involving the Server Farm size optimization is based on the input data that include single server parameters similar to the prior application U.S. Ser. No. 09/443,926 and at least two new extra parameters: single server cost and the downtime cost. Additionally, this new method includes newly added steps of selecting an optimization parameter, selecting an optimization criterion, and using an optimization technique procedure to find the optimum value of the optimization parameter.
While the present invention may be shown in a preferential embodiment for a Server Farm that uses any workload balancing mechanism, it is not limited thereto, and can be used for any other data processing environment where the definition of the "Server Farm availability" can be applied.
Thus the object of the present invention is to provide a method for optimizing the "Server Farm size" by balancing Server Farm performance and availability requirements. The method will generate an optimum recommendation for the selected set of input data, the selected optimization criterion and optimization parameter.
In accordance with the present invention, a novel estimator program performs method steps for the Server Farm optimization for a given particular number of clients "n" by balancing Server Farm performance and availability requirements. By the optimization of the Server Farm is herein meant the process of finding the optimum value of the selected optimization parameter that delivers the optimum value (maximum or minimum) for the selected optimization criterion and a given set of input data.
The method of optimization is based on a relationship between two major system attributes, performance and availability, that are "competing" for the same redundant system resources. The purpose of the Server Farm optimization is to balance the business performance and availability requirements.
System performance in a Server Farm computing environment is a particular number of concurrent users with the minimum required application response time and reliable access to their applications and data. Server Farm availability is the probability that a Server Farm provides a required system performance level. A Server Farm parameter that indirectly defines the Server Farm availability and performance is the Redundancy Factor, which is a measure of the available system resources. It is the difference between maximum and nominal performance, expressed as a percentage of the maximum performance.
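The Redundancy Factor definition above translates directly into a short calculation. In the sketch below, performance is taken to be a number of concurrent users; the function and parameter names are illustrative assumptions.

```python
def redundancy_factor(max_performance, nominal_performance):
    """Return the Redundancy Factor as a percentage of maximum performance.

    Both arguments are assumed to be in the same unit, for example the
    number of concurrent users the farm can serve (maximum) and the number
    it is required to serve (nominal).
    """
    if max_performance <= 0:
        raise ValueError("max_performance must be positive")
    if not 0 <= nominal_performance <= max_performance:
        raise ValueError("nominal_performance must lie in [0, max_performance]")
    return 100.0 * (max_performance - nominal_performance) / max_performance
```

For instance, a farm able to serve 1000 concurrent users at most, with a nominal load of 800 users, has a Redundancy Factor of 20 percent.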
In one particular embodiment, the method uses a simplified Server Farm availability economic model. The model uses an optimization criterion that is the total of the initial investment in a "highly available" Server Farm and the downtime losses during the period of owning the Server Farm. The Redundancy Factor is used as the optimization parameter. Different values of the Redundancy Factor can result in different Server Farm sizes. Greater values of the Redundancy Factor mean that more system resources are used to increase Server Farm availability, and usually more redundant servers are required to provide the same required Server Farm performance.
The method uses the fact that the decrease in downtime losses does not always justify additional investment in redundant servers. The first additions of redundant servers usually deliver better Server Farm availability, that is, less Server Farm downtime. At some particular Redundancy Factor value and/or Server Farm size, the Server Farm availability is close to the maximum possible value. Beyond this point, adding redundant servers will not decrease Server Farm downtime enough to justify the additional Server Farm cost. This Redundancy Factor value or Server Farm size value is the optimum value that minimizes the total Server Farm owner losses, which include the initial investment plus the estimated downtime losses.
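The trade-off described above can be illustrated with a minimal sketch that searches over the Server Farm size for the smallest total of initial investment plus estimated downtime losses. Several things here are assumptions made only for illustration: the binomial availability model with independent, identical servers; the exhaustive search over a bounded number of redundant servers; and all function and parameter names. The source does not specify the optimization technique, and this sketch varies the farm size directly rather than the Redundancy Factor.

```python
from math import ceil, comb


def farm_availability(total_servers, required_servers, server_availability):
    """Probability that at least `required_servers` are up (assumed binomial model)."""
    a = server_availability
    return sum(
        comb(total_servers, up) * a**up * (1.0 - a) ** (total_servers - up)
        for up in range(required_servers, total_servers + 1)
    )


def optimum_farm_size(users, users_per_server, server_availability,
                      server_cost, downtime_cost_per_hour, ownership_hours,
                      max_extra_servers=10):
    """Return (farm_size, total_cost) minimizing investment plus downtime losses.

    Tries 0..max_extra_servers redundant servers on top of the minimum
    number needed to serve `users` (hypothetical brute-force search).
    """
    required = ceil(users / users_per_server)
    best = None
    for extra in range(max_extra_servers + 1):
        total = required + extra
        unavailability = 1.0 - farm_availability(total, required, server_availability)
        downtime_losses = unavailability * ownership_hours * downtime_cost_per_hour
        total_cost = total * server_cost + downtime_losses
        if best is None or total_cost < best[1]:
            best = (total, total_cost)
    return best
```

With one required server of availability 0.9, a server cost of 100, a downtime cost of 1 per hour, and 10,000 hours of ownership, the totals for farm sizes 1, 2, and 3 are roughly 1100, 300, and 310, so the sketch selects a farm of two servers: the second redundant server no longer pays for itself.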