1. Technical Field
The present invention provides an improved computing system in which load balancing weights are determined using application instance data such as success ratios, response times, service class levels, and transaction topology. More specifically, the present invention relates to distributing traffic across a collective group of machines based on the current and previous knowledge gathered about the applications receiving and processing the traffic on those machines.
2. Description of Related Art
Load balancers distribute load across a collection of machines to provide extended scalability and availability to applications. As a service becomes more popular, its ability to serve a larger number of requests becomes critical. One solution to this problem is to create copies of the service to which some of the requests can be sent. These copies also distribute the point of failure for an application because if one copy of the service fails, another may be available. The load balancer's job is to distribute the load across all available copies. The method by which the load balancer distributes the load can significantly affect both the quality of service seen by users and the efficiency of the servers being used. Current methods used by load balancers to distribute the load include purely algorithmic methods, system-metric based methods, and “application-alive” checks.
The purely algorithmic methods use no information about the actual service or the likelihood that the service could complete the task in a particular time period (or complete the task at all). Examples of these methods are round robin, weighted round robin, least connections, and hash methods. The round robin approach simply distributes the requests evenly across the server application instances, e.g., it sends the first request to the first server and the second request to the second server, wrapping around to the first server again once all of the servers have had a request sent to them. The weighted round robin approach is the same as the round robin approach but with static weights associated with particular servers to give preference to those servers. The least connections approach sends each new request to the server with the fewest open connections. The hash methods approach sends all requests that match some regular expression to a certain server.
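As an illustration only, three of the purely algorithmic methods described above can be sketched in Python as follows. The function names, server names, weights, and connection counts are hypothetical and are not taken from any particular load balancer; the weighted variant assumes small positive integer weights.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that visits the servers in order and wraps around."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Return an endless iterator in which each server appears in proportion
    to its static integer weight (assumed to be a positive integer)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_connections(open_connections):
    """Return the server that currently has the fewest open connections."""
    return min(open_connections, key=open_connections.get)


# Hypothetical usage: none of these choices considers whether the selected
# server can actually complete the request.
rr = round_robin(["server-1", "server-2", "server-3"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"server-1": 2, "server-2": 1})
first_six = [next(wrr) for _ in range(6)]

target = least_connections({"server-1": 3, "server-2": 1, "server-3": 2})
```

In this sketch the selection depends only on request order, static weights, or connection counts, which is the limitation noted above.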
System-metric based methods use statistics about the system on which the server is running in their decision-making process. This information is useful, but may not be descriptive enough because the service itself may not contribute to the system statistics in an intuitive manner. Other applications could distort the image portrayed by system-level statistics (e.g., CPU usage may be low because the load balanced application is waiting on a resource currently taken by another application). Even if the entire system only runs a single application, system statistics could paint an inaccurate picture of the current application state because the application dependencies may not be understood by the system. For example, the CPU usage and response time could also be low because the application is stuck in an error state and simply returns an “out of service” reply to every request. This problem, known as the “Storm Drain Problem,” can be especially tough on a load balancing algorithm with no application-level statistics because the application appears to be functioning perfectly to the underlying system.
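The “Storm Drain Problem” can be illustrated with a small Python sketch. All server names and statistics below are hypothetical, and the success-ratio floor in the second function is an assumption introduced only for contrast; it is not presented as the weighting method of the invention.

```python
# Hypothetical per-server statistics. The "broken" instance answers every
# request with an immediate "out of service" reply, so it looks very fast.
stats = {
    "healthy-1": {"avg_response_ms": 120.0, "success_ratio": 0.99},
    "healthy-2": {"avg_response_ms": 135.0, "success_ratio": 0.98},
    "broken": {"avg_response_ms": 2.0, "success_ratio": 0.0},
}


def pick_by_response_time(stats):
    """Choose the server with the lowest average response time,
    ignoring whether its replies were successful."""
    return min(stats, key=lambda name: stats[name]["avg_response_ms"])


def pick_with_success_floor(stats, min_success=0.5):
    """Choose the fastest server among those whose success ratio meets an
    assumed floor; return None if no server qualifies."""
    candidates = [
        name for name in stats if stats[name]["success_ratio"] >= min_success
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: stats[name]["avg_response_ms"])


naive_choice = pick_by_response_time(stats)     # selects the failing instance
informed_choice = pick_with_success_floor(stats)  # skips the failing instance
```

Without application-level information such as a success ratio, the fast-failing instance attracts the traffic, which is the behavior described above.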
Application-alive checks determine, as part of the load balancing operation, whether the application is functioning. For example, load balancers will not send traffic to an application that has either died or malfunctioned. While these checks are application specific, they give no indication of the application's degree of health beyond dead or alive. In that sense, they offer no basis for comparing two functioning servers to determine which server should receive work.
Thus, some of the known load balancing mechanisms use methods that have no relevance to the performance of the applications servicing the request. The other load balancing mechanisms, while loosely relevant, do not provide clear indications of how well suited an application has been, and will be, to handling particular requests. For example, the number of connections and other generic application attributes may be the effects of another application or part of the system. Generic resource usage statistics may also be misleading. For example, it may be desirable to send requests to a machine with high CPU usage, which would normally not be selected to receive the request, if that machine's less important work could be interrupted so that the request can be processed.
Lastly, in the competitive market of application services, the system-level and other non-application-specific data used in current load balancing solutions do not constitute the type of monitoring necessary for business-level goals. For example, a service level agreement (SLA) may require that transactions complete successfully within a certain amount of time, or else some compensation will be awarded. In this example, simple application-alive checks will not suffice. Thus, it would be beneficial to have an apparatus and method for performing load balancing based on weights that identify the best server/application instance(s), where the weights are specific to the application instance and are not based on generic resource usage statistics.