A computing system may be defined as a system, having a memory and processing capability, which carries out various tasks and processes various requests. Such requests may range from a simple data request, such as accessing a database over a network, to more complex ones, such as a request for running an application, which, in turn, may require some data (in databases) in order to process the original request.
Any computing system that receives requests (or tasks) and processes them has a limited capacity: there is a bound on the maximum number of requests it can process simultaneously. Such a limitation may arise from the design and capacity of the computing system. Parameters of the platform on which the system is deployed, such as CPU speed, memory capacity, and network connection speed, limit the number of requests that the system can handle. Therefore, if the number of requests exceeds this limit, certain requests will not be completely processed. Some requests will be dropped before reaching the computing system and will not be received by it at all. Other requests will be received by the computing system but will be processed only partially before they are aborted. This usually happens due to lack of memory or timeout of a network connection.
One example of such a system is an Internet-based online stock trading service. Such a service provides information about the prices of specific stocks to its users on request. However, with the growing number of users of such online Internet services, the number of user requests handled concurrently by a server of the service sometimes becomes unmanageable. For instance, the server of an online Internet service that provides information about basketball may become overloaded with user requests during the finals of a national-level basketball tournament. In that case, some user requests will be only partially processed, and some will not be processed at all, because of the overload. Requests that are not processed completely, or not processed at all, are also regarded as dropped.
A request is regarded as dropped if any transaction that is critical for the complete processing of the request is left unprocessed. Suppose a request B, generated by the partial processing of a request A, is critical for the complete processing of request A. If request B is dropped (say, due to an overload of requests on the component that processes request B), request A cannot be completely processed. In such a case, request A is regarded as dropped as well.
Thus, to summarize, a request is regarded as dropped if (i) prior to any processing, partial or full, of the request by the system or a system component, it is eliminated from any future consideration for processing (i.e., is completely dropped) due to request overload on the component; (ii) any transaction that is critical for the complete processing of the request is aborted; or (iii) another request, which is generated out of the processing of the original request and whose complete processing is critical for the complete processing of the original request, is dropped.
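By way of illustration only, the three conditions summarized above may be sketched in Python as follows. The Request class, its field names, and the is_dropped function are hypothetical names introduced for this sketch and do not appear in the description; the sketch assumes that each request records the requests it has generated that are critical to its own completion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """Hypothetical record of a request and its processing outcome."""
    name: str
    # Condition (i): eliminated before any processing due to overload.
    rejected_on_overload: bool = False
    # Condition (ii): a transaction critical to completion was aborted.
    critical_transaction_aborted: bool = False
    # Condition (iii): requests generated by this one that are critical to it.
    critical_children: List["Request"] = field(default_factory=list)


def is_dropped(req: Request) -> bool:
    """Return True if the request counts as dropped under (i), (ii) or (iii)."""
    if req.rejected_on_overload or req.critical_transaction_aborted:
        return True
    # Condition (iii) applies recursively: a dropped critical child
    # makes the parent dropped as well.
    return any(is_dropped(child) for child in req.critical_children)


# Request B is generated by request A and is critical to A's completion.
request_b = Request("B", rejected_on_overload=True)
request_a = Request("A", critical_children=[request_b])
print(is_dropped(request_a))
```

In this sketch, request A is reported as dropped solely because request B was dropped, which mirrors the example of requests A and B given earlier.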
One way of addressing the problem of request drops due to overloading is to use multiple replicas of the same system (i.e., multiple servers) to process incoming requests. Thus, a cluster of servers may be used instead of a single server. A load balancer is placed in front of the cluster to receive requests and ensure a fair distribution of requests among the replicas. Load balancers are hardware devices or software programs that distribute user requests evenly among the components of a cluster in order to prevent overloading of any one component. For instance, a load balancer may be used to distribute user requests among a number of replicas of a server (such as http1 servers). This distribution may be based on predefined criteria, such as the workload on each component, the content type of the request (video or text), the geographical location of the components, and the expected QoS and SLA. Furthermore, there may be a predefined method or technique for distributing these requests among similar components, as will be discussed later.
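As a minimal sketch of one of the criteria mentioned above, namely the workload on each component, a load balancer might forward each request to the replica that currently has the fewest requests assigned. The function names pick_least_loaded and dispatch, the replica names, and the use of a simple request count as the measure of workload are assumptions made for illustration only.

```python
from typing import Dict, List


def pick_least_loaded(loads: Dict[str, int]) -> str:
    """Return the replica with the smallest current load.

    `loads` maps a (hypothetical) replica name to the number of
    requests currently assigned to it.
    """
    if not loads:
        raise ValueError("at least one replica is required")
    return min(loads, key=loads.get)


def dispatch(loads: Dict[str, int], request_count: int) -> List[str]:
    """Assign `request_count` requests one at a time, updating loads in place."""
    assignments = []
    for _ in range(request_count):
        target = pick_least_loaded(loads)
        loads[target] += 1  # the chosen replica now carries one more request
        assignments.append(target)
    return assignments


replica_loads = {"replica1": 2, "replica2": 0, "replica3": 1}
print(dispatch(replica_loads, 3))
print(replica_loads)
```

Other criteria named above, such as content type or geographical location, would replace or supplement the load comparison in pick_least_loaded.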
With the use of multiple instances of the same server, each server processes fewer requests. This, in turn, reduces the chances of overloading a server. However, the number of requests in certain cases may be so high that it would be impractical to deploy enough servers to meet the demand. With increased network transmission speeds and an increased number of users/clients, requests are transmitted over a network much faster than they can be processed at a component and/or forwarded by a load balancer. This causes requests to accumulate at the component or load balancer, overloading it and leading to request drops there. In other cases, a component starts processing a request but, due to high load conditions, the processing does not complete within a predefined period. This may also lead to request drops.
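The accumulation described above can be illustrated with a small discrete-time sketch, assuming a component with a bounded buffer, a fixed number of arrivals per time step, and a fixed number of requests served per time step. The function simulate_overload and all of its parameters are hypothetical and are chosen only to make the effect visible; real arrival and service patterns are not constant.

```python
from typing import Tuple


def simulate_overload(arrivals_per_tick: int,
                      served_per_tick: int,
                      buffer_capacity: int,
                      ticks: int) -> Tuple[int, int, int]:
    """Return (processed, dropped, still_queued) after `ticks` time steps."""
    queued = 0
    processed = 0
    dropped = 0
    for _ in range(ticks):
        # Arriving requests are accepted only while the buffer has room;
        # the remainder are dropped before any processing.
        accepted = min(arrivals_per_tick, buffer_capacity - queued)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The component then serves as many queued requests as it can.
        done = min(queued, served_per_tick)
        queued -= done
        processed += done
    return processed, dropped, queued


# Arrival rate above the service rate: the buffer fills and drops begin.
print(simulate_overload(arrivals_per_tick=5, served_per_tick=3,
                        buffer_capacity=10, ticks=10))
# Arrival rate below the service rate: no accumulation, no drops.
print(simulate_overload(arrivals_per_tick=3, served_per_tick=5,
                        buffer_capacity=10, ticks=10))
```

Under these assumptions, drops appear only once the arrival rate has exceeded the service rate for long enough to fill the buffer.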
Request drops adversely affect the performance of a computing system. First, resources are wasted: the resources of the load balancer(s) that processed and forwarded the request and/or the resources used for the partial processing of the request. This reduces effective resource utilization. Second, a request, upon being processed, may change the state of some of the components that have processed it. If the request is dropped at an intermediate component, the system may have to be rolled back to its most recent consistent state. In other words, the whole system has to be brought back to the state it was in before the processing of the dropped request started. Each rollback worsens the response time of the system, reduces its throughput, and wastes system resources. Third, the waste of system resources may cause other requests to be dropped that could otherwise have been processed, because sufficient system resources are not available. Fourth, for the same reason, some other requests will be processed at a lower quality of service than would otherwise have been possible. Fifth, the time wasted on the partial processing of a dropped request adds to the user response time for dropped or failed requests. This may hamper the Quality of Service (QoS) provided to the user and may also lead to the violation of Service Level Agreements (SLAs) between the user and the network. Finally, user dissatisfaction increases as the response time for processing requests grows.
In light of the above-mentioned disadvantages of request drops, it is imperative to reduce their number. There are a number of methods and techniques that use load balancers to reduce request drops. One exemplary method for distributing user requests is the Round Robin Domain Name Server (RR-DNS) approach. In this method, the requests are routed to multiple servers one by one on a rotational basis (that is, in a round-robin fashion).
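The rotational routing described above may be sketched as follows. This is a minimal illustration of the round-robin order only and does not model the DNS resolution involved in RR-DNS; the class name RoundRobinBalancer, its route method, and the server names are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Routes each request to the next server in a fixed rotation."""

    def __init__(self, servers: Iterable[str]):
        server_list = list(servers)
        if not server_list:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly in its original order.
        self._rotation = cycle(server_list)

    def route(self, request: object) -> str:
        """Return the server chosen for `request`; the request content is ignored."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server1", "server2", "server3"])
for request_id in range(5):
    print(request_id, balancer.route(request_id))
```

Because the rotation ignores both the content of the request and the current load on each server, this scheme is simple but cannot by itself avoid sending a request to a server that is already overloaded.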
There are some other load-balancing solutions that are based on the idea of flow of information among individual components in a cluster (of similar components) in order to determine the target component for a particular user request. This information may relate to the load levels of each component in the cluster whose load is being balanced by the load balancer.
One such solution is disclosed in U.S. Pat. No. 5,938,732, titled “Load balancing and fail over of network services”. It deals with maintaining communication within a cluster of components and coordinating the cooperation among them. This is done to ensure that the service provided by the group remains available even if one or more components providing the service become unavailable from time to time, such as through failure. To help maintain this communication, each processing element periodically sends a control message to all other processing elements within the group. This message consists of the status of the sending component as well as data about the perceived status of the other components within the group.
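For illustration only, a control message of the kind described above, carrying the sender's own status together with its perceived status of the other group members, might be modeled as follows. This sketch is not taken from the cited patent: the ControlMessage and Member classes, the status strings, and the update rule in receive are all assumptions, and the periodic timing and network transport are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ControlMessage:
    """Hypothetical control message exchanged within a group."""
    sender: str
    sender_status: str
    perceived: Dict[str, str]  # sender's view of each other member


class Member:
    """Hypothetical group member that tracks the status of its peers."""

    def __init__(self, name: str, peers: Iterable[str]):
        self.name = name
        self.status = "up"
        # Until a message is received, a peer's status is unknown.
        self.view: Dict[str, str] = {peer: "unknown" for peer in peers}

    def build_message(self) -> ControlMessage:
        # Copy the view so later updates do not alter a message already built.
        return ControlMessage(self.name, self.status, dict(self.view))

    def receive(self, message: ControlMessage) -> None:
        # Assumed rule: trust the sender's report about itself only.
        self.view[message.sender] = message.sender_status


member_a = Member("a", peers=["b", "c"])
member_b = Member("b", peers=["a", "c"])
member_b.receive(member_a.build_message())
print(member_b.view)
```

How a receiver should use the sender's perceived status of third parties, for example to detect a failed member, is left open in this sketch.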
Another solution has been proposed in the research paper titled “Flow Control and Dynamic Load Balancing in Time Warp”, authored by Myongsu Choe and Carl Tropper. It discloses an algorithm that integrates flow control and dynamic load balancing. Flow control is employed by the processors in a cluster to share or distribute the load among themselves. Flow control in this case operates among the target components.
The research paper titled “The Design and Performance of an Adaptive CORBA Load Balancing Service”, authored by Ossama Othman, Carlos O'Ryan, and Douglas C. Schmidt, also proposes a solution. In its discussion of future work, this paper raises the need to employ flow control among various load balancers to determine the target replica (component) from a given cluster of components.
All the solutions described above use information exchange among the components in a cluster of similar components to determine the target component that will process a request. However, none of them addresses the problem of request drops in systems intended to work at high request rates. Requests that are ultimately dropped before completion are nevertheless processed in part, which wastes resources and increases processing time and power consumption. This results in lower system efficiency. These solutions therefore also suffer from the usual disadvantages of request drops described above.
Therefore, in light of the discussion above, there exists a need for a system and method that can reduce the number of request drops in a computing system. More specifically, there is a need for a system and method for reducing the number of requests dropped at the load balancers. There also exists a need for a system and method for facilitating the enforcement of Quality of Service (QoS) and Service Level Agreements (SLAs) at load balancers. There further exists a need for a load-balancing framework that increases the throughput of the system and thereby reduces the waste of system resources.