Several leading technology organizations are investing in building technologies that sell “software-as-a-service”. Such services provide access to shared storage (e.g., database systems) and/or computing resources to clients or subscribers. Within multi-tier e-commerce systems, combinations of different types of resources may be allocated to subscribers and/or their applications, such as whole physical or virtual machines, CPUs, memory, network bandwidth, or I/O capacity.
Every system that provides services to clients needs to protect itself from a crushing load of service requests that could potentially overload the system. In general, for a Web service or remote procedure call (RPC) service, a system is considered to be in an “overloaded” state if it is not able to provide the expected quality of service for some portion of client requests it receives. Common solutions applied by overloaded systems include denying service to clients or throttling a certain number of incoming requests until the systems get out of an overloaded state.
Some current systems avoid an overload scenario by comparing the request rate with a fixed or varying global threshold and selectively refusing service to clients once this threshold has been crossed. However, this approach does not take into account differences in the amount of work that could be performed in response to accepting different types and/or instances of services requests for servicing. In addition, it is difficult, if not impossible, to define a single global threshold that is meaningful (much less that provides acceptable performance) in a system that receives different types of requests at varying, unpredictable rates, and for which the amount of work required to satisfy the requests is also varying and unpredictable. While many services may have been designed to work best when client requests are uniformly distributed over time, in practice such temporal uniformity in work distribution is rarely encountered. Furthermore, in at least some environments, workloads may be non-uniform not only with respect to time, but also non-uniform with respect to the data set being operated upon—e.g., some portions of data may be accessed or modified more frequently than others. Service providers that wish to achieve and retain high levels of customer satisfaction may need to implement techniques that deal with workload variations in a more sophisticated manner.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood, that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.