Several leading technology organizations are investing in building technologies that sell “software-as-a-service”. Such services provide access to computing and/or storage resources (e.g., storage devices providing either a block-level device interface or a web service interface) to clients or subscribers. Within multi-tier e-commerce systems, combinations of different types of resources may be allocated to subscribers and/or their applications, such as whole physical or virtual machines, CPUs, memory, network bandwidth, or I/O capacity. Block-level storage devices implemented at a storage service may be made accessible, for example, from one or more physical or virtual machines implemented by another service.
Every system that provides services to clients needs to protect itself from a crushing load of service requests that could potentially overload the system. In general, a system is considered to be in an “overloaded” state if it is not able to provide the expected quality of service for some portion of the client requests it receives. Common solutions applied by overloaded systems include denying service to clients or throttling a certain number of incoming requests until the systems are no longer in an overloaded state. Such techniques may, for example, be employed at storage servers on a per-storage-device level in some embodiments.
Some current systems avoid an overload scenario by comparing the request rate with a fixed global threshold and selectively refusing service to clients once this threshold has been crossed. However, it is difficult, if not impossible, to define a single global threshold that is meaningful (much less that provides acceptable performance) in a system that receives different types of requests at varying, unpredictable rates, and for which the amount of work required to satisfy the requests is also varying and unpredictable in at least some cases. While many services may have been designed to work best when client requests are uniformly distributed over time, in practice such temporal uniformity in work distribution is rarely encountered. Service providers that wish to achieve and retain high levels of customer satisfaction may need to implement techniques that deal with temporal and spatial workload variations in a more sophisticated manner.
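By way of illustration only, the fixed-global-threshold approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular system: the class name `FixedThresholdAdmission`, the helper `simulate`, the one-second counting window, the threshold of five requests, and the per-request "work units" are all hypothetical and chosen only to make the limitation concrete.

```python
class FixedThresholdAdmission:
    """Hypothetical admission controller: admit at most `max_requests`
    requests per fixed time window, regardless of how costly each one is."""

    def __init__(self, max_requests, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def admit(self, now):
        # Start a new counting window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        # Refuse service once the global threshold has been reached.
        if self.count >= self.max_requests:
            return False
        self.count += 1
        return True


def simulate(requests, controller):
    """requests: iterable of (arrival_time, work_units) pairs (assumed format).
    Returns (number of admitted requests, total admitted work units)."""
    admitted = 0
    admitted_work = 0
    for arrival_time, work_units in requests:
        if controller.admit(arrival_time):
            admitted += 1
            admitted_work += work_units
    return admitted, admitted_work


# Two workloads with the same request rate (10 requests within one second)
# but very different per-request cost.
light = [(i * 0.1, 1) for i in range(10)]
heavy = [(i * 0.1, 50) for i in range(10)]

light_result = simulate(light, FixedThresholdAdmission(max_requests=5))
heavy_result = simulate(heavy, FixedThresholdAdmission(max_requests=5))
```

In this sketch both workloads have the same number of requests admitted, yet the admitted work differs by a factor of fifty. A threshold expressed purely as a request rate therefore cannot distinguish a lightly loaded system from an overloaded one when the work per request varies.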
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.