Several leading technology organizations are investing in building technologies that provide customers with access to computing resources. Such services provide access to computing and/or storage resources (e.g., storage devices providing either a block-level device interface or a web service interface) to customers or subscribers. Within multi-tier ecommerce systems, combinations of different types of resources may be allocated to customers and/or their applications, such as whole physical or virtual machines, CPUs, memory, network bandwidth, or I/O capacity. Block-level storage devices implemented at a storage service may be made accessible, for example, from one or more physical or virtual machines implemented by another service.
Computer systems that provide services to customers may employ various techniques to protect themselves from a volume of service requests that could potentially overload them. In general, a computer system is considered to be in an “overloaded” state if it is not able to provide the expected quality of service for at least some portion of the customer requests it receives. Common solutions applied in connection with overloaded computer systems include denying service to customers or throttling a certain number of incoming requests until the computer systems leave the overloaded state. Such techniques may, for example, be employed at storage servers in some embodiments at a per-storage-device level.
Some current computer systems avoid an overload scenario by comparing the request rate with a fixed global threshold and selectively refusing service to customers once this threshold has been crossed. However, it is difficult to define a single global threshold that is meaningful (much less that provides acceptable performance) in a computer system that receives different types of requests at varying, unpredictable rates, and for which the amount of work required to satisfy the requests is also varying and unpredictable in at least some cases. While many services may have been designed to work best when client requests are uniformly distributed over time, in practice such temporal uniformity in work distribution is rarely encountered. Computing resource service providers that wish to achieve and retain high levels of customer satisfaction may need to implement techniques that deal with temporal and spatial workload variations in a more sophisticated manner.
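The fixed-threshold approach described above can be illustrated with a minimal sketch. The class name `FixedThresholdLimiter`, the one-second window, and the threshold of three requests per second are assumptions chosen for illustration only; they are not taken from any particular system. The sketch counts admitted requests in a sliding one-second window and refuses any request that would exceed the threshold.

```python
from collections import deque


class FixedThresholdLimiter:
    """Hypothetical limiter: refuses requests once the arrival rate
    exceeds a single fixed global threshold."""

    def __init__(self, max_requests_per_second):
        self.max_requests_per_second = max_requests_per_second
        self.arrivals = deque()  # timestamps (seconds) of admitted requests

    def admit(self, now):
        # Discard timestamps outside the one-second window ending at `now`.
        while self.arrivals and now - self.arrivals[0] >= 1.0:
            self.arrivals.popleft()
        # Refuse service once the window already holds the maximum count.
        if len(self.arrivals) >= self.max_requests_per_second:
            return False
        self.arrivals.append(now)
        return True


limiter = FixedThresholdLimiter(max_requests_per_second=3)

# Five requests arrive within the same second; the last two are refused,
# regardless of how much work each one would actually require.
burst = [limiter.admit(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]

# Once the window has elapsed, requests are admitted again.
later = limiter.admit(1.5)
```

Note that the limiter treats every request identically: it has no notion of request type or of the work a request requires, which is the limitation the paragraph above points out for workloads with varying, unpredictable costs.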