In provider networks and other electronic environments, it is common for multiple users to send requests to a common resource, such as a host machine or server that is operable to process each request and perform at least one associated action. As the number of users and requests increases, the number of resources needed to handle those requests increases as well. The cost of purchasing and maintaining these resources can limit the number of resources made available, such that there generally is a maximum number of requests that can be handled at any given time, even when the requests are distributed across multiple instances of a given type of resource. Exceeding a maximum number or rate of allowable requests can negatively impact the quality of service that users receive, as the average response time for requests might increase dramatically, requests might time out, or the system might crash or experience other problems.
One conventional solution to this problem is to limit the number of requests accepted from a given requestor over a given period of time, a practice commonly referred to as throttling. In some conventional approaches, a group or type of user is given a hard limit on requests for a particular type of resource over a period of time. There might be more than one group or type of user, each receiving a different limit, which may be based, for example, upon the price paid by that type of user. While such an approach may be effective in some situations, it can be too limiting in others. For example, such an approach may work well in an environment with a single host having a fixed amount of capacity, but may not be optimal where resources are provided in a dynamic and distributed fashion and the amount of available resource capacity can change over time. Further, other aspects, such as the effective cost of processing a request, can vary over time as well. Hard limits or fixed throttling caps do not provide any flexibility to adapt to these changing conditions.
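For illustration only, the conventional hard-limit throttling described above might be sketched as follows. The class name `FixedWindowThrottle`, the tier names, the limit values, and the fixed-window counting scheme are all assumptions introduced for this sketch; they are not taken from any particular system.

```python
# Hypothetical per-tier hard limits (requests per window); values are illustrative.
TIER_LIMITS = {"basic": 3, "premium": 5}


class FixedWindowThrottle:
    """Hard-limit throttle: each requestor gets a fixed quota per time window."""

    def __init__(self, tier_limits, window_seconds):
        self.tier_limits = tier_limits
        self.window_seconds = window_seconds
        # Maps requestor -> (window index, requests admitted in that window).
        self.state = {}

    def allow(self, requestor, tier, now):
        """Return True if the request is admitted, False if it is throttled."""
        window = int(now // self.window_seconds)
        last_window, count = self.state.get(requestor, (window, 0))
        if last_window != window:
            count = 0  # a new window has started, so the quota resets
        if count >= self.tier_limits[tier]:
            return False  # the cap is fixed, whatever capacity is available
        self.state[requestor] = (window, count + 1)
        return True


throttle = FixedWindowThrottle(TIER_LIMITS, window_seconds=60)
basic_results = [throttle.allow("user-a", "basic", now=t) for t in (0, 1, 2, 3)]
```

In this sketch the fourth request from `user-a` within the first 60-second window is rejected, and the quota becomes available again only when the next window begins. The limit depends solely on the requestor's tier, so the throttle behaves identically whether the underlying resources are idle or overloaded, which is the inflexibility noted above.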