Cloud services allow for the on-demand provision and utilization of computing resources, such as processing resources, storage resources, database resources, and communication resources. Cloud services also allow customers to purchase computing resources on a continual or as-needed basis. The capacity of purchased computing resources can also be scaled as needed, so that a cloud services customer can scale capacity on demand and pay only for the capacity actually used.
Auto scaling is one mechanism for scaling cloud computing resources in response to increases or lulls in demand for the resources. Auto scaling allows cloud services customers to automatically scale cloud capacity according to conditions they define. For instance, a rule may specify how capacity is to be scaled up when a specified condition occurs, such as a spike in demand. Similarly, another rule may specify how capacity is to be scaled down when a different condition occurs, such as a lull in demand.
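The customer-defined rules described above can be sketched as condition/adjustment pairs evaluated against an observed metric. This is a minimal illustration, not any cloud provider's API: the names ScalingRule and evaluate_rules, the "first matching rule wins" policy, and the capacity bounds are all hypothetical assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical customer-defined rule: if `condition` holds for the observed
    metric, change capacity by `delta` instances (positive = scale up)."""
    name: str
    condition: Callable[[float], bool]
    delta: int


def evaluate_rules(
    rules: Sequence[ScalingRule],
    metric: float,
    capacity: int,
    min_capacity: int = 1,
    max_capacity: int = 1000,
) -> tuple[int, Optional[str]]:
    """Apply the first rule whose condition matches (an assumed policy) and clamp
    the result to [min_capacity, max_capacity].

    Returns the new capacity and the name of the rule applied (None if no rule fired).
    """
    for rule in rules:
        if rule.condition(metric):
            new_capacity = max(min_capacity, min(max_capacity, capacity + rule.delta))
            return new_capacity, rule.name
    return capacity, None


# Illustrative rules: scale up on a demand spike, scale down during a lull.
RULES = [
    ScalingRule("scale-up-on-spike", lambda cpu: cpu >= 60.0, +10),
    ScalingRule("scale-down-on-lull", lambda cpu: cpu <= 20.0, -5),
]

# Usage: evaluate_rules(RULES, 75.0, 100) returns (110, "scale-up-on-spike").
```

A real auto scaling service would typically add refinements such as cooldown periods between adjustments; they are omitted here to keep the rule structure visible.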
Some cloud services customers using auto scaling might specify that computing resources be operated at a relatively low percentage of their maximum operating capacity. Operating computing resources with this headroom allows demand to be rapidly reallocated to the customer's remaining computing resources if an event occurs (“a capacity event”) that causes some of those resources to fail.
As an example, a cloud services customer might operate a fleet of 300 virtual machine instances (“instances”) spread equally across three data centers (100 instances each). The customer might also define auto scaling rules specifying that new instances be created when the average processor utilization of existing instances reaches 60%. In the event of a failure of one of the three data centers, the load served by instances in the failed data center will be rapidly reallocated to the two surviving data centers. Consequently, the average processor utilization of the instances executing in the surviving data centers at the time of the failure will likely climb above 60%. As a result, the auto scaling rule described above will cause additional instances to be created in the surviving data centers until the average processor utilization of these instances returns to 60%.
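The arithmetic behind this example can be checked with a short calculation. The sketch below assumes, for illustration only, that load is spread evenly across instances, that the failed data center's load is fully reallocated to the survivors, and that the fleet divides evenly across data centers; the function estimate_failover is hypothetical. Utilization is kept in integer percent so the results are exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverEstimate:
    surviving_instances: int
    utilization_after_failure_pct: float
    instances_to_add: int


def estimate_failover(
    total_instances: int,
    data_centers: int,
    failed_data_centers: int,
    utilization_pct: int,
    target_pct: int,
) -> FailoverEstimate:
    """Estimate the effect of losing whole data centers, assuming even load and
    full reallocation to surviving instances (illustrative assumptions)."""
    per_data_center = total_instances // data_centers
    surviving = per_data_center * (data_centers - failed_data_centers)
    # Total load in "percent of one instance" units: 300 instances at 60% -> 18000.
    total_load = total_instances * utilization_pct
    utilization_after = total_load / surviving
    # Instances needed to bring average utilization back to the target (ceiling division).
    needed = -(-total_load // target_pct)
    return FailoverEstimate(surviving, utilization_after, max(0, needed - surviving))


# Usage: estimate_failover(300, 3, 1, 60, 60)
# -> FailoverEstimate(surviving_instances=200, utilization_after_failure_pct=90.0, instances_to_add=100)
```

Under these assumptions, a fleet running at the 60% threshold before the failure climbs to 90% on the 200 surviving instances, and the auto scaling rule must create 100 new instances in the surviving data centers. Utilization after the failure stays at or below 60% only if it was at or below 40% beforehand.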
The scenario described above illustrates several shortcomings of typical auto scaling implementations. First, operating computing resources at a relatively low percentage of their maximum operating capacity is inefficient because significant computing resources may go unused on a day-to-day basis. Second, sufficient computing resources might not be available to handle a spike in demand resulting from a capacity event; in the example above, the surviving data centers might not have enough spare capacity to host the additional instances. Cloud services customers may therefore be left without needed computing resources at precisely the time a capacity event occurs.
It is with respect to these and other considerations that the disclosure made herein is presented.