The emergence of cloud-computing resource providers and management tools for private virtualization clusters has allowed virtualized applications to be deployed on resources that may be changed or re-provisioned on an as-needed basis. For example, a developer who knows that his or her deployed application will receive only modest workloads may choose to run the application on an instance that has been allocated only a modest amount of resources. As time goes on, however, the developer may discover that the application is now receiving larger workloads and may consequently decide to upgrade to a larger instance and/or create a cluster of a plurality of small instances behind a load balancer. Should demand fall in the future, the developer may downgrade back to the single, small instance. The ability to provision and re-provision compute resources is thus a fundamental benefit of cloud computing and of virtualization in general; it allows one to ‘right-scale’ an application so that the resources upon which it is deployed match the computational demands it experiences, and thus to avoid paying for unneeded resources.
The task of right-scaling a virtualized application is, however, difficult in practice. For example, scaling an application based on instantaneous demand (sometimes called autoscaling) is often an inappropriate resource-allocation scheme; among other things, such scaling does not allow the advance provision of resources, and it is necessarily blind to long-term usage patterns. Consequently, it is desirable to have a resource-provisioning plan having a longer horizon. The task of forecasting an application's usage patterns, however, even when restricted to simply extending performance-metric time series (e.g., percent CPU utilization over time, disk I/O over time, etc.), may be a labor-intensive process. Furthermore, even if performance metrics could be predicted in advance, one still faces the problem of translating this knowledge into intelligent decisions to upgrade or downgrade a deployment. For example, suppose someone has an application running on three small instances, and that he or she knows with certainty that over the next month, these instances will respectively run constantly at 20%, 60%, and 80% CPU utilization. If, say, the application were instead deployed on two medium instances, these CPU utilization numbers would change in ways that are difficult to predict. In other words, it may be difficult to determine how performance metrics on one deployment will translate into performance metrics on another deployment of the same application on a different resource set.
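The three-instance example may be made concrete with a minimal sketch of the naive, capacity-proportional estimate that one might be tempted to use. The function name `naive_average_utilization` and the assumption that a medium instance has exactly twice the CPU capacity of a small instance are hypothetical and are introduced here only for illustration; they are not taken from any particular provider's instance catalog.

```python
# Utilization of the three small instances, taken from the example above.
small_utilization = [0.20, 0.60, 0.80]

# ASSUMPTION (hypothetical): one medium instance has twice the CPU
# capacity of one small instance.
MEDIUM_TO_SMALL_CAPACITY_RATIO = 2.0


def naive_average_utilization(utilizations, capacity_ratio, target_count):
    """Estimate average CPU utilization on the target deployment.

    Assumes the workload is perfectly divisible and that demand scales
    linearly with capacity, which real applications often violate.
    """
    # Total demand, expressed in units of one small instance's capacity.
    total_demand = sum(utilizations)
    # Total capacity of the target deployment, in the same units.
    total_capacity = capacity_ratio * target_count
    return total_demand / total_capacity


# Naive estimate for two medium instances.
estimate = naive_average_utilization(
    small_utilization, MEDIUM_TO_SMALL_CAPACITY_RATIO, target_count=2
)
print(f"Naive estimate: {estimate:.0%} average CPU utilization")
```

Under these assumptions the estimate comes to roughly 40% average utilization. The calculation ignores load-balancing behavior, per-instance overhead, and any non-CPU bottleneck, so it should not be relied upon; the unreliability of such an estimate is one face of the difficulty of translating performance metrics from one deployment to another.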
This problem may be exacerbated by the fact that performance patterns are often application-specific. For example, suppose there are two applications, A and B, and two resource deployment patterns, X and Y (e.g., three small instances and two medium instances, respectively). Suppose that on resource pattern X, application A's memory utilization fluctuates between 70% and 80% and that on resource pattern Y, it fluctuates between 40% and 50%. Suppose further that application B's memory utilization on resource pattern X also fluctuates between 70% and 80%. Given this information, it cannot be determined with certainty that application B's utilization on Y will also be between 40% and 50%, even though application A behaved that way, because of potential differences between the needs of applications A and B. The way that an application performs on a resource is often very specific to that application, making it extremely difficult to predict how other applications will perform when deployed on different resource types. A need therefore exists for a more efficient way to scale resource allocations for applications.