One of the important concepts behind the adoption of cloud computing is the Pay-As-You-Go model. In this model, which is currently used by major cloud providers such as Amazon EC2 and Microsoft Azure, service providers pay only for the resources allocated to them, and the amount of these resources can be modified dynamically. For example, a service provider that leases a VM (Virtual Machine) pays for it only for the duration of the VM's lifetime.
However, this model poses a major dilemma for service providers: how many resources should they acquire? On the one hand, leasing more resources from the cloud results in better service quality; on the other hand, more resources incur higher operational expenses, since the service provider must pay the cloud owner for the resources it requests. In other words, while increasing the amount of resources used by the service can increase its income, over-provisioning may lead to a decrease in revenue.
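The trade-off between income and leasing cost can be illustrated with a toy model. The sketch below assumes, purely for illustration, a fixed price per VM-hour, a fixed income per served request, and a fixed per-VM capacity; all constants and function names are hypothetical and are not taken from any cloud provider's pricing. Under these assumptions, adding VMs beyond the point where demand is fully served only adds cost.

```python
# Hypothetical parameters, chosen only to illustrate the trade-off.
PRICE_PER_VM_HOUR = 0.05      # cost of leasing one VM for one hour
REVENUE_PER_REQUEST = 0.001   # income earned for each request served
REQUESTS_PER_VM_HOUR = 300    # requests one VM can serve per hour


def hourly_profit(num_vms: int, demand: int) -> float:
    """Income from the requests actually served minus the cost of the leased VMs."""
    served = min(demand, num_vms * REQUESTS_PER_VM_HOUR)
    return served * REVENUE_PER_REQUEST - num_vms * PRICE_PER_VM_HOUR


def best_vm_count(demand: int, max_vms: int = 50) -> int:
    """Exhaustively pick the VM count that maximizes hourly profit."""
    return max(range(max_vms + 1), key=lambda n: hourly_profit(n, demand))


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, round(hourly_profit(n, 1000), 2))
    print("best:", best_vm_count(1000))
```

With a demand of 1000 requests per hour, 4 VMs maximize profit in this toy model; 2 VMs lose income to unserved requests, and 10 VMs lose money to idle capacity. In practice demand varies over time and the relation between resources and service quality is far less direct, which is what makes the real problem hard.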
Determining the right amount of resources to lease from the cloud so as to optimize the revenue is a complicated task due to the varying rate of user requests and the complex relation among the demand, the amount of allocated resources and the quality of the service.
One of the most common mechanisms used to address this challenge is elasticity, that is, the ability to dynamically adjust the amount of resources allocated to the service, typically VMs or storage, based on the demand for that service. This capability is used, for example, by on-line shopping service providers to expand their service around the end of the year, when demand rises as people go on-line to do their holiday shopping; when the holiday season is over and demand drops, the service providers can scale down their service and release the resources back to the cloud. Another example, in which resource allocation must be adjusted within a much shorter time frame, is a large-scale disaster, after which users log in to report their experience or to check on their relatives and friends. In such a scenario, the demand for social network services may increase rapidly and unexpectedly within a short period of time, and the amount of resources allocated to the service should be adjusted accordingly in order to maintain the desired user experience.
Typically, the allocated resources are adapted dynamically by monitoring their state. However, for large cloud-based services, tracking the performance of each server or VM and monitoring each user request is often impractical. There is a long-felt need for techniques that can provide efficient elasticity under such conditions.
Many elasticity schemes, such as those discussed in G. Galante and L. C. E. D. Bona. “A survey on cloud computing elasticity”. In Proceedings of the 5th International Conference on Utility and Cloud Computing, pages 263-270, 2012, share some fundamental aspects: an elasticity controller tracks the state of the available resources and determines whether they meet the demand for the service with respect to some optimization function. The elasticity controller may determine that more resources must be provisioned, or that some resources can be released and returned to the cloud provider. A common architecture is depicted in FIG. 1.
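The decision step of such a controller can be sketched with a simple threshold rule. The sketch below assumes that the tracked state is the average utilization of the current VMs and that the optimization function is reduced to two hypothetical thresholds; the names ControllerConfig and decide, and all threshold values, are illustrative assumptions rather than part of any cited scheme.

```python
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    # Hypothetical thresholds on average utilization (0.0 to 1.0).
    scale_up_above: float = 0.8
    scale_down_below: float = 0.3
    min_vms: int = 1
    max_vms: int = 100


def decide(utilizations: list[float], cfg: ControllerConfig) -> int:
    """Return +1 to provision a VM, -1 to release one, or 0 to keep the current set."""
    n = len(utilizations)
    if n == 0:
        return +1  # no VMs are running, so at least one must be provisioned
    avg = sum(utilizations) / n
    if avg > cfg.scale_up_above and n < cfg.max_vms:
        return +1
    if avg < cfg.scale_down_below and n > cfg.min_vms:
        return -1
    return 0


if __name__ == "__main__":
    cfg = ControllerConfig()
    print(decide([0.9, 0.95], cfg))  # overloaded: provision
    print(decide([0.1, 0.2], cfg))   # underused: release
```

Keeping a gap between the two thresholds is a common design choice to avoid oscillating between provisioning and releasing the same VM.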
Elasticity controllers differ from each other in the techniques and means that they employ to perform their tasks. For example, some elasticity mechanisms, e.g. Amazon's Auto-Scale, evaluate service performance through direct hardware measurements such as CPU utilization; other mechanisms measure performance by metrics that are available only at the hypervisor, operating system, or application layer. For example, in T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal. “Dynamic scaling of web applications in a virtualized cloud computing environment”. In Proceedings of the IEEE International Conference on e-Business Engineering (ICEBE), pages 281-286, 2009, the VM's load is measured by the number of open HTTP connections it has. Other metrics may include requests per minute, the number of users that are logged on, or response time. In L. Zhang, X. P. Li, and S. Yuan. “A content-based dynamic load-balancing algorithm for heterogeneous web server cluster”, in Advances in Computer Animation and Digital Entertainment, 7(1):153-162, 2010, the authors propose a combination of the aforementioned metrics to determine the load.
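One way to combine several load metrics into a single value is a weighted average of normalized metrics. The sketch below is an assumption for illustration only: the normalization limits, the weights, and the names VmMetrics and load_score are hypothetical, and the sketch does not reproduce the specific combination proposed by Zhang et al.

```python
from dataclasses import dataclass


@dataclass
class VmMetrics:
    cpu_utilization: float        # hardware-level metric, 0.0 to 1.0
    open_http_connections: int    # application-level metric
    requests_per_minute: float
    response_time_ms: float


# Hypothetical values that map each raw metric onto [0, 1].
LIMITS = {"conn": 500, "rpm": 6000.0, "rt": 1000.0}
# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"cpu": 0.4, "conn": 0.2, "rpm": 0.2, "rt": 0.2}


def load_score(m: VmMetrics) -> float:
    """Weighted average of the metrics, each normalized and clamped to [0, 1]."""
    parts = {
        "cpu": m.cpu_utilization,
        "conn": m.open_http_connections / LIMITS["conn"],
        "rpm": m.requests_per_minute / LIMITS["rpm"],
        "rt": m.response_time_ms / LIMITS["rt"],
    }
    return sum(WEIGHTS[k] * min(1.0, max(0.0, v)) for k, v in parts.items())


if __name__ == "__main__":
    print(load_score(VmMetrics(0.5, 250, 3000.0, 500.0)))
```

Clamping each metric keeps a single extreme value, such as a very long response time, from dominating the combined score.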
Moreover, while some elasticity controllers simply consider the average job completion time, others address stricter SLA (Service Level Agreement) criteria. For example, M. Mao, J. Li, and M. Humphrey. “Cloud auto-scaling with deadline and budget constraints”, in Proceedings of the 11th International Conference on Grid Computing (GRID), pages 41-48, 2010, considers jobs with individual deadlines.
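The difference between the two criteria can be shown with a small worked example. The completion times and deadlines below are hypothetical; the point is only that an average-based criterion can be satisfied while an individual job still misses its deadline.

```python
def mean_completion_time(completion: list[float]) -> float:
    """Average completion time over all jobs."""
    return sum(completion) / len(completion)


def missed_deadlines(completion: list[float], deadlines: list[float]) -> int:
    """Count jobs that finished after their individual deadline."""
    if len(completion) != len(deadlines):
        raise ValueError("each job needs exactly one deadline")
    return sum(1 for c, d in zip(completion, deadlines) if c > d)


if __name__ == "__main__":
    completion = [1.0, 2.0, 3.0, 10.0]   # seconds (hypothetical)
    deadlines = [5.0, 5.0, 5.0, 8.0]     # seconds (hypothetical)
    print(mean_completion_time(completion))          # 4.0
    print(missed_deadlines(completion, deadlines))   # 1
```

Here an average-based target of, say, 5 seconds is met, yet the last job misses its 8-second deadline, which a deadline-aware controller would treat as a violation.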
A different technique for handling varying demand for a service is to provision resources according to a pre-defined schedule. See RightScale, http://www.rightscale.com (last accessed May 25, 2014), and Scalr, http://scalr.net (last accessed May 25, 2014). Such elasticity controllers may have rules like “On Mondays, between 11 AM and 4 PM, have the service running on 5 VMs”. This approach is suitable when the service provider is confident that it can determine the load at given times in advance.
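A schedule of this kind can be represented as a list of time-window rules. The sketch below encodes the example rule quoted above; the ScheduleRule type, the scheduled_vms function, and the default of 2 VMs outside any rule are hypothetical and do not reflect the configuration format of RightScale or Scalr.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ScheduleRule:
    weekday: int   # Monday == 0, matching datetime.weekday()
    start: time    # inclusive
    end: time      # exclusive
    vms: int


# "On Mondays, between 11 AM and 4 PM, have the service running on 5 VMs".
RULES = (ScheduleRule(weekday=0, start=time(11, 0), end=time(16, 0), vms=5),)
DEFAULT_VMS = 2  # hypothetical baseline when no rule applies


def scheduled_vms(now: datetime, rules=RULES, default: int = DEFAULT_VMS) -> int:
    """Return the VM count of the first rule whose window contains `now`."""
    for rule in rules:
        if now.weekday() == rule.weekday and rule.start <= now.time() < rule.end:
            return rule.vms
    return default


if __name__ == "__main__":
    print(scheduled_vms(datetime(2014, 5, 26, 12, 0)))  # a Monday at noon
```

Because the rules ignore the observed load, such a controller cannot react to unexpected demand, which is why the approach fits only predictable workloads.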
The authors of Z. Gong, X. Gu, and J. Wilkes. “PRESS: PRedictive Elastic ReSource Scaling for cloud systems”, in Proceedings of the International Conference on Network and Service Management (CNSM), pages 9-16, 2010 and H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes. “AGILE: elastic distributed resource scaling for infrastructure-as-a-service”, in Proceedings of the 10th International Conference on Autonomic Computing (ICAC), 2013, apply prediction-based (centralized) techniques to determine when a new VM must be powered up so that it is up and running by the time the load rises.
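The core idea of predictive scaling, starting VMs one boot time ahead of the predicted load, can be sketched as follows. The predictor here is a deliberately simple linear extrapolation of the recent trend, swapped in for illustration; it is not the prediction method used in PRESS or AGILE. The function names, the per-VM capacity, and the boot time expressed in sampling steps are hypothetical.

```python
import math


def predict_load(history: list[float], steps_ahead: int) -> float:
    """Extrapolate the average slope of the recent samples `steps_ahead` steps forward."""
    if not history:
        raise ValueError("need at least one load sample")
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return max(0.0, history[-1] + slope * steps_ahead)


def vms_to_start_now(history: list[float], current_vms: int,
                     capacity_per_vm: float, boot_steps: int) -> int:
    """VMs to power up now so they are running when the predicted load arrives."""
    predicted = predict_load(history, boot_steps)
    needed = math.ceil(predicted / capacity_per_vm)
    return max(0, needed - current_vms)


if __name__ == "__main__":
    # Load grows by 100 requests per step; a VM takes 2 steps to boot.
    print(vms_to_start_now([100.0, 200.0, 300.0, 400.0], current_vms=3,
                           capacity_per_vm=150.0, boot_steps=2))  # 1
```

Looking ahead by the boot time is what distinguishes this from a reactive controller, which would only start the VM after the load had already exceeded capacity.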
The elasticity controller and the load balancer are closely related. First, both mechanisms rely on data regarding the VM state to make their decisions. More importantly, when the elasticity controller determines that a VM should soon be released and returned to the cloud provider, the load balancer must be aware of this decision so that it does not send new user requests to that VM. At the same time, a resource that is about to be released is likely to be “attractive” to a load balancer, since it is probably lightly loaded. In T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal. “Dynamic scaling of web applications in a virtualized cloud computing environment”. In Proceedings of the IEEE International Conference on e-Business Engineering (ICEBE), pages 281-286, 2009, the authors assume that the load balancer complies with the instructions of the elasticity controller and is capable of migrating HTTP sessions in order to enable the release of resources.
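The coordination described above can be sketched as a load balancer that honors a "draining" flag set by the elasticity controller. In this hypothetical sketch, the Vm and LoadBalancer classes and the draining flag are illustrative assumptions; rather than migrating sessions as in Chieu et al., the balancer simply stops routing new requests to a draining VM and reports it as releasable once its sessions have ended.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    name: str
    active_sessions: int = 0
    draining: bool = False  # set by the elasticity controller before release


@dataclass
class LoadBalancer:
    vms: list[Vm] = field(default_factory=list)

    def route(self) -> Vm:
        """Send a new request to the least-loaded VM that is not draining."""
        candidates = [vm for vm in self.vms if not vm.draining]
        if not candidates:
            raise RuntimeError("no VM available for new requests")
        target = min(candidates, key=lambda vm: vm.active_sessions)
        target.active_sessions += 1
        return target

    def releasable(self) -> list[Vm]:
        """Draining VMs with no remaining sessions can be returned to the cloud."""
        return [vm for vm in self.vms if vm.draining and vm.active_sessions == 0]


if __name__ == "__main__":
    lb = LoadBalancer([Vm("a", 5), Vm("b", 0, draining=True), Vm("c", 2)])
    print(lb.route().name)                     # "c", although "b" is idle
    print([vm.name for vm in lb.releasable()])  # ["b"]
```

The example makes the tension concrete: VM "b" is the least loaded and would be the natural target, but routing to it would postpone its release.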
Taking a centralized approach to implementing a load balancer or an elasticity controller may create bottlenecks and severely impact the quality of the service and its scalability. For example, H. Liu and S. Wee. “Web server farm in the cloud: Performance evaluation and dynamic architecture”, in Proceedings of the First International Conference on Cloud Computing (CloudCom), pages 369-380, 2009, reports a case where an AWS load balancer refuses to handle further user requests once 950 jobs are pending. Such a limit may be prohibitive for social networks or search engines.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.