Cloud computing, with its flexible pricing model and elasticity, has gained immense popularity in recent years. The cloud's pay-as-you-use model lets clients pay for resources incrementally instead of making the large upfront investments that may be needed to handle future workloads. Moreover, the elasticity of cloud services enables users to acquire and release resources as demand changes. By provisioning just the right amount of resources, service providers avoid both the cost inefficiency of over-provisioning and the adverse impact of under-provisioning on application SLAs (Service Level Agreements). Workload surges are handled by deploying additional resources (e.g., spinning up Virtual Machines), and high utilization is maintained by de-provisioning unnecessary resources (e.g., spinning down Virtual Machines).
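As a minimal sketch of this scale-out/scale-in decision, assume a single demand metric (e.g., requests per second) and identical VMs of known capacity. The target VM count is then the smallest number of VMs whose combined capacity covers the demand. The function names and the capacity figure are hypothetical.

```python
import math

def target_vm_count(demand: float, capacity_per_vm: float, min_vms: int = 1) -> int:
    """Smallest number of identical VMs whose combined capacity covers `demand`."""
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    return max(min_vms, math.ceil(demand / capacity_per_vm))

def scaling_action(current_vms: int, demand: float, capacity_per_vm: float) -> int:
    """Positive: VMs to spin up; negative: VMs to spin down; zero: no change."""
    return target_vm_count(demand, capacity_per_vm) - current_vms

# Hypothetical capacity: each VM is assumed to serve 100 requests/s.
print(scaling_action(current_vms=4, demand=650, capacity_per_vm=100))  # 3 (spin up)
print(scaling_action(current_vms=4, demand=120, capacity_per_vm=100))  # -2 (spin down)
```

The `min_vms` floor keeps at least one VM running even when demand drops to zero.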
Clouds strive to scale application resources automatically as demand changes. Ideal auto-scaling is possible only when resources can be provisioned and deployed instantaneously as needed. All the needed Virtual Machines (VMs) must be in a ready state when the demand surge arrives; otherwise, applications may experience a period of poor performance while additional VMs are provisioned. This is challenging, however, because on-demand provisioning of VMs is inherently slow, taking anywhere from several seconds to several minutes.
VM provisioning delay may include the time:
a) for the scheduler to find the right physical machine to host the VM;
b) to transfer the Virtual Machine Image (VMI) from a storage node to a compute node over the network;
c) to decompress the VMI at the compute node (VMIs are compressed to save on storage costs);
d) for resource allocation and configuration of the VM (e.g., network provisioning); and
e) to boot the VM by reading data from the VMI (an OS-dependent boot time, including the time to start all services).
VM provisioning time may be reduced by optimizing any or all of these steps.
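If the five steps listed above are treated as sequential and additive, the total delay is their sum, and optimizing any step reduces the total by that step's saving. This is a simplifying assumption, since real systems may overlap some steps, such as streaming the image while booting. The class, field names, and numbers below are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class ProvisioningDelay:
    """Hypothetical per-step VM provisioning delays, in seconds."""
    scheduling: float      # scheduler finds a physical host for the VM
    image_transfer: float  # VMI copied from a storage node to a compute node
    decompression: float   # VMI decompressed on the compute node
    configuration: float   # resources allocated and configured (e.g., network)
    boot: float            # guest OS boots and starts its services

    def total(self) -> float:
        # Assumes the steps run one after another, so their delays add up.
        return sum(getattr(self, f.name) for f in fields(self))

def without_steps(delay: ProvisioningDelay, *steps: str) -> ProvisioningDelay:
    """Copy of `delay` with the named steps eliminated (delay set to zero)."""
    return replace(delay, **{step: 0.0 for step in steps})

# Placeholder values for illustration only, not measurements.
cold = ProvisioningDelay(scheduling=2.0, image_transfer=30.0,
                         decompression=10.0, configuration=5.0, boot=40.0)
print(cold.total())  # 87.0
# E.g., if the image is already cached, uncompressed, on the compute node:
print(without_steps(cold, "image_transfer", "decompression").total())  # 47.0
```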
Many cloud providers (e.g., AWS) support creating VMs from templates. A template is a pre-configured, reusable VM image that includes virtual hardware components (e.g., virtual CPUs (vCPUs) and virtual Network Interface Cards (vNICs)), an installed guest operating system (OS), and software applications.
Booting from a template can be much faster because it avoids the costly steps of configuring the VM and installing its OS and applications. Furthermore, if the template is already in the host machine's memory, the delays of reading the underlying image data from disk, and possibly transferring it over the network, are also eliminated.
A challenge, however, is that once a VM has been booted from a template, that template cannot be used for other VMs. This can be avoided by copying the template just before using it to boot a VM, but the copy adds delay to VM provisioning.
An alternative is to keep multiple cloned disk or memory copies of each template. However, since templates are large image files, this is likely cost-effective only if the number of cloned copies is kept small.
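A minimal sketch of a bounded clone pool, assuming a hypothetical `TemplateClonePool` that keeps at most `capacity` ready copies of one template. A request takes a pre-made copy when one is available (fast path) and otherwise falls back to copying the template on demand (slow path). The `_clone` method is a placeholder for the actual disk or memory copy.

```python
import itertools
from collections import deque

class TemplateClonePool:
    """Keeps up to `capacity` pre-made copies of one VM template (hypothetical)."""

    def __init__(self, template_id: str, capacity: int):
        self.template_id = template_id
        self.capacity = capacity
        self._ids = itertools.count()
        self._ready: deque[str] = deque()
        self.refill()

    def _clone(self) -> str:
        # Placeholder for a real (slow) disk or memory copy of the template.
        return f"{self.template_id}-clone-{next(self._ids)}"

    def refill(self) -> None:
        """Top the pool back up to capacity, ideally off the critical path."""
        while len(self._ready) < self.capacity:
            self._ready.append(self._clone())

    def acquire(self) -> tuple[str, bool]:
        """Return (clone_id, fast); fast is True if a pre-made copy was used."""
        if self._ready:
            return self._ready.popleft(), True
        return self._clone(), False  # slow path: copy the template on demand

pool = TemplateClonePool("web-template", capacity=2)
print(pool.acquire())  # ('web-template-clone-0', True)
print(pool.acquire())  # ('web-template-clone-1', True)
print(pool.acquire())  # ('web-template-clone-2', False)
```

Bounding `capacity` caps the storage spent on clones, which is the cost trade-off noted above.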
Most techniques for speeding up on-demand VM provisioning rely on pre-provisioned VMs or their images. This can require dedicating storage resources (e.g., disk or memory) to VM images, and compute, memory, and power resources to maintaining running VMs. The cost can become prohibitive if VMs or VM images are kept for every possible current and future resource configuration of every application hosted in the cloud.
It therefore becomes necessary to optimize the set of pre-provisioned VM configurations (or their images) used to meet the rapid auto-scaling requirements of cloud applications. At any time, and for any application, this set must contain the configurations with which the application's demand surge can be met quickly, with little or no over- or under-provisioning.
Many pre-provisioned configurations can be reused across applications, since many applications have resource needs that, although not identical, are close enough to be satisfied by the same VM configuration. Thus, only a few distinct VM configurations may suffice to meet the collective auto-scaling needs of cloud applications, although doing so will likely incur some over-provisioning overhead.
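To illustrate this trade-off, here is a small brute-force sketch. Given hypothetical candidate configurations and application needs as (vCPUs, GiB of memory) pairs, it picks the k configurations that cover every need with the least total over-provisioning. Two simplifying assumptions are made for clarity. The overhead metric simply adds unused vCPUs and unused GiB. The search is exhaustive. A real system would need a proper cost model and a scalable optimizer.

```python
from itertools import combinations

Resources = tuple[int, int]  # (vCPUs, memory in GiB); hypothetical units

def fits(config: Resources, need: Resources) -> bool:
    return all(c >= n for c, n in zip(config, need))

def overhead(config: Resources, need: Resources) -> int:
    # Crude assumption: unused vCPUs and unused GiB are simply added together.
    return sum(c - n for c, n in zip(config, need))

def assign_cost(configs, needs):
    """Total overhead when each need uses its cheapest covering config; None if uncovered."""
    total = 0
    for need in needs:
        costs = [overhead(c, need) for c in configs if fits(c, need)]
        if not costs:
            return None
        total += min(costs)
    return total

def choose_configs(candidates, needs, k):
    """Exhaustively pick k configs minimizing total over-provisioning."""
    best = None
    for subset in combinations(candidates, k):
        cost = assign_cost(subset, needs)
        if cost is not None and (best is None or cost < best[1]):
            best = (subset, cost)
    return best

candidates = [(2, 4), (4, 8), (8, 16), (4, 16)]
needs = [(2, 3), (2, 4), (3, 8), (4, 6), (7, 12)]
print(choose_configs(candidates, needs, k=2))  # (((4, 8), (8, 16)), 21)
print(choose_configs(candidates, needs, k=1))  # (((8, 16),), 69)
```

With k=1 only (8, 16) covers every need, and the total overhead rises from 21 to 69. Fewer distinct configurations cost less to keep pre-provisioned but waste more resources per VM.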
Therefore, it would be useful to have a method for speeding up on-demand VM provisioning for auto-scaling in the cloud.