Cloud computing refers to highly scalable networked computing systems capable of delivering elastic computing performance to numerous users. Cloud computing typically involves clusters of densely packed computing servers, called nodes, with each node potentially executing dozens of virtual machines. Each node usually includes a hypervisor or other virtualization framework, and the cloud computing cluster as a whole includes one or more cloud controllers that manage instantiation of virtual machines on particular compute nodes. OpenStack is one example of such a cloud computing framework.
In a multi-tenant cloud computing environment, different customers may control the virtual machines on a particular node. The resources of the node, such as processor, network, and storage resources, must therefore be shared among the virtual machines and hence among different customers. When a virtual machine is created, the user selects an instance type that specifies the resource requirements of the virtual machine. Static resource requirements include a number of virtual central processing units (vCPUs), memory, disk, and network.
Such requirements pose two challenges for a cloud service provider (CSP). The first challenge is that the CSP must deliver the promised resources to each virtual machine instance at the same performance on any server system, regardless of the CPU and other hardware components. The CSP must therefore define a performance metric and ensure that each virtual machine meets it. In practice, however, this requirement has often not been met consistently. The second challenge is that the CSP wants to make maximum use of the provisioned infrastructure. The CSP may therefore wish to overprovision CPU and memory up to a limit that maximizes use of the infrastructure while minimizing performance degradation.
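A minimal sketch of how an instance type's static requirements and a CPU/memory overprovisioning limit interact when placing VMs on a node. All class, field, and function names here are hypothetical illustrations, not part of OpenStack or any other framework; `cpu_ratio` and `ram_ratio` are assumed knobs meaning "allocatable vCPUs (or memory) per unit of physical capacity", and disk and network are assumed not to be overprovisioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Static resource requirements selected when a VM is created (illustrative fields)."""
    name: str
    vcpus: int
    memory_mb: int
    disk_gb: int
    network_mbps: int

    def __post_init__(self) -> None:
        if self.vcpus < 1:
            raise ValueError("an instance needs at least one vCPU")


@dataclass
class Node:
    """A compute node with physical capacity and hypothetical overprovisioning ratios."""
    pcpus: int
    memory_mb: int
    disk_gb: int
    network_mbps: int
    cpu_ratio: float = 1.0  # vCPUs allowed per pCPU; > 1.0 means CPU is overprovisioned
    ram_ratio: float = 1.0  # allocatable memory per unit of physical memory
    used_vcpus: int = 0
    used_memory_mb: int = 0
    used_disk_gb: int = 0
    used_network_mbps: int = 0

    def can_host(self, it: InstanceType) -> bool:
        # Only CPU and memory are scaled by a ratio; disk and network are not overprovisioned here.
        return (
            self.used_vcpus + it.vcpus <= self.pcpus * self.cpu_ratio
            and self.used_memory_mb + it.memory_mb <= self.memory_mb * self.ram_ratio
            and self.used_disk_gb + it.disk_gb <= self.disk_gb
            and self.used_network_mbps + it.network_mbps <= self.network_mbps
        )

    def place(self, it: InstanceType) -> bool:
        """Reserve the instance's resources on this node if they fit."""
        if not self.can_host(it):
            return False
        self.used_vcpus += it.vcpus
        self.used_memory_mb += it.memory_mb
        self.used_disk_gb += it.disk_gb
        self.used_network_mbps += it.network_mbps
        return True


def count_placements(node: Node, it: InstanceType) -> int:
    """Place copies of `it` until the node is full; return how many fit."""
    placed = 0
    while node.place(it):
        placed += 1
    return placed


if __name__ == "__main__":
    medium = InstanceType("medium", vcpus=4, memory_mb=8192, disk_gb=80, network_mbps=1000)
    print(count_placements(Node(8, 65536, 1000, 10000, cpu_ratio=1.0), medium))  # 2
    print(count_placements(Node(8, 65536, 1000, 10000, cpu_ratio=2.0), medium))  # 4
```

With no overprovisioning, the 8-pCPU node holds two 4-vCPU instances; doubling `cpu_ratio` lets it hold four, which raises infrastructure use but means vCPUs now share pCPUs. OpenStack Nova exposes similarly named `cpu_allocation_ratio` and `ram_allocation_ratio` settings, but this sketch does not reproduce Nova's scheduler.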
Existing overprovisioning solutions suggest using an optimum overprovisioning ratio. However, such solutions suffer performance degradation when overprovisioning a virtual machine that is assigned four or more vCPUs. For example, suppose a first virtual machine (VM1) is allocated 4 vCPUs, one of which is assigned to a first physical CPU (pCPU1), and a second virtual machine (VM2) is assigned 1 vCPU that is also on pCPU1. VM1 must then wait for pCPU1 to become free whenever VM2's vCPU is running on it. If VM1's other three vCPUs are likewise placed on pCPUs shared with other VMs, VM1 must wait much longer for all of its vCPUs to be free at the same time. This results in performance delays for VM1.
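The compounding wait described above can be illustrated with a simplified probabilistic model. This is a sketch under stated assumptions, not the behavior of any particular hypervisor: it assumes strict co-scheduling (VM1 makes progress only in time slices where all of its vCPUs run simultaneously) and that each pCPU independently gives each vCPU sharing it an equal share of slices. The function names are hypothetical.

```python
from fractions import Fraction
from math import prod


def co_run_fraction(sharers_per_pcpu: list[int]) -> Fraction:
    """Long-run fraction of time slices in which a VM can run all its vCPUs at once.

    sharers_per_pcpu[i] is the number of vCPUs (this VM's own vCPU included)
    time-sharing the pCPU that hosts the VM's i-th vCPU. Assumes strict
    co-scheduling and that each pCPU independently gives each of its vCPUs an
    equal share of slices, so the VM holds pCPU i with probability 1/k_i.
    """
    if any(k < 1 for k in sharers_per_pcpu):
        raise ValueError("each pCPU hosts at least the VM's own vCPU")
    return prod((Fraction(1, k) for k in sharers_per_pcpu), start=Fraction(1))


def expected_wait_slices(sharers_per_pcpu: list[int]) -> Fraction:
    """Expected slices spent waiting before the first slice where all vCPUs co-run.

    Treating each slice as an independent trial with success probability p,
    the number of failed slices before the first success is geometric with mean 1/p - 1.
    """
    p = co_run_fraction(sharers_per_pcpu)
    return 1 / p - 1


if __name__ == "__main__":
    # VM1 alone on its four pCPUs.
    print(co_run_fraction([1, 1, 1, 1]), expected_wait_slices([1, 1, 1, 1]))  # 1 0
    # pCPU1 shared with VM2's single vCPU, as in the example above.
    print(co_run_fraction([2, 1, 1, 1]), expected_wait_slices([2, 1, 1, 1]))  # 1/2 1
    # All four of VM1's pCPUs each shared with one other VM's vCPU.
    print(co_run_fraction([2, 2, 2, 2]), expected_wait_slices([2, 2, 2, 2]))  # 1/16 15
```

Under these assumptions, sharing one pCPU halves the slices in which VM1 can run all four vCPUs, while sharing all four cuts them to 1/16, because the per-pCPU probabilities multiply. Real schedulers may relax co-scheduling, so the numbers only show how waits compound as more of a wide VM's vCPUs are shared.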
Such degradation is currently remedied by not overprovisioning vCPUs (e.g., starting with a single vCPU and scaling out when necessary); by monitoring workload usage, CPU ready, and CPU utilization metrics and re-sizing virtual machines; and by migrating a virtual machine to a server that has relatively free resources or has not been overprovisioned. However, not overprovisioning is a generic recommendation that does not guarantee that a performance Service Level Agreement (SLA) is met on every server system at every unit of time. Further, monitoring and re-sizing requires a virtual machine to start slow with fewer vCPUs and add more later, which in turn requires infrastructure to always be kept free or not fully utilized. Moreover, migrating a virtual machine may result in the same performance delay once the destination server's capacity is fully allocated.