A virtual machine (VM) is a software implementation of a computer that executes programs in a way that is similar to a physical machine. The virtualization technology allows the sharing of the underlying physical hardware resources between different virtual machines, each running its own operating system (as a guest). The virtualization, which is typically performed by a hypervisor, allows multiple operating systems to run concurrently on a host computer. The hypervisor presents the guest operating systems with a virtual operating platform and monitors the execution of the guest operating systems. Further, the hypervisor defines the allocation of resources (e.g., CPU power, memory, network bandwidth, etc.) for each guest operating system.
Virtualization of computing and networking resources, such as servers, application delivery controllers (ADCs), and load balancers, can improve the performance of a service provider's datacenters. Further, virtualization of such resources may reduce costs and overhead for the service providers. For example, most applications executed in datacenters utilize between 5% and 10% of the resources of physical machine CPUs most of the time. However, by deploying such applications as virtual machines in one physical machine, utilization of 80% can be achieved. This can be achieved without compromising the isolation and independence of the physical machines hosting the applications. As a result, adoption of virtualization technologies in datacenters has been rapidly increasing over the last few years, to the extent that it is expected that most services will soon be deployed as VMs.
Typically, a single physical machine is not sufficient to support multiple VMs, because in most cases their combined average resource consumption may exceed the capacity of one physical machine. Therefore, the VMs are distributed among several physical machines, such that the total average resource consumption of the VMs in one physical machine does not exceed a configurable threshold (e.g., 80%) of the physical machine's capacity. However, because resource consumption by VMs varies dynamically, a physical machine may be overloaded during instances of peak utilization by the VMs it hosts. Therefore, there is a need to balance the utilization of resources of physical machines by the VMs hosted therein. This task is known as workload balancing (WLB).
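By way of a non-limiting illustration, the threshold-based distribution described above can be sketched as a simple first-fit placement. The function name `place_vms`, the VM names, and the load figures are hypothetical and are not taken from any particular product; loads are expressed as fractions of one physical machine's capacity, and all physical machines are assumed to be identical.

```python
from typing import Dict, List


def place_vms(vm_loads: Dict[str, float],
              threshold: float = 0.80) -> List[Dict[str, float]]:
    """First-fit placement (an assumed policy, for illustration only).

    Each VM is put on the first physical machine whose total average
    load would stay at or below `threshold`; a new machine is opened
    when no existing machine has room.
    """
    machines: List[Dict[str, float]] = []
    for name, load in vm_loads.items():
        if load > threshold:
            raise ValueError(f"{name} alone exceeds the threshold")
        for machine in machines:
            # Small tolerance so that rounding error does not reject an exact fit.
            if sum(machine.values()) + load <= threshold + 1e-9:
                machine[name] = load
                break
        else:
            machines.append({name: load})
    return machines


if __name__ == "__main__":
    loads = {"vm1": 0.30, "vm2": 0.40, "vm3": 0.20, "vm4": 0.10, "vm5": 0.50}
    for index, machine in enumerate(place_vms(loads)):
        print(index, sorted(machine))
```

Such a placement only bounds the average consumption. It does not prevent the peak-utilization overloads noted above, which is why a separate workload balancing step is needed.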
Prior art solutions perform the workload balancing task by a VM migration process, which is schematically illustrated in FIG. 1. The VM migration process is also referred to as a “live VM migration” because the VM is transferred to a different physical machine during its operation.
For example, a physical machine 100 executes VMs 111, 112, and 113. When it is determined that the physical machine 100 is busy (e.g., over 80% utilization), one or more of the VMs 111-113 are migrated to a physical machine 120 that can support the additional VMs. As illustrated in FIG. 1, VMs 111 and 113 are migrated to the physical machine 120. The VM migration process may be triggered by a user (e.g., a system administrator) or by a virtual machine controller 140 that monitors the performance of the datacenters. In both cases, the migration of VMs is coordinated by the controller 140 and performed by the source and destination physical machines. Typically, the VM migration process requires that both the source physical machine (e.g., machine 100) and the target physical machine (e.g., machine 120) share the same storage 130 where the VM file-system (VMFS) resides. The controller 140 instructs the physical machine 100 to migrate VM 111 to the physical machine 120.
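As a minimal sketch of the triggering decision described above, the following shows how a controller might detect a busy physical machine and choose which VMs to migrate. The selection policy (largest consumer first), the function name `select_vms_to_migrate`, and the utilization figures are assumptions made for illustration; the description above does not specify how the VMs are selected.

```python
from typing import Dict, List


def select_vms_to_migrate(vm_load: Dict[str, float],
                          threshold: float = 0.80) -> List[str]:
    """Return the VMs to move off a host so its load drops to `threshold`.

    Assumed policy: migrate the largest consumers first until the
    remaining total utilization no longer exceeds the threshold.
    """
    total = sum(vm_load.values())
    selected: List[str] = []
    for vm, load in sorted(vm_load.items(), key=lambda kv: kv[1], reverse=True):
        if total <= threshold:
            break  # the host is no longer considered busy
        selected.append(vm)
        total -= load
    return selected


if __name__ == "__main__":
    # Hypothetical utilization of the VMs hosted on one physical machine.
    host = {"VM111": 0.40, "VM112": 0.15, "VM113": 0.35}
    print(select_vms_to_migrate(host))
```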
The VM migration process is performed by incrementally copying the CPU state and the memory image of the VM (e.g., VM 111), including the content of its registers, from the source physical machine to the target physical machine. Once the memory image has been copied, the execution of the VM on the source physical machine is halted, and execution then resumes on the target physical machine. The execution of the VM on the target physical machine 120 is resumed from the instruction immediately following the one at which it was stopped.
Specifically, considering that the VM 111 on the source physical machine 100 runs during the migration process, the execution is switched over to the target machine 120 only when there is a small “delta” memory (the difference between “source” and “target” memory images) that needs to be copied. The delta memory is typically a pre-defined and configurable parameter (e.g., a number of memory pages).
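The incremental copy and the delta-based switch-over described above can be illustrated with a small simulation. This is a simplified model under stated assumptions, not the behavior of any specific hypervisor: the function name `simulate_precopy` is hypothetical, and the model assumes that during each copy round a fixed percentage (`dirty_percent`) of the pages being copied is modified again by the running VM.

```python
from typing import List, Tuple


def simulate_precopy(total_pages: int,
                     dirty_percent: int,
                     delta_threshold: int,
                     max_rounds: int = 30) -> Tuple[bool, List[int]]:
    """Simulate iterative memory copying during a live migration.

    Returns (switched_over, pages_pending_at_start_of_each_round).
    """
    pending = total_pages  # the first round copies the whole memory image
    history: List[int] = []
    for _ in range(max_rounds):
        history.append(pending)
        if pending <= delta_threshold:
            # The delta is small enough: halt the VM on the source, copy
            # the remaining pages and CPU state, resume on the target.
            return True, history
        # Pages modified while this round was being copied must be re-sent.
        pending = pending * dirty_percent // 100
    return False, history


if __name__ == "__main__":
    switched, rounds = simulate_precopy(100_000, dirty_percent=10,
                                        delta_threshold=50)
    print(switched, rounds)
```

With these illustrative parameters the pending set shrinks tenfold per round and the switch-over condition is met in the fifth round. With `dirty_percent=100` the pending set never shrinks and the function reports that no switch-over occurred.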
The live VM migration also requires migrating network connections from the source to the target physical machines. Typically, the source and target physical machines are in the same IP subnet. Thus, when the VM is migrated to the target physical machine 120, the VM broadcasts address resolution protocol (ARP) messages indicating that the IP address has moved to a new physical location. As the VMFS resides on the shared storage 130, there is no need to synchronize large amounts of persistent data and the migration can be done while turning the VM off for a very short period of time.
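For illustration, the ARP broadcast mentioned above is commonly realized as a gratuitous ARP message, in which the sender and target protocol addresses are both set to the VM's own IP address. The sketch below only builds the bytes of such an Ethernet frame and does not transmit it. The use of the request form (operation code 1), the function name `build_gratuitous_arp`, and the MAC and IP values are assumptions; the description above does not specify the exact message form.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes    # sender hardware and protocol address
    arp += b"\x00" * 6 + ip_bytes  # target hardware unset; target IP = own IP
    return eth_header + arp


if __name__ == "__main__":
    # Hypothetical addresses of the migrated VM.
    frame = build_gratuitous_arp("52:54:00:12:34:56", "192.0.2.10")
    print(len(frame), frame.hex())
```

Switches and neighboring hosts that receive such a frame can update their forwarding and ARP tables, so that traffic for the VM's IP address is delivered toward the target physical machine.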
However, the conventional live VM migration process is inefficient, as it limits the performance of both the VMs and the physical machines. Specifically, the conventional VM migration process suffers from the following drawbacks. First, the service throughput of the VM is significantly degraded during the migration period, because the incremental memory replication of the VMs consumes CPU and network resources. Such degradation of service throughput can cause a temporary discontinuity of service at the switch-over point in time. Even though such a discontinuity period may be short (e.g., up to a second), it cannot be tolerated for mission-critical applications. Moreover, the period of degraded application performance can be much longer, which also cannot be tolerated in mission-critical applications.
In addition, the conventional live VM migration process consumes CPU and network resources for copying the VM's memory image, thereby aggravating the situation on the congested physical machine. Moreover, the copying of the memory image from one machine to another may not converge. This can happen, for example, when the source VM memory constantly changes in such a way that the delta memory state cannot meet the threshold set for the switch-over. As a result, the attempt to move the VM's state keeps consuming computing resources, thereby degrading the performance of the source physical machine. This may also cause the live migration process to fail, as it would take too long to move the VM from one machine to another.
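The convergence problem described above can be made concrete with a rough cost model. The sketch below assumes a constant copy rate and a constant page-dirtying rate, both in pages per second; these rates, the deadline, and the function name `precopy_cost` are hypothetical. The model reports whether the delta ever falls to the threshold before the deadline, and how many pages were sent in the attempt.

```python
from typing import Tuple


def precopy_cost(total_pages: int,
                 copy_rate: float,
                 dirty_rate: float,
                 delta_threshold: int,
                 deadline_s: float) -> Tuple[bool, float, float]:
    """Return (converged, elapsed_seconds, pages_sent) for a copy attempt.

    `delta_threshold` is assumed to be at least 1 page.
    """
    pending = float(total_pages)
    elapsed = 0.0
    sent = 0.0
    while pending > delta_threshold:
        round_time = pending / copy_rate
        if elapsed + round_time > deadline_s:
            return False, elapsed, sent  # migration takes too long and fails
        elapsed += round_time
        sent += pending
        # Pages dirtied during this round form the next delta; the delta
        # cannot be larger than the whole memory image.
        pending = min(float(total_pages), dirty_rate * round_time)
    return True, elapsed, sent


if __name__ == "__main__":
    print(precopy_cost(1000, copy_rate=100, dirty_rate=50,
                       delta_threshold=10, deadline_s=60))
    print(precopy_cost(1000, copy_rate=100, dirty_rate=100,
                       delta_threshold=10, deadline_s=60))
```

In this model, the delta shrinks only when the dirtying rate is lower than the copy rate. Even then, the number of pages sent exceeds the size of the memory image, and when the two rates are equal the attempt consumes CPU and network resources until the deadline without ever reaching the switch-over threshold.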
The primary purpose for performing VM migration is to improve the performance and utilization of datacenters, but the conventional live VM migration approach cannot guarantee these objectives. In fact, conventional live VM migration processes may sacrifice the performance of datacenters or result in an underutilized datacenter. Thus, the conventional VM migration processes are an inefficient approach for workload balancing.
It would therefore be advantageous to provide a solution that would resolve the shortcomings of prior art techniques for workload balancing.