The present invention relates to virtual machine migrations.
Virtual machine (VM) migration is increasingly utilized in data centers and clouds to facilitate the management of computing systems. However, migration usually requires a considerable amount of system resources, such as network bandwidth. In the case of multiple simultaneous migrations, which also happen regularly in data center operations, such resource demands increase dramatically and are difficult to satisfy immediately. As a result, the performance of the system is significantly impacted. In the worst case, the system may crash due to the shortage of resources.
Live VM migration is widely utilized in virtualized data centers and clouds due to its capability of maintaining high system performance under dynamic workloads. However, VM migration requires considerable network bandwidth and other resources, which may in turn degrade the performance of the migrating VM during the migration period. While that resource demand and VM performance drop are usually affordable for a single VM migration due to the short duration of the process, it is challenging to manage multiple concurrent migrations because the system may not immediately have enough resources to meet the dramatic resource demands of many VMs. As a result, multiple migrations take much longer to complete, which leads to prolonged performance degradation for those VMs. To this end, this paper investigates the behavior of concurrent VM migrations, and proposes a solution to schedule multiple migrations appropriately so as to avoid the adverse impacts caused by resource shortages.
In general, multiple VM migrations occur regularly in real system operations. For instance, if some physical machines need to be removed from service for maintenance, all the VMs on those machines have to be migrated elsewhere. Since applications nowadays comprise many VMs distributed across several machines for the purpose of load balancing and fault tolerance, a workload surge in an application may require the rearrangement of several VM instances in the system. An even worse situation is that some system faults, such as configuration mistakes, may trigger a large number of VM migrations. In those cases, it is important to handle concurrent VM migrations effectively, so that they can be completed as fast as possible and hence the total performance degradation time for those VMs can be minimized.
While data migrations are conducted between storage devices, in which storage I/O usually becomes the resource bottleneck, VM migration mainly moves VM memory pages between machines, where the network bandwidth is the scarce resource in most cases. More importantly, unlike data migration, where the size of the transferred data is usually fixed, the contents that need to be transferred in VM migration vary with the available network bandwidth as well as the characteristics of the VM, such as its memory dirty rate. This is due to the mechanism of iterative memory pre-copy implemented in most migration software, where the number of memory pages that need to be transferred in each pre-copy round depends on the speed of transfer in previous rounds. The same migration task may take much longer than expected in a low-bandwidth network than in one with sufficient bandwidth, especially when the memory dirty rate of that VM is high.
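The dependence of transferred data on bandwidth can be illustrated with a minimal sketch of iterative pre-copy. The model below is an assumption made for illustration only, not the behavior of any particular migration software: the function name simulate_precopy, the constant dirty rate, the 50 MB stop threshold, and the round limit are all hypothetical.

```python
def simulate_precopy(memory_mb, dirty_mb_s, bandwidth_mb_s,
                     stop_threshold_mb=50.0, max_rounds=30):
    """Simplified iterative pre-copy model (illustrative assumptions only).

    Returns (total_transferred_mb, total_time_s, precopy_rounds).
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")

    to_send = float(memory_mb)   # round 1 sends the whole VM memory
    total_sent = 0.0
    total_time = 0.0
    rounds = 0

    # Keep pre-copying while the remaining dirty set is still too large
    # to hand over to the final stop-and-copy phase.
    while rounds < max_rounds and to_send > stop_threshold_mb:
        round_time = to_send / bandwidth_mb_s
        total_sent += to_send
        total_time += round_time
        rounds += 1
        # Pages dirtied during this round must be resent in the next one;
        # the dirty set can never exceed the VM memory size.
        to_send = min(float(memory_mb), dirty_mb_s * round_time)

    # Final stop-and-copy phase: the VM is paused and the rest is sent.
    total_sent += to_send
    total_time += to_send / bandwidth_mb_s
    return total_sent, total_time, rounds


if __name__ == "__main__":
    for bw in (100.0, 1000.0):
        sent, secs, rounds = simulate_precopy(1024, 50.0, bw)
        print(f"bandwidth={bw:.0f} MB/s: sent={sent:.1f} MB, "
              f"time={secs:.2f} s, pre-copy rounds={rounds}")
```

Under these assumptions, a 1024 MB VM with a 50 MB/s dirty rate transfers roughly twice its memory size over a 100 MB/s link, but only slightly more than its memory size over a 1000 MB/s link, which is the non-fixed transfer volume described above.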
The unique features of VM migration pose new challenges when multiple VMs request to migrate simultaneously. First, since those migrations may have overlapping links in their migration paths, the system needs to determine whether to let them share a link by initiating them concurrently, and what maximum number of concurrent migrations to allow on that link. Link sharing between multiple migrations can improve the overall utilization of network bandwidth due to resource multiplexing between migration flows, and thus contributes to the quick completion of migrations. However, the amount of transferred memory pages also increases, since each VM is allocated only a portion of the bandwidth on the overlapping links. A balance is needed in determining the optimal number of concurrent migrations that share the network link.
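The trade-off described above can be sketched with a toy model. Everything in it is an assumption introduced for illustration: a single migration flow is assumed to use at most per_flow_cap_mb_s of the link (so that sharing improves utilization), migrations are assumed to run in equal batches of k identical VMs that split the link evenly, and the function names are hypothetical.

```python
import math


def precopy_time(memory_mb, dirty_mb_s, bw_mb_s,
                 threshold_mb=50.0, max_rounds=30):
    """Time of one simplified pre-copy migration at a fixed bandwidth."""
    to_send, elapsed = float(memory_mb), 0.0
    for _ in range(max_rounds):
        if to_send <= threshold_mb:
            break
        round_time = to_send / bw_mb_s
        elapsed += round_time
        # Memory dirtied during the round is resent in the next round.
        to_send = min(float(memory_mb), dirty_mb_s * round_time)
    # Final stop-and-copy of whatever is still dirty.
    return elapsed + to_send / bw_mb_s


def batch_completion_time(n_vms, k, link_mb_s, per_flow_cap_mb_s,
                          memory_mb, dirty_mb_s):
    """Total time to migrate n_vms when k migrations share the link."""
    # Each flow gets an even share, but never more than one flow can use.
    per_vm_bw = min(per_flow_cap_mb_s, link_mb_s / k)
    # Simplification: every batch, including a partial last one,
    # is charged the same per-VM bandwidth.
    batches = math.ceil(n_vms / k)
    return batches * precopy_time(memory_mb, dirty_mb_s, per_vm_bw)


def best_concurrency(n_vms, link_mb_s, per_flow_cap_mb_s,
                     memory_mb, dirty_mb_s):
    """Concurrency level k that minimizes total completion time."""
    return min(
        range(1, n_vms + 1),
        key=lambda k: batch_completion_time(
            n_vms, k, link_mb_s, per_flow_cap_mb_s, memory_mb, dirty_mb_s),
    )


if __name__ == "__main__":
    args = dict(link_mb_s=1000.0, per_flow_cap_mb_s=400.0,
                memory_mb=1024, dirty_mb_s=100.0)
    for k in range(1, 9):
        print(k, round(batch_completion_time(8, k, **args), 2))
    print("best k:", best_concurrency(8, **args))
```

In this toy setting, fully sequential migration leaves link capacity idle, while fully concurrent migration starves each VM of bandwidth and inflates the number of pre-copy rounds, so the minimum total time falls at an intermediate concurrency level.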
Conventional human-made policies specify a maximum number of VM migrations allowed in the network. In real situations, however, network bandwidth usage and VM memory usage are constantly changing, and such a fixed policy may not suit all conditions. Striking such a balance is difficult because it depends on numerous factors, including the link capacity and the VM memory dirty rate. In addition, the dependency of migration performance on those factors is non-linear, which makes it hard to predict with mathematical formulas.