“xvMotion” (also known as “Enhanced vMotion”) is a technology that allows both the execution state and the virtual disk files (VMDKs) of a virtual machine (VM) to be migrated across host systems/data stores as a single operation. This enables VM migration in environments where the host systems do not have access to shared storage (and thus rely on host-based local storage). By way of example, FIG. 1A depicts a system environment 100 that includes two host systems 102 and 104 and a virtual infrastructure (VI) server 106. Each host system 102/104 includes, in turn, a hypervisor 108/110 and a local storage component 112/114. In the example of FIG. 1A, host system 102 runs a VM 116 on hypervisor 108 that is configured to access a VMDK 118 residing on local storage component 112.
Assume VI server 106 determines that VM 116 should be migrated from host system 102 to host system 104. In this scenario, VI server 106 can use xvMotion (as shown in FIG. 1B) to transfer both the execution state of VM 116 and its persistent data (i.e., VMDK 118) across the host systems. For instance, at step (1) of FIG. 1B (reference numeral 120), VI server 106 can cause host system 102 to background copy VMDK 118 (in the form of storage snapshots) to local storage component 114 of host system 104. Once VMDK 118 has been transferred, VI server 106 can cause host system 102 to copy the execution state (e.g., volatile memory contents) of VM 116 to host system 104 (step (2); reference numeral 122). Finally, at steps (3) and (4) (reference numerals 124 and 126), VI server 106 can halt the execution of VM 116 on host system 102 and restart VM 116 on host system 104, thereby completing the migration process.
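The four-step sequence above can be sketched in simplified form. The following Python sketch is purely illustrative (the `Host` class, `xvmotion` function, and step names are hypothetical, not VMware's actual API) and models each step as an operation on in-memory host state:

```python
# Illustrative sketch (not VMware's actual API) of the xvMotion sequence:
# (1) background copy of the VMDK, (2) copy of the execution state,
# (3) halt on the source host, (4) restart on the destination host.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    local_storage: dict = field(default_factory=dict)  # VMDK name -> bytes
    running_vms: set = field(default_factory=set)

def xvmotion(vm: str, vmdk: str, src: Host, dst: Host) -> list:
    """Migrate `vm` and its `vmdk` from `src` to `dst`; return the step log."""
    log = []
    # Step (1): background copy the VMDK to the destination's local storage.
    dst.local_storage[vmdk] = src.local_storage[vmdk]
    log.append("copy-vmdk")
    # Step (2): copy the VM's execution state (e.g., volatile memory contents).
    log.append("copy-exec-state")
    # Step (3): halt the VM on the source host.
    src.running_vms.discard(vm)
    log.append("halt-src")
    # Step (4): restart the VM on the destination host.
    dst.running_vms.add(vm)
    return log + ["restart-dst"]
```

Note that the VMDK copy (step 1) completes before the execution-state copy (step 2) begins, which is why VMDK size dominates total migration time.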
While xvMotion solves the problem of performing VM migrations without shared storage, it also suffers from a number of drawbacks. First, the amount of time needed to transfer a VMDK from a source host system to a destination host system can be relatively long (e.g., on the order of minutes or hours), particularly if the VMDK is large and/or the network connection between the source and destination host systems is slow. This means that VM load balancing mechanisms (such as VMware's Distributed Resource Scheduler (DRS)) cannot reliably use xvMotion to relieve compute/memory pressures across host systems/clusters in real time or near real time. For example, consider a situation where host system 102 of FIG. 1A/1B experiences a temporary spike in CPU usage, which causes a DRS component of VI server 106 to conclude that VM 116 should be quickly offloaded via xvMotion to host system 104. In this case, due to the time needed to transfer VMDK 118 across the network, VI server 106 may be unable to complete the migration of VM 116 until the CPU spike on host system 102 has already passed (thereby rendering the migration ineffective for reducing host system 102's compute load).
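A rough lower bound on the VMDK transfer time illustrates why such migrations cannot react to transient load spikes. The helper below is a hypothetical back-of-the-envelope calculation, not part of any VMware tooling; it ignores snapshot overhead and dirty-block re-copies, so actual transfers take longer:

```python
def vmdk_transfer_seconds(vmdk_bytes: float, link_bits_per_sec: float) -> float:
    """Lower-bound time to move a VMDK over a network link.

    Ignores snapshot overhead and re-copies of blocks dirtied during the
    background copy, so real xvMotion transfers take at least this long.
    """
    return (vmdk_bytes * 8) / link_bits_per_sec

# A 100 GB VMDK over a 1 Gbps link needs at least 800 seconds (~13 minutes),
# typically far longer than a transient CPU spike lasts.
print(vmdk_transfer_seconds(100e9, 1e9))  # -> 800.0
```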
Second, xvMotion generally requires that the destination host system have sufficient local storage capacity to hold the VMDK(s) of the VM being migrated. This can be problematic in “storage-imbalanced” clusters—in other words, clusters where some host systems have an abundance of local storage and other host systems have little or no local storage. An example of the latter type of host system is a diskless blade server. In these clusters, the host systems with little or no local storage may have significant compute/memory resources that cannot be leveraged by DRS because the host systems do not have enough local storage capacity to act as an xvMotion destination.
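The capacity constraint above effectively filters the set of hosts that DRS can consider as xvMotion destinations. As a minimal sketch (the function and the `free_local_bytes` field are hypothetical, for illustration only), a diskless blade server with zero free local storage is excluded regardless of its compute/memory headroom:

```python
def eligible_destinations(hosts: list, required_bytes: float) -> list:
    """Hosts able to receive an xvMotion of a VM whose VMDKs total
    `required_bytes`, i.e., those with enough free local storage."""
    return [h for h in hosts if h["free_local_bytes"] >= required_bytes]

hosts = [
    {"name": "storage-rich-host", "free_local_bytes": 5e12},   # 5 TB free
    {"name": "diskless-blade", "free_local_bytes": 0.0},       # no local disk
]
# Only the storage-rich host qualifies, even if the diskless blade
# has ample spare CPU and memory.
```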