Virtual machines can be migrated online by copying memory from one system to another while the virtual machine continues to run. Portions of memory become “dirty” during the copy, so the copy is repeated until, eventually, the workload and/or virtual machine is stopped so that a final copy can be made. The workload or virtual machine is then started on a secondary server. The amount of downtime depends on how busy the workload is during the move operation.
Workload or application downtime is increasingly considered unacceptable. To address such downtime, virtualization techniques can be implemented with capabilities that enable workloads and/or virtual machines to be moved between servers without the workloads ever visibly or detectably going off-line. Workload migration can be performed by copying the memory footprint of the workload from one system to another while the workload continues to run. Multiple copy operations address memory that has changed while being copied. In addition, the workload is stopped for a typically small amount of time to make a final memory copy before the workload can be restarted on the secondary node.
For example, virtual machine migration operations can involve repeated copying of the memory footprint of the virtual machine from one system to another. Once the memory is copied, a check is made to determine whether any memory changed during the copy, and the changes are then copied. The process repeats until the amount of memory that becomes “dirty” during a copy is roughly equivalent to the amount of memory that was copied, at which point further passes no longer reduce the amount left to transfer. At this point the workload or virtual machine is “frozen” (sometimes called quiesced or checkpointed) on the primary server, a final copy of dirty memory is made, and the workload is then activated or restarted on the secondary server. As long as the final copy takes less time than typical network timeouts, the stopped condition of the workload is not detectable by a user. The workload did stop, but only for a very short period of time. The stoppage time grows with how busy the workload is at the time of the migration, because memory changes more rapidly and more “dirty” memory pages accumulate in a shorter timeframe.
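The iterative copy described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of any particular hypervisor: the function name `simulate_precopy`, the constant dirty and copy rates, the fixed per-pass overhead `round_overhead`, and the `tolerance` used to decide that the dirtied amount is "roughly equivalent" to the copied amount are all hypothetical choices made for illustration.

```python
def simulate_precopy(total_pages, dirty_rate, copy_rate,
                     round_overhead=0.01, tolerance=0.9, max_rounds=50):
    """Simulate iterative copying of a running workload's memory.

    Assumed (hypothetical) model:
      total_pages    -- pages in the memory footprint
      dirty_rate     -- pages the running workload dirties per second
      copy_rate      -- pages copied to the secondary server per second
      round_overhead -- fixed seconds spent per pass (e.g. checking for changes)
      tolerance      -- fraction at which "dirtied" counts as roughly equal to "copied"

    Returns (rounds, final_dirty_pages, downtime_seconds).
    """
    to_copy = total_pages          # the first pass copies the whole footprint
    rounds = 0
    while True:
        rounds += 1
        copy_time = round_overhead + to_copy / copy_rate
        # Pages that changed while this pass was running (capped at the footprint).
        dirtied = min(total_pages, round(dirty_rate * copy_time))
        # Stop iterating once a pass dirties roughly as much as it copied:
        # another pass would no longer shrink the remaining set.
        if dirtied >= tolerance * to_copy or rounds >= max_rounds:
            break
        to_copy = dirtied          # the next pass copies only the changes
    # The workload is frozen while the final dirty pages are copied.
    downtime = round_overhead + dirtied / copy_rate
    return rounds, dirtied, downtime


if __name__ == "__main__":
    for label, rate in [("idle", 1_000), ("busy", 50_000), ("saturated", 200_000)]:
        rounds, dirty, downtime = simulate_precopy(1_000_000, rate, 100_000)
        print(f"{label:9s} rounds={rounds:2d} final_dirty={dirty:8d} "
              f"downtime={downtime:.4f}s")
```

In this model, a busier workload leaves more dirty pages for the final copy and therefore a longer frozen period. A workload that dirties memory faster than it can be copied never converges, so the final copy covers essentially the whole footprint.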