In virtualized computer systems, it may be desirable in certain circumstances to suspend a virtual machine (VM) and resume it at a later time. For example, changes to the VM's configuration file cannot be made while the VM is executing. In order to make such changes, the VM is first suspended. This causes the VM execution to halt and the VM's state to be serialized and written to a file, commonly known as a checkpoint file or a suspend file. After the VM is suspended, the desired change can be made to the VM's configuration file. The VM having the changed configuration can then be resumed from the saved state.
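The suspend, reconfigure, and resume sequence described above can be illustrated with a minimal sketch. It is a toy model, not any hypervisor's actual interface: the function names `suspend`, `resume`, and `reconfigure`, the dictionary representation of VM state, and the use of JSON as the checkpoint format are all assumptions made for illustration.

```python
import json
import os
import tempfile


def suspend(vm_state, checkpoint_path):
    """Halt the (toy) VM and serialize its state to a checkpoint file."""
    vm_state["running"] = False
    with open(checkpoint_path, "w") as f:
        json.dump(vm_state, f)


def resume(checkpoint_path, config):
    """Read the saved state back and resume under the given configuration."""
    with open(checkpoint_path) as f:
        vm_state = json.load(f)
    vm_state["running"] = True
    return {"state": vm_state, "config": config}


def reconfigure(vm_state, config, changes):
    """Suspend the VM, change its configuration while halted, then resume."""
    fd, path = tempfile.mkstemp(suffix=".ckpt")  # hypothetical checkpoint file
    os.close(fd)
    try:
        suspend(vm_state, path)
        new_config = {**config, **changes}  # the change is made while suspended
        return resume(path, new_config)
    finally:
        os.remove(path)
```

For example, `reconfigure({"running": True, "registers": [1, 2, 3]}, {"num_cpus": 2}, {"num_cpus": 4})` returns a VM whose saved state is unchanged but whose configuration now has `num_cpus` set to 4.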
A problem with the approach described above is that writing the VM's state out to a file and then reading it back can take a very long time. For a VM with a memory size of many gigabytes, this can take tens of seconds or even minutes. During this prolonged write and read cycle, the VM incurs downtime long enough to cause its network connections to expire and close. In addition, users and clients of the VM cannot access or interact with the VM during this downtime, which would be viewed as an outage period for the VM.
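A rough estimate shows where this downtime comes from. The sketch below assumes the downtime is dominated by writing the full memory image once and reading it back once. The function name and the throughput figures are hypothetical, chosen only to make the arithmetic concrete.

```python
def estimate_downtime_seconds(memory_gib, write_mib_per_s, read_mib_per_s):
    """Estimate suspend/resume downtime as one full write plus one full read.

    Assumes the whole memory image is written and then read sequentially,
    and ignores serialization overhead and any caching effects.
    """
    memory_mib = memory_gib * 1024
    write_time = memory_mib / write_mib_per_s
    read_time = memory_mib / read_mib_per_s
    return write_time + read_time
```

Under these assumptions, a 16 GiB VM with an assumed 200 MiB/s of storage throughput in each direction would spend about 82 seconds writing and another 82 seconds reading, for roughly 164 seconds of downtime in total.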
An alternative approach has been developed to reduce the amount of downtime. In this alternative approach, a new copy of the VM is started on the same host as the old VM while the old VM is still running. The VM's memory is then “pre-copied” from the old VM to the new VM. This is followed by the steps of suspending the old VM, making any desired changes to the VM's configuration file, and resuming execution in the new VM. Because most of the memory has already been transferred by the “pre-copying” step before the old VM is suspended, the amount of downtime the VM encounters can be reduced substantially. The downside of this approach is that the VM's memory usage doubles while both copies exist, and the “pre-copying” step incurs additional CPU costs.
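The pre-copy idea can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the implementation of any particular hypervisor: memory is a Python list of page values, the running old VM is modeled by randomly modifying a fixed number of pages per round, and the names `precopy_migrate`, `dirty_pages_per_round`, `max_rounds`, and `stop_threshold` are hypothetical.

```python
import random


def precopy_migrate(old_memory, dirty_pages_per_round,
                    max_rounds=10, stop_threshold=8, seed=0):
    """Simulate iterative pre-copy from an old VM to a new VM on one host."""
    rng = random.Random(seed)
    new_memory = [None] * len(old_memory)  # second full copy: memory use doubles
    dirty = set(range(len(old_memory)))    # initially every page must be copied
    copied_while_running = 0

    for _ in range(max_rounds):
        if len(dirty) <= stop_threshold:
            break
        to_copy, dirty = dirty, set()
        for page in to_copy:
            new_memory[page] = old_memory[page]
        copied_while_running += len(to_copy)
        # The old VM is still running, so it modifies pages during the round.
        for _ in range(dirty_pages_per_round):
            page = rng.randrange(len(old_memory))
            old_memory[page] += 1
            dirty.add(page)

    # The old VM is suspended here; only the remaining dirty pages are
    # copied during downtime, after which the new VM can resume.
    copied_during_downtime = len(dirty)
    for page in dirty:
        new_memory[page] = old_memory[page]
    return new_memory, copied_while_running, copied_during_downtime
```

In this model the number of pages copied during downtime is bounded by the pages dirtied in the last round rather than by total memory size, which is the source of the reduced downtime. The costs are also visible: `new_memory` is a complete second copy, and pages dirtied repeatedly are copied more than once.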