The advantages of virtual machine (VM) technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete,” isolated computer.
The advantages of various types of checkpointing are also widely recognized. A checkpoint can serve as a backup of some aspect of a computer system, and it allows the system to revert to a previously generated checkpoint, either to undo changes to some aspect of the system or to recover from a failure affecting it. One particularly advantageous use of checkpointing is to capture the state of a long-running computation, so that, if the computation fails at some point, it can be resumed from the checkpointed state instead of being restarted from the beginning.
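By way of illustration only, the resume-from-checkpoint idea for a long-running computation can be sketched at the application level as follows. This is a minimal sketch, not the virtual machine checkpointing mechanism discussed in this document: the function names `save_checkpoint` and `run_with_checkpoints`, the `fail_at` parameter used to simulate a failure, and the choice of a sum of squares as the computation are all hypothetical.

```python
import os
import pickle


def save_checkpoint(state, path):
    """Write the state to a temporary file, then atomically replace the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    # os.replace is atomic on the same filesystem, so a crash during the write
    # leaves the previous checkpoint intact.
    os.replace(tmp_path, path)


def run_with_checkpoints(n_steps, path, every=100, fail_at=None):
    """Sum the squares of 0..n_steps-1, checkpointing every `every` steps.

    `fail_at` is a hypothetical knob that simulates a failure at a given step.
    """
    # Resume from the last checkpoint if one exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"] ** 2
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    return state["total"]
```

In this sketch, if a run fails at step 250 with a checkpoint interval of 100, a second call with the same checkpoint path resumes from step 200 rather than from step 0, so at most one interval of work is repeated.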
Fast and frequent checkpointing of virtual machines is a useful technology for a number of applications: (1) continuous checkpointing allows users to revert their application to almost any previous point in time; (2) reverse debugging based on deterministic replay requires frequent checkpoints to reduce the amount of replay from a previous checkpoint that is needed to execute backwards; (3) fast checkpoints can speed up an application by allowing speculative calculations that can be reverted if necessary; and (4) fast checkpoints provide a way of achieving fault tolerance.
With respect to (4), fast and frequent checkpointing is especially attractive because it can be used for symmetric multiprocessing (SMP) virtual machines. Deterministic replay is typically very hard to do efficiently for SMP VMs, so fault tolerance based on deterministic replay is typically supported only for single-processor VMs.