Most commercial hypervisors have a feature that, when invoked, preserves the current state of a virtual machine (VM) in file(s) on persistent storage. Each instance of preserved VM state is referred to as a “snapshot.” The preserved state can include central processing unit (CPU) register state, random access memory (RAM) state, and virtual disk state. These same hypervisors also have a feature that, when invoked, restores the state of a VM from a snapshot.
Some virtualized computing systems include applications executing in multiple VMs that cooperate to process transactions. The transactions flow from application to application and are tracked on virtual disks attached to the VMs. An administrator can generate snapshots of these VMs for the purpose of backup, cloning, development and testing, and the like. The hypervisors do not provide a method of snapshotting multiple VMs at the same time, nor can they due to limits of multi-tasking and non-deterministic schedulers. As a result, the snapshots of the individual VMs are generated seconds or minutes apart. Because the applications are connected and cooperatively process transactions, snapshots taken at different times can capture an inconsistent state among the applications, which can result in duplicate or missing transactions if the VMs are reverted to the individual snapshots. The administrator must then manually reconcile the duplicate and/or missing transactions after reversion, which is expensive and error-prone. Alternatively, the administrator can shut down the VMs while generating the snapshots. However, shutting down VMs is not feasible or desirable in many production environments and can impact revenue generation.
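The effect of snapshot timing described above can be illustrated with a toy model. The following Python sketch is not based on any hypervisor API. The names `ToyVM` and `run`, the two-VM sender/receiver arrangement, and the transaction numbers are all hypothetical, and each "snapshot" is simply a copy of an in-memory transaction log standing in for virtual disk state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM whose virtual disk tracks transactions."""
    name: str
    log: list = field(default_factory=list)

    def snapshot(self):
        # Copy the log so later transactions do not alter the snapshot.
        return list(self.log)


def run(snapshot_order):
    """Snapshot two cooperating VMs at different times and compare the results.

    snapshot_order is a pair such as ("sender", "receiver") naming which VM
    is snapshotted first and which is snapshotted second.
    """
    vms = {"sender": ToyVM("sender"), "receiver": ToyVM("receiver")}

    def process(txn):
        # A transaction is recorded by the sender and then by the receiver.
        vms["sender"].log.append(txn)
        vms["receiver"].log.append(txn)

    first, second = snapshot_order
    snaps = {}

    for txn in (1, 2):
        process(txn)
    snaps[first] = vms[first].snapshot()    # first snapshot

    for txn in (3, 4):                      # work continues between snapshots
        process(txn)
    snaps[second] = vms[second].snapshot()  # second snapshot, taken later

    sent, received = set(snaps["sender"]), set(snaps["receiver"])
    return {
        # Sender believes these were delivered; receiver has no record of them.
        "missing_at_receiver": sorted(sent - received),
        # Receiver already has these; a reverted sender would send them again.
        "unknown_to_sender": sorted(received - sent),
    }


if __name__ == "__main__":
    print(run(("sender", "receiver")))
    print(run(("receiver", "sender")))
```

In this model, snapshotting the sender first leaves transactions 3 and 4 recorded only at the receiver, so a reverted sender would resend them as duplicates. Snapshotting the receiver first leaves the same transactions recorded only at the sender, so they would be missing at the receiver after reversion.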