Virtual machine hypervisors, or virtual machine monitors, are responsible for creating and running virtual machines on a host machine. The virtual machine hypervisor provides a simulated computing environment on the host machine, through which the virtual machine can interact with the host machine's resources, such as network access, peripheral device access, disk storage, and computing resources. Such resources often include a non-persistent memory (e.g., a random access memory) for temporarily storing data and a persistent memory (e.g., a disk drive) for providing non-volatile data storage.
When interacting with data, the system may read data from, or write data to, the non-persistent memory as well as the persistent memory. Furthermore, on a periodic basis, data within the non-persistent memory can be written to the persistent memory. For example, a virtual machine may generate three blocks of data within the non-persistent memory (e.g., data consisting of blocks A, B, and C). In order to write the data to the persistent memory, the blocks pass through several layers of operations: first the virtual machine operating system layers (e.g., file system, block device, and hardware device layers), then the host bus adapter layers (e.g., the hardware/firmware for connection to underlying resources), and finally the actual hardware device (e.g., the persistent memory). Not only does each block pass through a plurality of the layers described above, but the file system(s) maintained by the virtual machine and/or hypervisor may also generate various data, such as inode pointers and metadata, that describe the data stored in the persistent memory.
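As a rough illustration only, the write path described above can be sketched in Python. The `LAYERS` list, the `flush` function, and the dictionaries standing in for the non-persistent memory, the persistent memory, and the file system metadata are all assumptions made for this sketch; they do not describe any particular hypervisor or operating system.

```python
# Layer names follow the description above; the list itself is an illustrative stand-in.
LAYERS = ["file system", "block device", "hardware device", "host bus adapter"]


def flush(ram, disk, metadata):
    """Write every block held in non-persistent memory to persistent memory."""
    trace = []
    for name, payload in ram.items():
        for layer in LAYERS:
            # Record that the block passes through each layer in turn.
            trace.append((name, layer))
        # The block reaches the actual hardware device (the persistent memory).
        disk[name] = payload
        # The file system records metadata describing the stored block.
        metadata[name] = {"size": len(payload)}
    return trace


ram = {"A": b"aaaa", "B": b"bbbb", "C": b"cccc"}  # non-persistent memory
disk, metadata = {}, {}                            # persistent memory and its metadata
trace = flush(ram, disk, metadata)
print(len(trace))  # 12 (three blocks, four layers each)
```

Even in this simplified form, each block produces several intermediate steps plus a separate metadata update, and any of those steps may be the one in progress when a failure occurs.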
In the middle of writing data to persistent storage, for example in the middle of writing block B to persistent storage, a system crash may occur (e.g., a system failure, loss of power, irrecoverable error, or other event requiring a system restart). Depending on what metadata has been written to disk, where the data is within the various layers, what data has been written to disk, what data has not been written to disk, and so on, there is the potential for a great deal of inconsistency in the data stored within the persistent memory. That is, the metadata stored in the persistent memory will likely not match the actual data stored in the persistent memory. Thus, in order to bring the system affected by the failure back online and correct any inconsistencies, a time-consuming and computationally intensive recovery process, including reconstructing the file system data structures, must be run on all the data within the file system stored on the persistent memory.
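The failure scenario above can be illustrated with a small Python simulation. This is a toy sketch under stated assumptions, not an implementation of any real file system: the `ToyDisk` class, the `CrashError` exception, the `crash_during` parameter, and the choice to write all metadata before the data blocks are hypothetical and were picked only to make the inconsistency visible.

```python
class CrashError(Exception):
    """Simulated system crash (hypothetical, for illustration only)."""


class ToyDisk:
    """Toy persistent memory: a block store plus metadata describing it."""

    def __init__(self):
        self.blocks = {}    # block name -> payload actually on disk
        self.metadata = {}  # block name -> payload length the metadata claims

    def write_file(self, data, crash_during=None):
        # Assumption: metadata for all blocks is written first, then the data.
        for name, payload in data.items():
            self.metadata[name] = len(payload)
        for name, payload in data.items():
            if name == crash_during:
                # Only part of the block reaches the disk before the crash.
                self.blocks[name] = payload[: len(payload) // 2]
                raise CrashError(f"crash while writing block {name}")
            self.blocks[name] = payload

    def full_scan(self):
        """Compare every metadata entry with the stored data (the costly step)."""
        inconsistent = []
        for name, expected_len in self.metadata.items():
            actual = self.blocks.get(name)
            if actual is None or len(actual) != expected_len:
                inconsistent.append(name)
        return inconsistent


disk = ToyDisk()
try:
    disk.write_file({"A": b"aaaa", "B": b"bbbb", "C": b"cccc"}, crash_during="B")
except CrashError:
    pass  # the system restarts and must now check the disk

damaged = disk.full_scan()
print(damaged)  # ['B', 'C']
```

After the simulated crash, the metadata describes three complete blocks, while the disk holds block A, half of block B, and nothing for block C. The scan has to visit every metadata entry to find the mismatches, which is why recovery cost grows with the amount of data in the file system rather than with the amount of data in flight at the time of the crash.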