In virtual machine environments, a hypervisor running on a host hardware system creates a virtual system on which a guest operating system may execute. The virtual system includes a virtual storage volume on which the guest operating system stores its data. For example, the hypervisor may simulate a hard disk for the guest operating system that the hypervisor stores as a virtual disk file on the host system. Some hypervisors continually track and record changes to the virtual disk file in a changed block list.
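The changed block list mentioned above can be illustrated with a minimal sketch. It assumes a fixed block size and an in-memory disk image. The class name `TrackedVirtualDisk` and its methods are hypothetical, introduced only for illustration, and do not correspond to any particular hypervisor's interface.

```python
from typing import List


class TrackedVirtualDisk:
    """Toy virtual disk that records which fixed-size blocks have been written.

    Hypothetical illustration only; real hypervisors track changes inside
    their own storage stack and expose them through product-specific APIs.
    """

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.data = bytearray(num_blocks * block_size)
        self.changed_blocks = set()  # indices of blocks written since last drain

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at a byte offset and mark every block it touches."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the bounds of the virtual disk")
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.block_size
        last = (offset + len(payload) - 1) // self.block_size
        self.changed_blocks.update(range(first, last + 1))

    def drain_changed_blocks(self) -> List[int]:
        """Return the sorted changed block list and reset the tracking state."""
        changed = sorted(self.changed_blocks)
        self.changed_blocks.clear()
        return changed
```

A consumer such as a backup process could call `drain_changed_blocks()` to learn which blocks to read, rather than scanning the entire virtual disk file.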
A virtual storage volume within a virtual machine contains data items that need to be accessed. Unfortunately, accessing the underlying contents of a storage volume can be very resource-intensive, reducing the performance of the virtual machine and of other operations within the virtual machine environment. Backup processes in particular are typically very resource-intensive.
A full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, it may be desirable for data to remain accessible, and even writable, while the storage volume is being backed up. Allowing such concurrent access makes it difficult to maintain data integrity and may introduce version skew that could result in data corruption. For example, if a user moves a file into a directory that has already been backed up, that file will be missing from the backup media entirely, because that directory was backed up before the file was added. Version skew may also corrupt the backed-up copies of files whose size or contents change while they are being read.
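The moved-file scenario can be reproduced with a small simulation. This is only a sketch under stated assumptions: the volume is modeled as a dictionary of directories, the backup copies one directory at a time, and the helper names `backup_in_order` and `make_mover` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Directory path -> {file name: file contents}
Volume = Dict[str, Dict[str, bytes]]


def backup_in_order(
    volume: Volume,
    directory_order: List[str],
    after_directory: Optional[Callable[[str], None]] = None,
) -> Volume:
    """Copy each directory as it exists at the moment the backup reaches it."""
    backup: Volume = {}
    for directory in directory_order:
        backup[directory] = dict(volume[directory])
        # The callback stands in for user activity that happens mid-backup.
        if after_directory is not None:
            after_directory(directory)
    return backup


def make_mover(
    volume: Volume, name: str, source: str, target: str, trigger: str
) -> Callable[[str], None]:
    """Build a callback that moves a file once `trigger` has been backed up."""

    def move_when_triggered(directory: str) -> None:
        if directory == trigger and name in volume[source]:
            volume[target][name] = volume[source].pop(name)

    return move_when_triggered


volume: Volume = {"/done": {"a.txt": b"A"}, "/pending": {"report.txt": b"R"}}
mover = make_mover(volume, "report.txt", "/pending", "/done", trigger="/done")
backup = backup_in_order(volume, ["/done", "/pending"], mover)
```

After this run, `report.txt` still exists on the live volume under `/done`, yet it appears in neither directory of `backup`: the target directory was copied before the move and the source directory after it.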
To avoid these issues, some systems instead perform backups by taking snapshots, which are typically read-only copies of a data set frozen at a point in time. Advantageously, taking snapshots allows applications to continue writing to the underlying data and conserves system resources. For example, in some systems, once an initial snapshot of a data set is taken, subsequent snapshots copy only the changed data, which consumes less disk capacity than repeatedly cloning the full data set.
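The space saving of copying only changed data can be sketched as follows. The example assumes a volume modeled as a list of equally sized blocks and detects changes by comparing against the previous state. Real systems would more likely rely on copy-on-write or a changed block list than on a full comparison. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def full_snapshot(volume: Sequence[bytes]) -> List[bytes]:
    """Copy every block of the volume (the initial snapshot)."""
    return list(volume)


def incremental_snapshot(
    previous: Sequence[bytes], volume: Sequence[bytes]
) -> Dict[int, bytes]:
    """Record only the blocks that differ from the previous state."""
    if len(previous) != len(volume):
        raise ValueError("this sketch assumes the volume size does not change")
    return {
        index: block
        for index, (old, block) in enumerate(zip(previous, volume))
        if old != block
    }


def restore(
    base: Sequence[bytes], increments: Iterable[Dict[int, bytes]]
) -> List[bytes]:
    """Rebuild a point-in-time state by replaying increments over the base."""
    state = list(base)
    for increment in increments:
        for index, block in increment.items():
            state[index] = block
    return state
```

With a three-block volume in which only one block changes after the initial snapshot, `incremental_snapshot` stores a single block, and `restore` reproduces the later state from the base plus that one increment.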