Backup systems are used to protect data against loss. Typically, a backup system includes software that copies the content of one or more disks, volumes, or files to a backup image stored on backup storage media housed in a backup storage device. If data is lost on the original disk, volume, or file, the backed-up content of the data can be retrieved from the backup storage device and restored. Once the backed-up content is restored, the data is available for use.
A virtual machine is a software implementation of a computing system that executes on a physical computing system referred to as a virtual machine hosting platform, frequently referred to simply as a “hosting server.” The virtual machine executes instructions as though it were a physical computing system. Resources of the hosting server are allocated to support the virtual machine. These allocated resources can include both “time shares” of a physical resource, such as a “cycle” of a processor, and semi-permanent allocations, such as the allocation of space on a disk volume. For example, storage space can be allocated to a virtual machine in the form of a container file on a physical drive. These container files are typically referred to as virtual disks. A hosting server can allocate disk space on physical disks associated with the hosting server to multiple virtual machines. A virtual machine typically includes a configuration file and one or more virtual disks.
Backup of a virtual machine involves copying the configuration file and the content of the host container files representing the virtual disks to a backup image on a backup storage device. Virtual machines can pose difficulties in the performance of backup operations. Current backup solutions require that all storage allocated to a container file be backed up, without regard to whether the allocated storage currently contains any usable data. Current backup solutions therefore result in needlessly large allocations of space in the backup image to preserve copies of virtual machine volume space that is unused or invalid (e.g., space containing unreferenced data from deleted files). The machine time, data transmission bandwidth, and storage space spent generating and preserving copies of unused or invalid virtual machine volume space are resources better allocated to other operations. It is desirable that these costs be minimized.
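The cost described above can be illustrated with a minimal sketch. It is not drawn from any particular backup product. The `VirtualDisk` class, the 4096-byte block size, the `in_use` map, and the example block counts are all assumptions introduced here purely for illustration. The sketch compares the number of bytes copied when every allocated block of a container file is backed up with the number copied if only blocks holding usable data were preserved.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 4096  # assumed block size in bytes (illustrative only)


@dataclass
class VirtualDisk:
    """Hypothetical model of a container file: allocated blocks plus a usage map."""
    blocks: List[bytes]   # every block allocated to the container file
    in_use: List[bool]    # True where the block currently holds usable data


def full_backup_size(disk: VirtualDisk) -> int:
    """Bytes copied when all allocated storage is backed up."""
    return len(disk.blocks) * BLOCK_SIZE


def used_only_backup_size(disk: VirtualDisk) -> int:
    """Bytes copied if only blocks holding usable data were backed up."""
    return sum(BLOCK_SIZE for used in disk.in_use if used)


def make_example_disk(total_blocks: int, used_blocks: int) -> VirtualDisk:
    """Build a disk whose first `used_blocks` blocks are marked as in use."""
    blocks = [bytes(BLOCK_SIZE) for _ in range(total_blocks)]
    in_use = [i < used_blocks for i in range(total_blocks)]
    return VirtualDisk(blocks, in_use)


# Assumed example: 1000 allocated blocks, of which 150 hold usable data.
disk = make_example_disk(total_blocks=1000, used_blocks=150)
full = full_backup_size(disk)
used = used_only_backup_size(disk)
wasted = full - used
print(f"full: {full} bytes, used only: {used} bytes, wasted: {wasted} bytes")
```

With these assumed figures, 85 percent of the bytes written to the backup image would represent unused or invalid space. The actual proportion depends on how fully each virtual disk is populated.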