In a virtualized computing environment, the virtual disks of virtual machines (VMs) running in a host computer system (“host”) are typically represented as files in the host's file system. To back up VM data and to support linked VM clones, snapshots of the virtual disks are taken to preserve the VM data at a specific point in time. Frequent backup of VM data increases the reliability of the VMs. However, frequent backup, i.e., taking frequent snapshots, is costly: it increases storage consumption and degrades performance, in particular read performance, because each read may have to traverse every snapshot level to find the location of the requested data.
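The read cost described above can be illustrated with a minimal Python sketch. It assumes a hypothetical layered layout in which each snapshot level records only the blocks written while that level was active; the names `SnapshotChain`, `write`, `snapshot`, and `read` are illustrative and do not correspond to any particular hypervisor or file format.

```python
from typing import Dict, List, Optional, Tuple


class SnapshotChain:
    """Hypothetical chain of snapshot layers; the newest layer is last."""

    def __init__(self) -> None:
        # Each layer maps a block number to the data written at that level.
        self.layers: List[Dict[int, bytes]] = [{}]

    def write(self, block: int, data: bytes) -> None:
        # Writes always land in the newest (active) layer.
        self.layers[-1][block] = data

    def snapshot(self) -> None:
        # Freeze the current layer and start a new, empty active layer.
        self.layers.append({})

    def read(self, block: int) -> Tuple[Optional[bytes], int]:
        # Walk from newest to oldest layer until the block is found.
        # Returns the data (or None) and the number of layers visited.
        visited = 0
        for layer in reversed(self.layers):
            visited += 1
            if block in layer:
                return layer[block], visited
        return None, visited


chain = SnapshotChain()
chain.write(0, b"base")      # written before any snapshot
for _ in range(4):
    chain.snapshot()         # four snapshots give five layers in total
chain.write(1, b"new")       # written in the active layer

data_old, hops_old = chain.read(0)   # must descend to the oldest layer
data_new, hops_new = chain.read(1)   # found in the active layer
```

In this sketch, a block that has not been rewritten since the base layer is found only after every layer has been visited, so the lookup cost grows with the number of snapshots taken.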
Solutions have been developed to reduce the amount of storage consumed by snapshots. For example, snapshots can be backed up incrementally by comparing the blocks of one version against those of the previous version and saving only the blocks that have changed. Deduplication has also been used to identify duplicate content among snapshots and to remove the redundant copies from storage.
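The two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any specific product: disk versions are modeled as lists of fixed-size blocks, `changed_blocks` and `DedupStore` are hypothetical names, and SHA-256 is assumed as the content fingerprint. Truncation (a new version shorter than the previous one) is not handled.

```python
import hashlib
from typing import Dict, List


def changed_blocks(prev: List[bytes], curr: List[bytes]) -> Dict[int, bytes]:
    """Incremental backup: keep only blocks of curr that differ from prev."""
    delta: Dict[int, bytes] = {}
    for index, block in enumerate(curr):
        # A block is saved if it is new or its content has changed.
        if index >= len(prev) or prev[index] != block:
            delta[index] = block
    return delta


class DedupStore:
    """Hypothetical content-addressed store: identical blocks are kept once."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The digest of the content serves as the block's reference.
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)
        return digest


v1 = [b"A", b"B", b"C"]
v2 = [b"A", b"X", b"C", b"A"]

# Only block 1 (changed) and block 3 (new) need to be saved for v2.
delta = changed_blocks(v1, v2)

store = DedupStore()
refs_v1 = [store.put(block) for block in v1]
refs_delta = {index: store.put(block) for index, block in delta.items()}
```

Here the incremental comparison reduces the second version to two saved blocks, and deduplication further recognizes that the new block 3 has the same content as block 0 of the first version, so no additional copy is stored for it.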
Although these solutions have reduced the storage requirements of snapshots, further enhancements are needed for effective deployment in cloud computing environments, where the number of VMs and snapshots being managed is quite large, often several orders of magnitude greater than in conventional data centers. In addition, storage technology has advanced to provide a multitude of persistent storage back-ends, but snapshot technology has yet to fully exploit the benefits that these different back-ends provide.