Organizations increasingly rely on virtualization technologies to improve the flexibility, efficiency, and stability of their data centers. One aspect of virtualization involves provisioning virtual machines with data storage volumes. Since a family of virtual machines may have substantially overlapping data, a virtualization solution may provision the virtual machines with data volumes more efficiently through space-optimized snapshots. A space-optimized snapshot may store only a fraction of the data used by a virtual machine, while the remainder of the data may reside in a parent volume or snapshot (e.g., a parent volume or snapshot may store data common to multiple child snapshots).
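By way of illustration only, the relationship between a space-optimized snapshot and its parent may be sketched as follows. The class name `Snapshot`, its fields, and the use of integer region indices are hypothetical choices made for this sketch and are not drawn from any particular virtualization product. The snapshot stores only the regions written to it and resolves every other region through its parent.

```python
from typing import Dict, Optional


class Snapshot:
    """Hypothetical space-optimized snapshot that stores only its own regions."""

    def __init__(self, name: str, parent: Optional["Snapshot"] = None) -> None:
        self.name = name
        self.parent = parent
        # Region index -> data held locally by this snapshot.
        self.regions: Dict[int, bytes] = {}

    def write(self, index: int, data: bytes) -> None:
        """Store a region locally, shadowing any copy held by an ancestor."""
        self.regions[index] = data

    def read(self, index: int) -> bytes:
        """Return the region from this snapshot or from the nearest ancestor."""
        node: Optional[Snapshot] = self
        while node is not None:
            if index in node.regions:
                return node.regions[index]
            node = node.parent
        raise KeyError(f"region {index} is not stored anywhere in the hierarchy")


# A parent volume holding data common to a family of virtual machines.
base = Snapshot("base")
for i in range(4):
    base.write(i, f"common-{i}".encode())

# Two child snapshots, each storing only the region that differs from the parent.
vm_a = Snapshot("vm_a", parent=base)
vm_b = Snapshot("vm_b", parent=base)
vm_a.write(1, b"a-private")
vm_b.write(3, b"b-private")
```

In this sketch, each child snapshot holds a single region locally, while the remaining regions are served from the shared parent.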
In some scenarios, an organization may wish to back up its virtual machines so that it may recover them in the event of disaster or corruption. In traditional solutions, a backup system may back up each region of data of each virtual machine. Regions of data actually stored in a snapshot corresponding to a virtual machine (or “valid” regions) may be read directly from the volume of the virtual machine. Regions of data residing in a parent volume or snapshot (or “invalid” regions) may be fetched from the parent volume or snapshot. Unfortunately, because multiple child snapshots may reference the same invalid region, the same region of the parent volume or snapshot may be read multiple times. Such duplicative work may waste computing resources. Accordingly, the instant disclosure identifies a need for efficiently creating consolidated backups of snapshot hierarchies.
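The duplicative reads described above may be illustrated with the following sketch of a traditional, per-virtual-machine backup. The names `Volume` and `traditional_backup`, and the three-child hierarchy, are hypothetical and serve only to count how often each parent region is fetched. The sketch illustrates the problem and does not represent any particular backup system.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class Volume:
    """Hypothetical volume or snapshot holding only locally stored regions."""

    def __init__(self, name: str, parent: Optional["Volume"] = None) -> None:
        self.name = name
        self.parent = parent
        self.regions: Dict[int, bytes] = {}


def traditional_backup(
    children: List[Volume],
    region_count: int,
    reads: Counter,
) -> Dict[str, List[bytes]]:
    """Back up every region of every child, tallying reads per (volume, region)."""
    backups: Dict[str, List[bytes]] = {}
    for child in children:
        image: List[bytes] = []
        for index in range(region_count):
            node: Optional[Volume] = child
            # An "invalid" region is not stored locally, so walk up to an ancestor.
            while node is not None and index not in node.regions:
                node = node.parent
            if node is None:
                raise KeyError(f"region {index} is not stored in the hierarchy")
            reads[(node.name, index)] += 1
            image.append(node.regions[index])
        backups[child.name] = image
    return backups


parent = Volume("parent")
parent.regions = {i: f"p{i}".encode() for i in range(4)}

# Each child stores exactly one "valid" region of its own.
a, b, c = (Volume(n, parent) for n in ("a", "b", "c"))
a.regions[0] = b"a0"
b.regions[1] = b"b1"
c.regions[2] = b"c2"

reads: Counter = Counter()
backups = traditional_backup([a, b, c], region_count=4, reads=reads)

# Total fetches from the parent versus the number of distinct parent regions.
parent_reads = sum(n for (vol, _), n in reads.items() if vol == "parent")
distinct_parent_regions = sum(1 for (vol, _) in reads if vol == "parent")
print(parent_reads, distinct_parent_regions)  # prints "9 4"
```

Under these assumptions, the four regions of the parent are fetched nine times in total, and the region that no child overrides is fetched once per child.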