In computer science, a “snapshot” refers to the state of a set of data at a particular point in time. There are many reasons to take snapshots of different kinds of data. For example, it is often desirable to take a snapshot of an entire disk drive. Such a snapshot may be useful, for example, to permit distribution and/or backup of a consistent disk image across many devices in a given system or computing environment.
Many types of data and storage management systems can be used to implement and maintain snapshots, including different types of volume managers and file systems. One example of such a storage management system is the logical volume manager in UNIX-based systems.
One type of storage management system that may implement snapshots is a virtualized storage management system. In a virtualized storage management system, a number of virtual disks (“vDisks”) may be structured from physical storage devices and exposed to virtual machines running within the system. Each vDisk may be broken up into equal-sized units called vDisk blocks.
In maintaining vDisks for the virtualized storage management system, snapshots of a given vDisk may be periodically taken. Whenever a snapshot is taken for a vDisk, a number of steps occur atomically. These steps include: 1) providing the snapshot of the vDisk a name and a version number, 2) marking the snapshot immutable, and 3) making the live vDisk a child of the snapshot.
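By way of illustration only, the three steps above may be sketched as follows. The `VDisk` class, the `take_snapshot` function, the version-numbering rule, and the use of a lock to stand in for atomicity are all assumptions made for this example and are not taken from any particular system.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class VDisk:
    """Hypothetical vDisk record; field names are illustrative assumptions."""
    name: str
    version: int = 0
    immutable: bool = False
    parent: Optional["VDisk"] = None


# A single lock stands in for whatever mechanism makes the steps atomic.
_snapshot_lock = threading.Lock()


def take_snapshot(live: VDisk, snapshot_name: str) -> VDisk:
    """Take a snapshot of `live` and return the new snapshot object."""
    with _snapshot_lock:
        # 1) Give the snapshot a name and a version number
        #    (assumed here to be one more than the previous snapshot's).
        previous = live.parent
        next_version = 1 if previous is None else previous.version + 1
        snapshot = VDisk(name=snapshot_name, version=next_version, parent=previous)
        # 2) Mark the snapshot immutable.
        snapshot.immutable = True
        # 3) Make the live vDisk a child of the snapshot.
        live.parent = snapshot
        return snapshot


live = VDisk(name="live")
first = take_snapshot(live, "snap-1")
second = take_snapshot(live, "snap-2")
```

In this sketch, each new snapshot is inserted between the previous snapshot and the live vDisk, so repeated calls build a chain in which the live vDisk is always the youngest descendant.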
Metadata for each vDisk is maintained in order to allow the physical data associated with the vDisk to be located. Such metadata is maintained in a mapping structure known as a vDisk Block Map. The vDisk Block Map includes metadata for each block of a given vDisk, and metadata is kept for each snapshot of a given vDisk. For a given snapshot of a vDisk, only metadata related to blocks of the vDisk that have been modified (e.g., by a write operation) since the preceding snapshot (e.g., parent snapshot) of the vDisk is maintained in the vDisk Block Map for that snapshot. Similarly, for the live vDisk, only metadata related to blocks of the vDisk that have been modified since the latest snapshot is maintained in the vDisk Block Map. Said otherwise, if a vDisk block for a given snapshot has not changed since the preceding snapshot was taken, then no metadata for that vDisk block of the given snapshot is maintained.
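The sparse nature of this metadata may be illustrated with a minimal sketch. Here the vDisk Block Map is modeled, purely as an assumption, as a dictionary keyed by a (vDisk identifier, block index) pair; the names `record_write` and `blocks_with_metadata` and the "extent" location strings are hypothetical.

```python
from typing import Dict, List, Tuple

# (vdisk_id, block_index) -> location of the physical data (illustrative string)
BlockMap = Dict[Tuple[str, int], str]


def record_write(block_map: BlockMap, vdisk_id: str, block: int, location: str) -> None:
    """Record metadata for a block that was modified on the given vDisk."""
    block_map[(vdisk_id, block)] = location


def blocks_with_metadata(block_map: BlockMap, vdisk_id: str) -> List[int]:
    """Return the block indices for which this vDisk holds its own metadata."""
    return sorted(b for (v, b) in block_map if v == vdisk_id)


block_map: BlockMap = {}

# Blocks 0, 1 and 2 were written before the first snapshot was taken.
for b in range(3):
    record_write(block_map, "snap-1", b, f"extent-a{b}")

# Only block 1 was modified between the first and second snapshots.
record_write(block_map, "snap-2", 1, "extent-b1")

# Only block 2 has been modified on the live vDisk since the second snapshot.
record_write(block_map, "live", 2, "extent-c2")
```

In this example the second snapshot holds metadata for block 1 only, and the live vDisk holds metadata for block 2 only; no entry exists for the blocks that were left unchanged.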
The absence of metadata for a vDisk block of a given snapshot implies that a corresponding parent snapshot must be traversed in order to obtain that metadata for the vDisk block. As more and more snapshots of a vDisk are taken, and the snapshot chain/tree grows deeper, the ability to efficiently perform read operations on the vDisk using the vDisk Block Map substantially declines. For example, obtaining metadata for a given vDisk block to fulfill a read request may require traversing several levels of the vDisk Block Map.
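The cost of such a traversal may be sketched as follows, assuming for illustration that each vDisk has its own dictionary of block metadata and that parent relationships are held in a separate dictionary. The function name `lookup` and the data layout are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def lookup(
    block_maps: Dict[str, Dict[int, str]],
    parents: Dict[str, Optional[str]],
    vdisk_id: str,
    block: int,
) -> Tuple[Optional[str], int]:
    """Find metadata for `block`, walking up the snapshot chain as needed.

    Returns (metadata, hops), where hops is the number of parent links
    followed. Metadata is None if no vDisk in the chain has an entry.
    """
    hops = 0
    current: Optional[str] = vdisk_id
    while current is not None:
        entry = block_maps.get(current, {}).get(block)
        if entry is not None:
            return entry, hops
        # No metadata at this level: fall back to the parent snapshot.
        current = parents.get(current)
        hops += 1
    return None, hops


# Chain: live -> snap-3 -> snap-2 -> snap-1 (oldest)
parents = {"live": "snap-3", "snap-3": "snap-2", "snap-2": "snap-1", "snap-1": None}
block_maps = {
    "snap-1": {0: "extent-a0", 1: "extent-a1"},
    "snap-2": {1: "extent-b1"},
    "snap-3": {},
    "live": {2: "extent-c2"},
}

recent = lookup(block_maps, parents, "live", 2)  # found on the live vDisk itself
old = lookup(block_maps, parents, "live", 0)     # found only on the oldest snapshot
```

A read of block 2 is satisfied immediately, whereas a read of block 0 must follow three parent links before its metadata is found, and each additional snapshot taken without modifying block 0 adds one more link.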
One approach for mitigating the decline in read performance of vDisks involves performing an operation that copies metadata from parent snapshots to child snapshots, or from parent snapshots to the live vDisk, such that all metadata for the blocks of a vDisk is available for a given snapshot or for the live vDisk. However, making copies of metadata causes unnecessary metadata bloating due to the duplication of redundant information. Such bloating wastes physical storage space (e.g., SSD space) and also reduces the effective capacity of caches that hold the metadata.
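The trade-off described above may be illustrated with a minimal sketch of such a copy operation. The function names `copy_down` and `total_entries`, and the dictionary-based layout, are assumptions made for this example rather than a description of any particular implementation.

```python
from typing import Dict, Optional


def copy_down(
    block_maps: Dict[str, Dict[int, str]],
    parents: Dict[str, Optional[str]],
    vdisk_id: str,
) -> int:
    """Copy missing block metadata from all ancestors into `vdisk_id`.

    Ancestors are visited nearest first, so the most recent entry for a
    block wins. Returns the number of entries that were duplicated.
    """
    own = block_maps.setdefault(vdisk_id, {})
    copied = 0
    ancestor = parents.get(vdisk_id)
    while ancestor is not None:
        for block, location in block_maps.get(ancestor, {}).items():
            if block not in own:
                own[block] = location
                copied += 1
        ancestor = parents.get(ancestor)
    return copied


def total_entries(block_maps: Dict[str, Dict[int, str]]) -> int:
    """Count all metadata entries held across every vDisk in the chain."""
    return sum(len(entries) for entries in block_maps.values())


# Chain: live -> snap-2 -> snap-1 (oldest)
parents = {"live": "snap-2", "snap-2": "snap-1", "snap-1": None}
block_maps = {
    "snap-1": {0: "extent-a0", 1: "extent-a1", 2: "extent-a2", 3: "extent-a3"},
    "snap-2": {1: "extent-b1"},
    "live": {2: "extent-c2"},
}

entries_before = total_entries(block_maps)
duplicated = copy_down(block_maps, parents, "live")
entries_after = total_entries(block_maps)
```

After the copy, every block of the live vDisk can be resolved without traversing a parent, but the total number of metadata entries grows from six to nine even though no new physical data was written.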
Therefore, there is a need for an efficient approach for maintaining metadata for snapshots.