Modern information systems are expected to store large amounts of information and to protect that information from loss or corruption. Snapshots have become a preferred method of protecting a data storage volume against inadvertent data loss and of performing background backups. A read-only snapshot is a non-writable, point-in-time image of a data storage volume that can be created, mounted, deleted, and rolled back onto the data storage volume arbitrarily. Such snapshots are utilized extensively in the data storage industry for security, backup, and archival purposes. A writable snapshot is initially an image of a read-only parent snapshot. The writable snapshot may be written to and modified without affecting the read-only parent snapshot. The writable snapshot may be said to branch off of the read-only parent, since modifications to the writable snapshot can cause it to diverge from the content of the read-only parent snapshot.
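By way of illustration only, the relationship between a read-only parent snapshot and a writable snapshot that branches off of it may be sketched as follows. The class names `ReadOnlySnapshot` and `WritableSnapshot`, the block-number-to-bytes mapping, and the per-snapshot delta table are hypothetical and are assumed here purely for explanation; they are not drawn from any particular storage system.

```python
class ReadOnlySnapshot:
    """Hypothetical point-in-time, non-writable image of a volume."""

    def __init__(self, blocks):
        # Copy the mapping so later writes to the volume do not alter the image.
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks.get(block_no)


class WritableSnapshot:
    """Hypothetical writable snapshot that starts as an image of its parent."""

    def __init__(self, parent):
        self._parent = parent
        self._delta = {}  # only the blocks modified since branching

    def read(self, block_no):
        # Modified blocks come from the delta; all others fall through to the parent.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._parent.read(block_no)

    def write(self, block_no, data):
        # Writes never touch the read-only parent.
        self._delta[block_no] = data


volume = {0: b"boot", 1: b"kernel"}
parent = ReadOnlySnapshot(volume)
child = WritableSnapshot(parent)
child.write(1, b"patched")
```

In this sketch, reading block 1 from `child` returns the modified data, while the same read from `parent` still returns the original content, which is the divergence described above.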
Boot consolidation allows multiple machines to boot from a single server or a single networked storage device. Snapshots can be used for boot consolidation by providing a separate snapshot on a server for each machine to boot from. To accomplish this, a single operating system image may be installed and configured, and multiple snapshots of that single installation may then be created for use in booting multiple client machines.
As the client machines use their snapshots, data is generally written to each of the client snapshots, causing them to diverge from one another. However, in many instances a substantial portion of the data may be duplicated across multiple client snapshots. For example, when a system upgrade or software installation is performed at multiple clients, the changes to the client snapshots may be substantially the same. Having identical data duplicated across multiple client snapshots on a data storage system may be considered a waste of valuable storage space. Such duplication may also result in low caching efficiency, since reads of identical data are made from different storage locations and thus are not identified as cache hits.
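The storage and caching costs described above may be illustrated with a minimal sketch. It assumes, hypothetically, that each client snapshot is represented by a table of modified blocks, that identical content is recognized by comparing SHA-256 digests, and that a cache is keyed by storage location (client, block number). The function names and the sample data are illustrative only.

```python
import hashlib
from collections import defaultdict


def duplicate_blocks(client_deltas):
    """Group (client, block_no) locations whose content is identical."""
    by_digest = defaultdict(list)
    for client, delta in client_deltas.items():
        for block_no, data in delta.items():
            digest = hashlib.sha256(data).hexdigest()
            by_digest[digest].append((client, block_no))
    # Keep only content that is stored at more than one location.
    return {d: locs for d, locs in by_digest.items() if len(locs) > 1}


def wasted_bytes(client_deltas):
    """Bytes consumed by redundant copies (every copy beyond the first)."""
    wasted = 0
    for locs in duplicate_blocks(client_deltas).values():
        client, block_no = locs[0]
        wasted += len(client_deltas[client][block_no]) * (len(locs) - 1)
    return wasted


def location_keyed_hits(reads):
    """Count hits in a cache that is keyed by storage location only."""
    cached, hits = set(), 0
    for location in reads:
        if location in cached:
            hits += 1
        else:
            cached.add(location)
    return hits


# The same hypothetical upgrade block is written to three client snapshots.
upgrade = b"U" * 4096
client_deltas = {
    "client_a": {7: upgrade, 9: b"a-private"},
    "client_b": {7: upgrade},
    "client_c": {7: upgrade, 3: b"c-private"},
}
reads = [("client_a", 7), ("client_b", 7), ("client_c", 7)]
```

Under these assumptions, `wasted_bytes(client_deltas)` reports two redundant 4096-byte copies of the upgrade block, and `location_keyed_hits(reads)` reports no cache hits even though all three reads return identical data.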
It is with respect to these considerations and others that the disclosure made herein is presented.