Storage systems, such as a virtual storage area network (VSAN), provide servers with a centralized collection of storage devices. Features such as snapshots, redundant arrays, checksums, encryption, and data deduplication are desirable for enterprise storage systems. Such features, especially when combined, complicate the implementation of the storage systems.
For example, snapshots of a virtual disk or other portion of storage (e.g., a logical volume) may be created as (virtual) sparse disks. Sparse disks use a copy-on-write mechanism, in which the snapshot is an ordered set of blocks (or other logical address space) and a portion of a stripe contains no data until data is copied there by a write operation. In other words, the snapshot only contains data that differs from the previous snapshot and conserves resources by avoiding the copying of data that has not changed from the previous snapshot. The portions of a snapshot that contain no data are referred to as “holes.” When reading data from a snapshot that includes a hole, the data represented by the hole is read from a parent snapshot or the base disk. When snapshots are created as sparse disks, however, a data value (such as a “0”) may not be distinguished from a hole when calculating parity for, e.g., a redundant array of independent disks (RAID). As a result, snapshots with holes may not be rebuilt from parity and, therefore, are incompatible with RAID.
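For illustration only, the ambiguity between a hole and a zero value can be sketched with simple XOR parity. The block size, the `HOLE` marker, and the function names below are hypothetical and are not taken from any particular RAID implementation; the sketch assumes that a hole contributes zeros to the parity calculation.

```python
from functools import reduce

BLOCK_SIZE = 4  # tiny blocks keep the example readable (assumption)
HOLE = None     # hypothetical marker for a block that was never written


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def as_bytes(block):
    """A hole has no bytes of its own, so it is treated as all zeros."""
    return bytes(BLOCK_SIZE) if block is HOLE else block


def parity(stripe):
    """XOR parity over every block in the stripe."""
    return reduce(xor_blocks, (as_bytes(b) for b in stripe))


def rebuild(stripe, lost_index, parity_block):
    """Recover the lost block by XOR-ing parity with the surviving blocks."""
    survivors = (as_bytes(b) for i, b in enumerate(stripe) if i != lost_index)
    return reduce(xor_blocks, survivors, parity_block)


stripe_with_hole = [b"\x01\x02\x03\x04", HOLE, b"\x10\x20\x30\x40"]
stripe_with_zero = [b"\x01\x02\x03\x04", bytes(BLOCK_SIZE), b"\x10\x20\x30\x40"]

# Both stripes produce identical parity, so parity cannot tell them apart.
same_parity = parity(stripe_with_hole) == parity(stripe_with_zero)

# Rebuilding the hole yields zero bytes; the fact that it was a hole is lost.
rebuilt = rebuild(stripe_with_hole, 1, parity(stripe_with_hole))
```

In this sketch the rebuilt block is a block of zeros rather than a hole, so a later read would return zeros instead of falling through to the parent snapshot or the base disk.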
Additionally, it is desirable to detect data errors introduced by storage or transmission of the data. The use of a checksum algorithm is one means of detecting such errors. The implementation of end-to-end checksums on top of storage systems, however, can significantly slow the processing of input/output (I/O) operations.
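As a minimal sketch of checksum-based error detection, the following example stores a checksum alongside a block and verifies it on read. CRC-32 (via Python's standard `zlib.crc32`) is an assumption chosen for brevity, and the `store` and `verify` helpers are hypothetical; no specific checksum algorithm is implied for the storage system itself.

```python
import zlib


def store(block: bytes):
    """Return the block together with a checksum computed at write time."""
    return block, zlib.crc32(block)


def verify(block: bytes, checksum: int) -> bool:
    """Recompute the checksum at read time and compare it to the stored one."""
    return zlib.crc32(block) == checksum


data, checksum = store(b"payload written by the application")

# Simulate a single-bit error introduced by storage or transmission.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]

intact_ok = verify(data, checksum)          # checksum matches
corrupted_ok = verify(corrupted, checksum)  # checksum mismatch exposes the error
```

Because the checksum is recomputed over the full block on every verified read or write, this work adds to the cost of each I/O operation.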
Data deduplication and encryption are also desirable features in enterprise storage. Deduplicating encrypted data requires that identical data be encrypted with the same key. This may be accomplished by generating data-specific encryption keys such that the same data (e.g., clear text) will be encrypted into the same encrypted data (e.g., cipher text). As a result, matching encrypted data may be deduplicated. The storage and management of data-specific encryption keys, however, complicate the implementation of the storage system. For example, an efficient method of storing and verifying the multitude of encryption keys is needed to ensure that data can be correctly decrypted and repaired when needed.
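One way to picture data-specific keys is to derive each key from a hash of the clear text, so that identical clear text always produces identical cipher text. The sketch below is illustrative only: the key derivation, the `write_block` helper, and the in-memory `store` are assumptions, and the XOR keystream built from SHA-256 is a toy stand-in for a real cipher that should not be used to protect data.

```python
import hashlib


def derive_key(plaintext: bytes) -> bytes:
    """Data-specific key: identical clear text yields an identical key."""
    return hashlib.sha256(plaintext).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (SHA-256 over key plus a counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice decrypts."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))


decrypt = encrypt  # XOR with the same keystream is its own inverse

store = {}  # hypothetical block store: cipher text digest -> cipher text


def write_block(plaintext: bytes):
    """Encrypt a block and store it once per distinct cipher text."""
    key = derive_key(plaintext)
    ciphertext = encrypt(plaintext, key)
    digest = hashlib.sha256(ciphertext).hexdigest()
    store.setdefault(digest, ciphertext)  # a duplicate write adds nothing
    return digest, key


ref_a, key_a = write_block(b"same block contents")
ref_b, key_b = write_block(b"same block contents")
ref_c, key_c = write_block(b"different contents!")
```

Here the two identical writes share one stored block, while every distinct block has its own key that must be retained in order to decrypt it, which is the key-management burden described above.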