Data deduplication is a well-known technique for reducing storage needs by replacing redundant instances of data units with some form of pointer to a single stored instance, or at least to fewer stored instances. As is usually the case, the benefit of greater storage efficiency through reduced redundancy comes at a cost, here in the form of a trade-off over granularity: the smaller the data units, the less overhead is incurred when a particular data unit is changed, but the more pointers must be stored and made unambiguously accessible (avoiding hash collisions, for example).
Although the need for efficient deduplication arises in many contexts, data redundancy is particularly common in virtualized computer systems, especially where several virtual machines (VMs) with similar configurations run on a common host and may even be cloned. Virtual machine file systems tend to use block sizes of 4 KB or 8 KB, and almost all writes are multiples of that block size. Such writes to random locations within the VM's virtual disk (vDisk) are a challenge for whatever system is used to manage them efficiently. This is particularly true in the case of a distributed storage system, in which there will typically be many physically separate storage nodes, some of which may have different numbers and types of storage devices, and some or all of which may be remote. The most straightforward approach is to manage and address each block individually. In a fingerprint-based deduplication (“dedupe”) storage system, the references to data are fingerprints, for example, hash values, and an index maps from each data fingerprint to its corresponding storage location so that the correct data can be found to respond to a read request. If the fingerprint (FP) index maintains an entry for every block in the system, the index can be quite large. If, for example, the FP is a 20-byte SHA-1 hash, blocks are 4 KB, and stored data is compressed 2:1, the index could easily be more than 10 GB per TB of storage capacity. With disk drive capacity already around 8 TB, the index for even a single drive could then require on the order of 80 GB of expensive RAM.
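By way of illustration only, the sizing estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes 4 KB blocks, binary units (1 TB = 2^40 bytes), and the worst case in which every stored block is unique and therefore needs its own index entry. The function name, the parameter names, and the 8-byte storage location are hypothetical and are not taken from any particular system.

```python
# Back-of-the-envelope sizing of a per-block fingerprint index.
# Names and the 8-byte location field are illustrative assumptions.

KB = 1 << 10
GB = 1 << 30
TB = 1 << 40


def fp_index_bytes(physical_bytes, block_size, compression_ratio,
                   fp_bytes, location_bytes=0):
    """Estimate the index size when every stored block has its own entry."""
    # With N:1 compression, each physical byte holds N bytes of block data.
    uncompressed_bytes = physical_bytes * compression_ratio
    n_blocks = uncompressed_bytes // block_size
    return n_blocks * (fp_bytes + location_bytes)


# 20-byte SHA-1 fingerprints, 4 KB blocks, 2:1 compression, 1 TB of capacity.
fingerprints_only = fp_index_bytes(TB, 4 * KB, 2, fp_bytes=20)
print(fingerprints_only / GB)   # 10.0, for the fingerprints alone

# Adding an assumed 8-byte storage location to each entry.
with_locations = fp_index_bytes(TB, 4 * KB, 2, fp_bytes=20, location_bytes=8)
print(with_locations / GB)      # 14.0
```

The fingerprints alone account for 10 GB per TB under these assumptions, so any per-entry location or bookkeeping data pushes the total above that figure.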
Previous systems, for example, those provided by Data Domain, addressed this problem with specialized data and index layouts, but these technologies work well only for their intended use case of streaming sequential backup data. Primary storage workloads often require random accesses, which do not work well with those techniques. What is needed is an approach that allows the index to reside in RAM without requiring an impractically large amount of expensive RAM. It is of course possible to focus on optimizing the footprint of each index entry, but this still fails to significantly reduce the number of entries that need to be indexed.
A straightforward approach to reducing the number of fingerprint entries would be to index larger blocks, which are sometimes called “extents” because they include a range of logically contiguous blocks. For example, the first 64 KB extent would include the first sixteen 4 KB blocks, the second extent would include the next sixteen blocks, and so forth. Larger extents mean fewer extents and therefore fewer index entries and a smaller index. A downside of extents of, for example, 64 KB is that a 4 KB write of a single VM file system block would cause a read-modify-write of the entire 64 KB extent. Such larger writes could have a significant performance impact. Further, in a system that maintains snapshots, both the old and the new 64 KB extents would need to be retained even though they differ in only 4 KB out of the 64 KB. Thus, the space efficiency of such snapshots could be very poor, making it more expensive to retain large numbers of snapshots. Further, such extents do not, in general, correspond to semantically related data such as a file and instead may include unrelated and arbitrary sets of blocks. These arbitrary combinations of blocks in an extent are unlikely to be repeated in exactly the same way. This reduces the effectiveness of deduplication, which relies on identifying exact copies of the same data stored multiple times. When two extents are not fully identical, deduplication cannot eliminate either of them, even if most of their blocks match.
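The costs of fixed-size extents described above can be made concrete with a small sketch. It assumes 4 KB blocks and 64 KB extents as in the example, and it uses SHA-1 from Python's standard `hashlib` module as the fingerprint. The helper names (`extent_of`, `write_cost`, `fingerprints`) and the synthetic disk contents are hypothetical and serve only to illustrate the arithmetic.

```python
import hashlib

BLOCK = 4 * 1024
EXTENT = 64 * 1024
BLOCKS_PER_EXTENT = EXTENT // BLOCK  # sixteen 4 KB blocks per extent


def extent_of(block_no):
    """Return (extent index, block offset within that extent)."""
    return divmod(block_no, BLOCKS_PER_EXTENT)


def write_cost(block_nos, unit):
    """Bytes rewritten when the given 4 KB blocks are overwritten and the
    store is indexed in units of `unit` bytes (a multiple of BLOCK)."""
    blocks_per_unit = unit // BLOCK
    touched_units = {b // blocks_per_unit for b in block_nos}
    return len(touched_units) * unit


def fingerprints(data, unit):
    """Set of SHA-1 fingerprints of consecutive `unit`-sized pieces."""
    return {hashlib.sha1(data[i:i + unit]).digest()
            for i in range(0, len(data), unit)}


# Block 37 lives in the third extent (index 2), at offset 5.
print(extent_of(37))                 # (2, 5)

# A single 4 KB overwrite rewrites 4 KB per block, but 64 KB per extent.
# A snapshot would likewise retain 4 KB or 64 KB of old data.
print(write_cost([37], BLOCK))       # 4096
print(write_cost([37], EXTENT))      # 65536

# Two synthetic disks holding the same sixteen blocks in a different order.
blocks = [bytes([i]) * BLOCK for i in range(16)]
disk_a = b"".join(blocks)
disk_b = b"".join(blocks[1:] + blocks[:1])

# Per-block fingerprints match for all sixteen blocks; the single
# extent-level fingerprint of each disk does not match the other.
shared_blocks = fingerprints(disk_a, BLOCK) & fingerprints(disk_b, BLOCK)
shared_extents = fingerprints(disk_a, EXTENT) & fingerprints(disk_b, EXTENT)
print(len(shared_blocks), len(shared_extents))   # 16 0
```

In this contrived case the extent-based index has sixteen times fewer entries, but every small overwrite costs sixteen times as much and none of the duplicated data is detected.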