Many storage environments may implement functionality to improve storage efficiency. For example, a storage controller may host a storage virtual machine and a plurality of virtual machine backup files that have overlapping operating system data that is stored as redundant data blocks within a storage device. Storing the redundant data may waste significant amounts of storage resources. Accordingly, the storage controller may implement deduplication to reduce the amount of redundant data being stored within the storage device. For example, the storage controller may determine whether a data block is already stored within the storage device. If the data block is already stored within the storage device, then the storage controller may merely store a reference, in place of the data block, that points to a location within the storage device that already comprises the data block.
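As a minimal sketch of the reference-in-place-of-block behavior described above, the following Python example models a storage device as an in-memory dictionary keyed by a content fingerprint. The names `DedupStore`, `write_block`, `write_file`, and `read_file`, the 4096-byte block size, and the use of SHA-256 as the fingerprint are assumptions made for illustration only; they are not drawn from any particular storage controller.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for a deduplicating storage device."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size  # assumed fixed block size
        self.blocks = {}              # fingerprint -> the single stored copy
        self.refcounts = {}           # fingerprint -> number of references
        self.files = {}               # file name -> ordered list of fingerprints

    def write_block(self, data: bytes) -> str:
        """Store a block once; later identical blocks only add a reference."""
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint in self.blocks:
            # Block already stored: record a reference instead of the data.
            self.refcounts[fingerprint] += 1
        else:
            self.blocks[fingerprint] = data
            self.refcounts[fingerprint] = 1
        return fingerprint

    def write_file(self, name: str, data: bytes) -> None:
        """Split data into fixed-size blocks and keep a reference per block."""
        self.files[name] = [
            self.write_block(data[i:i + self.block_size])
            for i in range(0, len(data), self.block_size)
        ]

    def read_file(self, name: str) -> bytes:
        """Reassemble a file by following its block references."""
        return b"".join(self.blocks[fp] for fp in self.files[name])


# Two backup files that share the same (simulated) operating system data.
os_data = b"A" * 4096 + b"B" * 4096
backup_1 = os_data + b"C" * 4096
backup_2 = os_data + b"D" * 4096

store = DedupStore()
store.write_file("backup_1", backup_1)
store.write_file("backup_2", backup_2)

logical_blocks = sum(len(refs) for refs in store.files.values())
physical_blocks = len(store.blocks)
print(f"logical blocks: {logical_blocks}, physical blocks: {physical_blocks}")
```

In this sketch the two backup files occupy six logical blocks but only four physical blocks, because the two shared operating system blocks are each stored once and referenced twice. The sketch treats a fingerprint match as proof of identical content; an implementation could additionally compare the blocks byte for byte before storing only a reference.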
Deduplication techniques may operate upon data that is already stored within storage devices. Unfortunately, accessing storage devices during deduplication can result in write amplification and unnecessary input/output (I/O) costs. Write amplification has a negative impact on solid state drive (SSD) devices, such as flash storage or the SSD tier of a hybrid storage aggregate, because flash cells can endure only a limited number of program/erase cycles. Accordingly, there is a need to efficiently perform deduplication before write operations are performed upon storage devices and/or with minimal access to storage devices. Such deduplication may be beneficial for batch replication, virtual machine migration, virtual desktop infrastructure patching, and other scenarios in which the same data is copied multiple times within a short period.