Data deduplication is a technique for reducing storage consumption in a storage system by eliminating redundant data. For example, if the storage system contains three storage objects O1, O2, and O3 that each include an identical unit of data D, data deduplication enables only one instance of D to be retained on the physical storage device(s) of the system. In this example, each storage object O1, O2, and O3 is configured to point to the single instance of D (rather than including a redundant copy of the data), thereby reducing the storage footprint of the objects.
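The pointer-based arrangement described above can be illustrated with a toy in-memory block store. This is a minimal sketch that assumes data is split into caller-supplied chunks and that a chunk's SHA-256 digest uniquely identifies its content (hash collisions are not handled). The class and method names (`DedupStore`, `write_object`, `physical_blocks`) are hypothetical and do not describe any particular storage system.

```python
import hashlib


class DedupStore:
    """Toy block store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}     # digest -> chunk bytes (the single retained instance)
        self.refcounts = {}  # digest -> number of object references to that chunk
        self.objects = {}    # object name -> list of digests (pointers, not copies)

    def write_object(self, name, chunks):
        # Assumption for simplicity: objects are written once and never replaced.
        if name in self.objects:
            raise ValueError(f"object {name!r} already exists")
        digests = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = chunk  # first occurrence: retain it physically
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
            digests.append(digest)           # later occurrences only add a pointer
        self.objects[name] = digests

    def physical_blocks(self):
        return len(self.blocks)


if __name__ == "__main__":
    store = DedupStore()
    # O1, O2, and O3 each contain the same unit of data D.
    for name in ("O1", "O2", "O3"):
        store.write_object(name, [b"D"])
    print(store.physical_blocks())  # → 1
```

All three objects hold the same digest, so only one physical copy of D exists while its reference count records three users.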
While data deduplication has clear benefits in terms of optimizing storage space usage, it is generally difficult to implement this feature in conjunction with both thick and thin provisioning of storage objects. This difficulty arises because, for a thickly-provisioned (i.e., thick) storage object, a storage system must ensure that write requests to any portion of the object can be completed successfully. However, existing data deduplication techniques do not guarantee this property. For instance, in a storage system that implements conventional data deduplication, any write request directed to a previously redundant (i.e., deduplicated) portion of a storage object can result in the creation of new, unique data that requires the allocation of additional physical storage space (since the storage object can no longer point to a deduplicated copy of that data). If the storage system is already at capacity, the write request will fail, which is not acceptable behavior if the storage object is thickly-provisioned.
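The failure mode described above can be reproduced in a small model of a deduplicating pool with a fixed number of physical blocks. This is a minimal sketch under stated assumptions: capacity is counted in whole blocks, SHA-256 digests identify content, and the names (`DedupPool`, `CapacityError`, `create`, `write`) are hypothetical. It models conventional deduplication only and is not a description of any particular product.

```python
import hashlib


class CapacityError(Exception):
    """Raised when a write needs a new physical block but the pool is full."""


class DedupPool:
    """Toy deduplicating pool with a fixed physical capacity (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of distinct physical blocks
        self.blocks = {}          # digest -> reference count
        self.objects = {}         # object name -> list of digests, one per logical block

    def _add_ref(self, digest):
        if digest not in self.blocks:
            # New unique data needs a fresh physical block.
            if len(self.blocks) >= self.capacity:
                raise CapacityError("no free physical block")
            self.blocks[digest] = 0
        self.blocks[digest] += 1

    def _drop_ref(self, digest):
        self.blocks[digest] -= 1
        if self.blocks[digest] == 0:
            del self.blocks[digest]  # last reference gone: block is freed

    def create(self, name, chunks):
        self.objects[name] = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            self._add_ref(digest)
            self.objects[name].append(digest)

    def write(self, name, index, chunk):
        new = hashlib.sha256(chunk).hexdigest()
        old = self.objects[name][index]
        if new == old:
            return  # identical data: nothing changes
        if self.blocks[old] == 1:
            # Sole reference: freeing the old block makes room for the new data.
            self._drop_ref(old)
            self._add_ref(new)
        else:
            # Old block is shared (deduplicated), so it cannot be overwritten in
            # place; the write needs a new block and may fail at capacity.
            self._add_ref(new)
            self._drop_ref(old)
        self.objects[name][index] = new


if __name__ == "__main__":
    pool = DedupPool(capacity=1)
    for name in ("O1", "O2", "O3"):
        pool.create(name, [b"D"])  # three logical blocks share one physical block
    try:
        pool.write("O2", 0, b"E")  # unique data for a previously shared block
    except CapacityError as exc:
        print("write failed:", exc)
```

Even though O2 logically "owns" a block, the write fails because D is still referenced by O1 and O3, which is exactly the behavior a thick storage object cannot tolerate.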