The present invention relates to data storage, and more specifically, this invention relates to space reclamation in a data deduplication storage system.
Storage systems which store large amounts of data sparsely written within a virtual namespace can partition the namespace into regions, each region being managed as a non-overlapping portion of the namespace. As an example, a block storage system may provision many volumes, each volume having an address space of many gigabytes (GBs). Each volume may in turn include a plurality of regions, and a region may span 1-100 megabytes (MBs) within the volume. Thus, each volume is partitioned into multiple regions, each of which manages the data stored in its own portion of the namespace.
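By way of illustration only, the partitioning described above may be sketched as a mapping from a byte offset within a volume to a region index. The names `locate` and `region_count`, and the 64 MB region size, are assumptions chosen for this example; the description above only states that a region may span 1-100 MBs.

```python
from typing import Tuple

MiB = 1024 * 1024
GiB = 1024 * MiB

# Assumed region size for this sketch; any fixed size in the 1-100 MB range works.
REGION_SIZE = 64 * MiB


def locate(offset: int, region_size: int = REGION_SIZE) -> Tuple[int, int]:
    """Map a byte offset in a volume to (region index, offset inside that region)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Regions are fixed-size and non-overlapping, so integer division suffices.
    return divmod(offset, region_size)


def region_count(volume_size: int, region_size: int = REGION_SIZE) -> int:
    """Number of regions needed to cover a volume (the last one may be partial)."""
    return -(-volume_size // region_size)  # ceiling division
```

Under these assumptions, a 10 GB volume is covered by 160 regions, and `locate(3 * GiB + 5)` returns `(48, 5)`, i.e. byte 5 of region 48.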
Furthermore, in a primary storage system which is dominated by complex read and write data accesses of relatively small size (e.g. 4 kB or 64 kB), performance is often a key requirement and therefore persistent metadata utilized to service data requests must be primarily referenced while in fast-access memory. In conventional storage systems, it is not always possible to keep all metadata needed to efficiently manage the entire namespace in fast-access memory, as the amount of metadata necessary for such management may exceed the available memory.
The amount of metadata necessary for efficient management of a namespace may also increase in systems employing data deduplication to maximize the amount of available storage in the system. Data deduplication generally involves the identification of duplicate (triplicate, etc.) data portions, e.g. on different volumes or regions within the namespace, and reduction of the amount of storage consumed by freeing the storage space associated with all but one copy of the data (or all but a relatively small number of copies in cases where redundancy is desirable). To maintain consistency and provide access to the data, references such as pointers may be implemented to direct access requests to the single retained copy.
While deduplication effectively increases available storage compared to retaining a plurality of redundant duplicates, the technique requires additional metadata to manage the references pointing from each duplicated location to the retained data location.
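As a minimal sketch of the trade-off described above, and not a description of any particular system, the following models a deduplicating store in which each logical address holds a reference to a single retained copy of its data. The class name `DedupStore`, the use of SHA-256 fingerprints to identify duplicates, and the in-memory dictionaries are all assumptions made for illustration. Space reclamation is deliberately omitted.

```python
import hashlib


class DedupStore:
    """Toy deduplicating block store (hypothetical; for illustration only)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> data (the single retained copy)
        self.refs = {}    # logical address -> fingerprint (reference metadata)

    def write(self, address, data: bytes) -> None:
        fingerprint = hashlib.sha256(data).hexdigest()
        # Store the data only if no identical copy is already retained.
        self.blocks.setdefault(fingerprint, data)
        # Every logical address still needs its own reference entry.
        self.refs[address] = fingerprint

    def read(self, address) -> bytes:
        # Follow the reference to the retained copy.
        return self.blocks[self.refs[address]]
```

Writing the same 4 kB block to three different addresses leaves one entry in `blocks` but three entries in `refs`: physical data is stored once, while the reference metadata grows with every duplicated location.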
However, when a system reaches its maximum physical space allowance, data must be released from the system user space in order to free physical space and thereby allow new data to be written thereto. In a non-deduplicated storage environment, it is relatively straightforward to release data from the system user space: a minimal amount of storage space is held in reserve for use while performing the data deletions, and a garbage collection process is performed afterward. This solution is inapplicable in systems which implement data deduplication, however, as performing a space deletion operation in a deduplicated storage environment provides no guarantee of actually freeing allocated physical space, even after garbage collection has subsequently been performed. Rather, space deletion processes implemented in conventional deduplicated storage environments consume even more space storing metadata corresponding to the deletion process, thereby exacerbating the storage shortage.
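The lack of any guarantee described above can be illustrated with a reference-counted model, in which a retained copy is freed only when the last reference to it is removed. This is a simplified sketch under stated assumptions: the class `RefCountedStore`, its SHA-256 fingerprints, and its reference counts are hypothetical, and the additional metadata consumed by the deletion process itself is not modeled.

```python
import hashlib
from collections import Counter


class RefCountedStore:
    """Toy deduplicating store with reference counts (hypothetical sketch)."""

    def __init__(self):
        self.blocks = {}           # fingerprint -> retained data
        self.refcount = Counter()  # fingerprint -> number of referencing addresses
        self.refs = {}             # logical address -> fingerprint

    def write(self, address, data: bytes) -> None:
        if address in self.refs:
            self.delete(address)   # an overwrite first drops the old reference
        fingerprint = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(fingerprint, data)
        self.refcount[fingerprint] += 1
        self.refs[address] = fingerprint

    def delete(self, address) -> int:
        """Delete a logical block and return the physical bytes actually freed."""
        fingerprint = self.refs.pop(address)
        self.refcount[fingerprint] -= 1
        if self.refcount[fingerprint] > 0:
            return 0               # other addresses still reference this copy
        del self.refcount[fingerprint]
        return len(self.blocks.pop(fingerprint))
```

If two volumes hold the same 4 kB block, deleting it from the first volume frees 0 bytes, and only deleting it from the second volume as well frees the 4096 bytes of the retained copy. The amount of user data deleted therefore does not predict the amount of physical space reclaimed.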
Accordingly, efficiently managing storage space in deduplicated storage environments is of great significance. It would therefore be beneficial to provide techniques, systems, and corresponding computer program products for efficiently managing space reclamation in a data deduplication storage system.