A network storage controller is a processing system that is used to store and retrieve data on behalf of one or more hosts on a network. A storage server is a storage controller that operates on behalf of one or more clients to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. Some storage servers are designed to service file-level requests from clients, as is commonly the case with file servers used in a network attached storage (NAS) environment. Other storage servers are designed to service block-level requests from clients, as with storage servers used in a storage area network (SAN) environment or virtual tape library (VTL) environment. Still other storage servers are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp, Inc. of Sunnyvale, Calif.
In a large-scale storage system, such as an enterprise storage network, it is common for certain items of data, such as certain data blocks, to be stored in multiple places in the storage system, sometimes as an incidental result of normal operation of the system and other times due to intentional copying of data. For example, duplication of data blocks may occur when two or more logical containers of data (e.g., files, virtual tapes, etc.) have some data in common or where a given set of data occurs at multiple places within a given logical container of data. Duplication can also occur if the storage system backs up data by creating and maintaining multiple persistent point-in-time images, or “snapshots,” of stored data over a period of time. Data duplication generally is not desirable, since the storage of the same data in multiple places consumes extra storage space, which is a limited resource.
Consequently, in many large-scale storage systems, storage controllers have the ability to “deduplicate” data, which is the ability to identify and remove duplication of data blocks. In one known approach to deduplication, any extra (duplicate) copies of a given data block are deleted (or, more precisely, marked as free), and any references (e.g., pointers) to those duplicate blocks are modified to refer to the one remaining instance of that data block. A result of this process is that a given data block may end up being shared by two or more logical containers of data.
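The deduplication approach described above can be illustrated with a minimal sketch. It is not the implementation of any particular storage controller. The function name `deduplicate`, the dictionary-based block store, and the use of SHA-256 fingerprints to find candidate duplicates are all assumptions made for illustration only.

```python
import hashlib


def deduplicate(blocks, containers):
    """Hypothetical sketch of block-level deduplication.

    blocks:     dict mapping block id -> bytes (assumed in-memory block store)
    containers: dict mapping container name -> list of block ids (its pointers)

    Redirects references in `containers` from duplicate blocks to a single
    surviving instance and returns the set of block ids marked as free.
    """
    canonical = {}  # fingerprint -> id of the one remaining instance
    remap = {}      # duplicate block id -> surviving block id
    freed = set()

    for block_id in sorted(blocks):
        fingerprint = hashlib.sha256(blocks[block_id]).digest()
        keeper = canonical.setdefault(fingerprint, block_id)
        # Compare the bytes as well, so a fingerprint match alone never
        # causes two different blocks to be merged.
        if keeper != block_id and blocks[keeper] == blocks[block_id]:
            remap[block_id] = keeper
            freed.add(block_id)  # marked as free rather than erased

    # Modify every reference to a duplicate so it points at the survivor.
    for name, refs in containers.items():
        containers[name] = [remap.get(b, b) for b in refs]

    return freed
```

After this runs, two containers that originally held separate copies of identical data both point at the same block, which is the sharing that the preceding paragraph describes.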
A problem with at least one known deduplication technique is that it is difficult to predict the amount of storage space that would be recovered in the storage system through deletion of one or more logical containers of data. Since storage space of the storage system is a limited resource, a user of the storage system may wish to delete one or more logical containers of data to reclaim a desired amount of storage space in the storage system. However, in a system that employs deduplication, deletion of a logical container of data does not necessarily ensure that the data block (or data blocks) referenced by that logical container of data gets deleted. Because a data block may be shared by two or more logical containers of data, the system, in order to function properly, cannot delete the data block until all logical containers of data pointing to the data block have been deleted. Therefore, deletion of a logical container of data does not guarantee that a corresponding amount of storage space will be reclaimed, because some of the blocks of the deleted logical container of data may be shared by other logical containers of data. Consequently, in some instances, a user of the storage system may spend a substantial amount of time deleting a large quantity of logical containers of data in the storage system, only to discover at the end of the deletion process that little, if any, storage space is actually reclaimed in the storage system.
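The gap between the apparent size of a logical container and the space actually reclaimed by deleting it can be made concrete with a small sketch. This is an illustration of the problem under stated assumptions, not a technique taken from the text. The function name `reclaimable_bytes`, the fixed 4096-byte block size, and the dictionary representation of containers are hypothetical.

```python
from collections import Counter

BLOCK_SIZE = 4096  # assumed fixed block size in bytes


def reclaimable_bytes(containers, to_delete, block_size=BLOCK_SIZE):
    """Hypothetical estimate of space freed by deleting some containers.

    containers: dict mapping container name -> list of block ids
    to_delete:  iterable of container names proposed for deletion

    A block is counted as reclaimable only if every container that
    references it is among those being deleted.
    """
    to_delete = set(to_delete)

    # Number of distinct containers that reference each block.
    ref_counts = Counter()
    for refs in containers.values():
        ref_counts.update(set(refs))

    # Number of to-be-deleted containers that reference each block.
    doomed_counts = Counter()
    for name in to_delete:
        doomed_counts.update(set(containers[name]))

    freed = [b for b, n in doomed_counts.items() if n == ref_counts[b]]
    return len(freed) * block_size
```

With `containers = {"file_a": [1, 2], "file_b": [1, 4], "file_c": [1, 2, 5]}`, deleting `file_a` alone reclaims nothing under this model, because blocks 1 and 2 are both still referenced by other containers, even though `file_a` appears to occupy two blocks.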