Technical Field
This application relates to managing the reclamation of storage space in file systems.
Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data in the device. In order to facilitate sharing of the data on the device, additional software on the data storage systems may also be used.
Data storage systems, such as disk drives, disk storage arrays, network storage devices, storage area networks, and the like, are called upon to store and manage a significant amount of data (e.g., gigabytes, terabytes, petabytes, etc.) that is written and read by many users. For example, a traditional storage array may include a collection of hard disk drives operating together logically as a unified storage device. Storage arrays are typically used to provide storage space for a plurality of computer file systems, databases, applications, and the like. For this and other reasons, it is common for physical storage arrays to be logically partitioned into chunks of storage space, called logical units, or LUs. This allows a unified storage array to appear as a collection of separate file systems, network drives, and/or volumes.
Presently, there is a trend toward the use of larger operating systems, larger applications or programs, and larger file sizes. Understanding this trend, a storage administrator is likely to request the provisioning (i.e., allocation) of a larger portion of storage space than is currently required for an operating system, for example, with the expectation that the space requirements will grow with upgrades, bug fixes, the inclusion of additional features, and the like. However, a problem of underuse typically arises when the amount of storage space is fully provisioned and allocated to, but is not used by, an operating system, program, process, or user. In this scenario, the disk storage space is unused by the entity that requested its allocation and is also rendered unavailable for use by any other entity. In such cases, the unused space cannot be simply given back to a common storage pool where it may become accessible to other users. For example, a database installation may require many terabytes of storage over the long term even though only a small fraction of that space may be needed when the database is first placed into operation. In short, the large storage space may eventually be needed, but it is not known exactly when the entire space will be required. In the meantime, the allocated storage space lies unused by the requesting user and may not be utilized by any other user.
In recognition of the fact that more storage space may be provisioned for operating systems, programs, and users than can actually be used at first, the concept of a sparsely populated logical unit (LU), such as a mapped LUN (e.g., thin logical unit (TLU), direct logical unit (DLU)), was developed. Unlike the more traditional fully allocated logical unit, which is created by fully provisioning and allocating an entire initial amount of storage area, a sparsely populated logical unit is provisioned at creation but is not allocated any physical storage until the storage is actually needed. Specifically, a TLU resolves this problem by allocating the storage space (e.g., making the memory space physically available) as it is needed when (or shortly before) data is written to the TLU. A TLU is created from a common pool of physical space and starts with a minimal amount of physical space. As the application that is using the TLU starts to demand more storage, the TLU incrementally requests the storage space from the common storage pool in portions referred to as slices.
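By way of illustration only, the allocate-on-write behavior of a TLU described above may be sketched as follows. The class names StoragePool and ThinLU, the slice size, and the in-memory dictionaries are hypothetical and are assumed solely for this example; they are not taken from any particular data storage system.

```python
class StoragePool:
    """Hypothetical common pool that hands out fixed-size slices."""

    def __init__(self, total_slices):
        # Every slice starts out free and is identified by an integer id.
        self.free_slices = list(range(total_slices))

    def allocate_slice(self):
        if not self.free_slices:
            raise RuntimeError("storage pool exhausted")
        return self.free_slices.pop(0)


class ThinLU:
    """Hypothetical thin logical unit: slices are bound only on first write."""

    def __init__(self, pool, logical_blocks, blocks_per_slice):
        self.pool = pool
        self.logical_blocks = logical_blocks      # provisioned address space
        self.blocks_per_slice = blocks_per_slice  # assumed fixed slice size
        self.slice_map = {}  # logical slice index -> physical slice id
        self.data = {}       # (physical slice id, offset) -> payload

    def write(self, block, payload):
        if not 0 <= block < self.logical_blocks:
            raise IndexError("block outside provisioned address space")
        index, offset = divmod(block, self.blocks_per_slice)
        # Request a slice from the pool only when this region is first written.
        if index not in self.slice_map:
            self.slice_map[index] = self.pool.allocate_slice()
        self.data[(self.slice_map[index], offset)] = payload

    def allocated_slices(self):
        return len(self.slice_map)
```

In this sketch, a TLU provisioned with 1000 logical blocks consumes no slices until the first write, and two writes that fall within the same slice cause only one slice to be drawn from the pool.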
Data storage systems typically arrange the data and metadata of file systems in blocks of storage. For example, the file data constituting files in a file system are stored in blocks of storage, as are inodes, indirect blocks, and other metadata. Data storage systems may provision storage to file systems in units of fixed size, here called “slices.” Data storage systems may generate slices, for example, from one or more physical storage devices, such as RAID groups of physical storage devices.
Some data storage systems provide thinly provisioned file systems that are organized based on sparsely populated logical units such as mapped LUNs. Thinly provisioned file systems typically have very large address spaces but allocate specific storage slices to populate file systems only as storage is needed to satisfy write requests. A thinly provisioned file system may thus have an address space that is measured in petabytes but may allocate slices to occupy only a small fraction of the address space.
Data storage systems that provide thinly provisioned file systems may deallocate blocks of storage from the file systems when the blocks are no longer used, as part of file system shrink operations. In one kind of shrink operation, a data storage system identifies free blocks of storage in the slices supporting the file system. Any completely freed slices may be returned to a storage pool for later reuse.
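By way of illustration only, the final step of such a shrink operation, in which completely freed slices are returned to the storage pool, may be sketched as follows. The function name reclaim_free_slices and the representation of each slice as a set of in-use block offsets are hypothetical and are assumed solely for this example.

```python
def reclaim_free_slices(slice_usage, pool):
    """Return completely free slices to the pool.

    slice_usage: dict mapping slice id -> set of in-use block offsets
                 (hypothetical bookkeeping, assumed for this sketch).
    pool:        list of free slice ids, extended in place.
    Returns the list of slice ids that were reclaimed.
    """
    # A slice qualifies only if none of its blocks is still in use.
    reclaimed = [sid for sid, used in slice_usage.items() if not used]
    for sid in reclaimed:
        del slice_usage[sid]  # the file system no longer owns the slice
        pool.append(sid)      # the slice becomes available for later reuse
    return reclaimed
```

In this sketch, a slice that still holds even one in-use block is left in place; only slices with no in-use blocks are moved back to the pool.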
Accordingly, there exists a need for systems, methods, and computer readable media for efficiently managing the reclamation of storage in file systems.