Technical Field
The present disclosure relates to storage systems and, more specifically, to optimized segment cleaning in one or more storage systems of a cluster.
Background Information
A storage system typically includes one or more storage devices, such as solid state drives (SSDs) embodied as flash storage devices of a storage array, into which information may be entered, and from which the information may be obtained, as desired. The storage system may implement a high-level module, such as a file system, to logically organize the information stored on the storage devices of the array as storage containers, such as files or logical units (LUNs). Each storage container may be implemented as a set of data structures, such as data blocks that store data for the storage containers and metadata blocks that describe the data of the storage containers. For example, the metadata may describe (e.g., identify) storage locations on the devices for the data. In addition, the metadata may contain multiple copies of a reference to a single storage location for the data (i.e., a many-to-one relationship), thereby requiring an update to each copy of the reference whenever the location of the data changes. These updates contribute significantly to write amplification as well as to system complexity (i.e., tracking the references to be updated).
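As an illustration only, the cost of many-to-one references can be sketched in Python. The sketch assumes a hypothetical model in which each metadata block is a dictionary mapping a name to a storage location; the function name `relocate_data` and this layout are assumptions made for the example and do not describe any particular file system.

```python
def relocate_data(metadata_blocks, old_location, new_location):
    """Point every reference to old_location at new_location.

    metadata_blocks: list of dicts mapping a (hypothetical) reference
    name to a storage location. Returns the number of reference copies
    that had to be rewritten.
    """
    updates = 0
    for block in metadata_blocks:
        # Snapshot the items so the dict can be updated while scanning.
        for name, location in list(block.items()):
            if location == old_location:
                block[name] = new_location
                updates += 1
    return updates


# Three metadata blocks hold copies of a reference to location 100.
metadata = [{"a": 100}, {"b": 100, "c": 200}, {"d": 100}]
rewritten = relocate_data(metadata, old_location=100, new_location=500)
print(rewritten)  # one move of the data forces this many reference updates
```

In this model a single relocation of the data triggers one metadata update per reference copy, which is the source of the extra writes and of the tracking burden described above.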
Some types of SSDs, especially those with NAND flash components, may or may not include an internal controller (i.e., inaccessible to a user of the SSD) that moves valid data from old locations to new locations among those components at the granularity of a page (e.g., 8 KB) and then only to previously-erased pages. Thereafter, the old locations where the pages were stored are freed, i.e., the pages are marked for deletion (or as invalid). Typically, the pages are erased exclusively in blocks of 32 or more pages (i.e., 256 KB or more). Moving valid data from old to new locations, i.e., garbage collection, contributes to write amplification in the system. It is therefore desirable to move the valid data as infrequently as possible so as not to amplify the number of times data is written, i.e., to reduce write amplification.
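The arithmetic behind these figures can be sketched in Python. This is a minimal model under stated assumptions, not a description of any actual SSD controller: the page and block sizes are the example values from the text, the helper names are hypothetical, and write amplification is taken to be the ratio of pages physically written to pages written by the host.

```python
PAGE_SIZE_KB = 8        # example page size from the text
PAGES_PER_BLOCK = 32    # example erase-block size from the text


def erase_block_size_kb(pages_per_block=PAGES_PER_BLOCK,
                        page_size_kb=PAGE_SIZE_KB):
    """Size of the smallest erasable unit, in KB."""
    return pages_per_block * page_size_kb


def reclaim_block(page_is_valid):
    """Model garbage collection of one erase block.

    page_is_valid: list of booleans, one per page in the block.
    Valid pages must be copied elsewhere before the block is erased.
    Returns (pages_relocated, pages_freed).
    """
    relocated = sum(1 for valid in page_is_valid if valid)
    freed = len(page_is_valid) - relocated
    return relocated, freed


def write_amplification(host_pages_written, pages_relocated):
    """Ratio of total pages physically written to host pages written."""
    if host_pages_written <= 0:
        raise ValueError("host_pages_written must be positive")
    return (host_pages_written + pages_relocated) / host_pages_written


print(erase_block_size_kb())                      # 256
relocated, freed = reclaim_block([True] * 24 + [False] * 8)
print(relocated, freed)                           # 24 8
print(write_amplification(freed, relocated))      # 4.0
```

In this example, reclaiming a block that still holds 24 valid pages frees only 8 pages for new host data, so every host page written costs four physical page writes. Reclaiming blocks that hold fewer valid pages, or reclaiming them less often, lowers that ratio.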