Technical Field
The present disclosure relates to storage systems and, more specifically, to RAID rebuild of storage devices of a storage system.
Background Information
A storage system typically includes one or more storage devices, such as solid state drives (SSDs) embodied as flash storage devices, into which information may be entered, and from which the information may be obtained, as desired. The storage system may implement a high-level module, such as a file system, to logically organize the information stored on the devices as storage containers, such as files or logical units (LUNs). Each storage container may be implemented as a set of data structures, such as data blocks that store data for the storage containers and metadata blocks that describe the data of the storage containers. For example, the metadata may describe, e.g., identify, storage locations on the devices for the data.
Some types of SSDs, especially those with NAND flash components, may include translation logic of an internal controller, i.e., a flash translation layer, which maintains reserve capacity (i.e., previously-erased or free blocks) in the flash components to implement log-structured layout within the devices. As used herein, log-structured layout denotes sequential storage of data on the SSD. The reserve capacity may include free blocks that were previously erased in accordance with a process referred to as garbage collection. The SSD controller implements garbage collection by moving valid data from old locations to new locations among those components at the granularity of a page (e.g., 8 KB) and then only to previously-erased pages. Thereafter, the old locations where the pages were stored are freed, i.e., the pages are marked for deletion (or as invalid). Typically, the pages are erased exclusively in blocks, i.e., groups of pages (e.g., 32 or more pages totaling 256 KB or more). Such garbage collection typically results in substantial write amplification in the system.
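By way of illustration only, the relationship between relocated valid pages and write amplification may be sketched as follows. The sketch assumes a simplified model that is not part of the disclosure: a single block of 32 pages of 8 KB each is reclaimed, its valid pages are copied elsewhere, and the freed pages are then filled with new host data. The function names and the model itself are hypothetical.

```python
# Simplified, hypothetical model of garbage-collection write amplification.
# Page and block sizes follow the examples in the text (8 KB pages, 32 pages per block).
PAGE_SIZE_KB = 8
PAGES_PER_BLOCK = 32


def block_size_kb(pages_per_block=PAGES_PER_BLOCK, page_size_kb=PAGE_SIZE_KB):
    """Return the erase-block size in KB."""
    return pages_per_block * page_size_kb


def gc_write_amplification(valid_pages, pages_per_block=PAGES_PER_BLOCK):
    """Ratio of pages written to flash versus pages written by the host.

    Assumption: reclaiming one block relocates `valid_pages` pages, which
    frees (pages_per_block - valid_pages) pages for new host writes.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    host_pages = pages_per_block - valid_pages
    flash_pages = valid_pages + host_pages  # relocated pages plus host pages
    return flash_pages / host_pages


if __name__ == "__main__":
    print(block_size_kb())
    for valid in (0, 16, 24):
        print(valid, gc_write_amplification(valid))
```

Under this model, a block that is still half valid when reclaimed costs two flash page writes for every host page write, which is why reclaiming mostly-valid blocks is expensive.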
In addition, the “on-disk” layout of the data structures in the storage containers (i.e., on the SSDs) may create a plurality of odd-shaped random “hole” (i.e., deleted data) fragments adjacent to data. This fragmented data (i.e., data with interposed holes) may not facilitate natural alignment boundaries for Redundant Array of Independent Disks (RAID) configurations, thus raising problems for RAID implementations. For example, if an attempt is made to write data into the odd-shaped fragments, it may be difficult to achieve good RAID stripe efficiency because partial stripes may be written, causing increased write amplification due to increased parity overhead.
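The parity overhead of partial stripe writes may be illustrated with a short sketch. It assumes, purely for illustration, that each stripe write stores one chunk per parity device regardless of how many data chunks are written, and it counts only chunks written (reads are ignored). The function name is hypothetical.

```python
def stripe_write_cost(data_chunks_written, parity_devices=2):
    """Chunks written to the RAID group per chunk of user data.

    Assumption: every stripe write, full or partial, also writes one
    chunk to each parity device.
    """
    if data_chunks_written < 1:
        raise ValueError("at least one data chunk must be written")
    total_chunks = data_chunks_written + parity_devices
    return total_chunks / data_chunks_written


if __name__ == "__main__":
    # Full stripe across 8 data devices versus partial stripes of 2 and 1 chunks.
    for chunks in (8, 2, 1):
        print(chunks, stripe_write_cost(chunks))
```

With two parity devices, a full stripe over eight data devices writes 1.25 chunks per data chunk, whereas a single-chunk partial stripe writes 3 chunks per data chunk.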
Yet another source of write amplification in the system may involve RAID-related operations. Assume a dual parity RAID configuration that may include a plurality of data SSDs and two parity SSDs. A random write operation that stores write data on a data SSD of a RAID stripe may result in a plurality of read-modify-write (RMW) operations that, e.g., update the data SSD with write data and update the two parity SSDs with parity information after reading a portion of the write data and/or parity information. Such RAID-related operations result in a substantial amount of write amplification to the system. In addition, a substantial read load may occur on the parity SSDs as data from random read operations are verified using the parity information stored on the parity SSDs (i.e., requiring access to parity information on the parity SSDs).
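A minimal sketch of the RMW pattern follows. It assumes one read and one write per affected device for a single-chunk random write, and it shows only XOR-based row parity; the second parity of a dual parity configuration typically uses a different code and is not modeled here. All function names are hypothetical.

```python
def rmw_io_counts(parity_devices=2):
    """Return (reads, writes) for a single-chunk random write.

    Assumption: the old data chunk and each old parity chunk are read,
    then the new data chunk and each new parity chunk are written.
    """
    reads = 1 + parity_devices
    writes = 1 + parity_devices
    return reads, writes


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_row_parity(old_parity, old_data, new_data):
    """Update XOR row parity without reading the other data chunks."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
    parity = xor_bytes(xor_bytes(d0, d1), d2)

    new_d1 = b"\xaa\x55"
    new_parity = update_row_parity(parity, d1, new_d1)

    # The incremental update matches a full recomputation over the stripe.
    assert new_parity == xor_bytes(xor_bytes(d0, new_d1), d2)
    print(rmw_io_counts())
```

Under these assumptions, one chunk of user data written at random costs three device reads and three device writes in a dual parity configuration.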
To reduce read load resulting from a RAID configuration, parity information may be distributed among the storage devices (e.g., RAID 5 “rotating parity”). However, this may result in further undesirable read and write amplification when adding or removing storage devices because the data and/or parity information are redistributed (i.e., moved) among a different group of storage devices when the RAID configuration changes (i.e., change to the distributed parity RAID configuration), so as to maintain a reduced read load. In addition, increased storage capacity from additional SSDs (i.e., adding storage) is usually not available for end-user access until redistribution of data and/or parity information is complete for the changed RAID configuration.
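The redistribution cost may be illustrated with a sketch that assumes one simple rotation rule, in which the parity chunk of stripe s resides on device s modulo the number of devices. Actual RAID 5 layouts vary, so the rule and the function names below are hypothetical and serve only to show why adding a device relocates parity for most stripes.

```python
def parity_device(stripe, device_count):
    """Device index holding parity for a stripe (assumed simple rotation)."""
    return stripe % device_count


def stripes_with_moved_parity(stripe_count, old_devices, new_devices):
    """Count stripes whose parity device changes when the group is resized."""
    return sum(
        1
        for s in range(stripe_count)
        if parity_device(s, old_devices) != parity_device(s, new_devices)
    )


if __name__ == "__main__":
    # Growing an assumed 4-device group to 5 devices, over 20 stripes.
    print(stripes_with_moved_parity(20, 4, 5))
```

Under this rule, growing a group from four to five devices changes the parity location of 16 of the first 20 stripes, each of which implies reads and writes that serve no end-user request.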
Therefore, it is desirable to provide a file system that reduces various sources of write amplification from a storage system while also diminishing read load on the storage devices of the storage system and supporting immediate end-user storage capacity changes when adding or removing storage devices, wherein the sources of write amplification and read load include 1) internal SSD garbage collection; 2) partial RAID stripe operations from fragmented data; 3) RMW operations from RAID organizations of data and parity; and 4) re-organization of RAID data and/or parity when storage devices change (i.e., added or removed storage devices).