A storage server is a computer that provides access to information that is stored on one or more storage devices connected to the storage server, such as disk drives (“disks”), flash memories, or storage arrays. The storage server includes an operating system that may implement a file system to logically organize the information as a hierarchical structure of directories and files on a storage device (e.g., disk). Each file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file.
A storage server may be further configured to operate according to a client/server model of information delivery to allow one or more clients access to data stored on the storage server. In this model, the client may comprise an application executing on a computer that “connects” to the storage server over a computer network, such as a point-to-point link, shared local area network, wide area network, or virtual private network implemented over a public network, such as the Internet.
In the operation of a storage array (array), it is fairly common that a disk in the array will fail. Data can be lost when one or more disks fail, making it impossible to recover the data from the failed disk. An array may therefore implement a Redundant Array of Inexpensive/Independent Disks (RAID) scheme, where logically sequential data is divided into segments and stored across a set of disks in the array. The set of disks may be referred to as a “RAID group.” With certain RAID schemes, extra “redundancy” data may also be written to the array so that failure of a disk will not result in loss of data. Each segment of data or extra data can be stored in a disk block, for example, with the disk blocks storing such data and related extra data collectively referred to as a “stripe.” The number of disks that the stripe spans is further referred to as the “stripe width.”
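As a minimal sketch of this segmentation, the following toy code divides logically sequential data into fixed-size segments and places them round-robin across the disks of a RAID group. The names (SEGMENT_SIZE, STRIPE_WIDTH, stripe) and the round-robin placement are assumptions made for illustration, not details taken from the description above.

```python
SEGMENT_SIZE = 4   # bytes per segment (one disk block); toy value
STRIPE_WIDTH = 3   # number of disks each stripe spans

def stripe(data: bytes):
    """Split data into fixed-size segments and place them round-robin
    across the disks of a RAID group."""
    segments = [data[i:i + SEGMENT_SIZE]
                for i in range(0, len(data), SEGMENT_SIZE)]
    disks = [[] for _ in range(STRIPE_WIDTH)]
    for idx, seg in enumerate(segments):
        disks[idx % STRIPE_WIDTH].append(seg)
    return disks

disks = stripe(b"ABCDEFGHIJKLMNOPQRSTUVWX")   # 24 bytes -> 6 segments
```

With this layout, segments 0 and 3 land on disk 0, segments 1 and 4 on disk 1, and so on; one segment from each disk at the same position forms a stripe of width three.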
Various RAID schemes are available which correspond to certain data protection levels, disk space usage, and storage performance. For example, RAID level 0 (RAID-0) distributes data across several disks without storing extra data. Without the availability of extra data, data would be lost if any one of the disks fails. However, increased storage performance may be achieved since multiple disks simultaneously participate in the reading and writing of data. In RAID-1, data is duplicated in two or more disks to protect against data loss, thus providing a higher level of protection than RAID-0. However, RAID-1 consumes significant amounts of additional disk space for storing such an extra copy of the entire data. Thus, trade-offs exist between protection level, disk space usage, and storage performance for various RAID schemes.
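The space trade-off between these two schemes can be made concrete with a short sketch. The helper names below (raid0_layout, raid1_layout) are hypothetical and exist only for this comparison.

```python
def raid0_layout(blocks, n_disks):
    """RAID-0: distribute blocks round-robin; no extra data is stored."""
    disks = [[] for _ in range(n_disks)]
    for i, blk in enumerate(blocks):
        disks[i % n_disks].append(blk)
    return disks

def raid1_layout(blocks, n_disks=2):
    """RAID-1: every disk holds a complete copy of the data."""
    return [list(blocks) for _ in range(n_disks)]

data = ["b0", "b1", "b2", "b3"]
striped = raid0_layout(data, 2)    # 4 blocks of total disk space
mirrored = raid1_layout(data, 2)   # 8 blocks: double the space
```

Losing one disk of the striped layout loses half the blocks, whereas either mirrored disk still holds the entire data set, illustrating the protection-versus-space trade-off.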
Certain RAID configurations, such as RAID 4 or RAID 5, implement a parity protection scheme to efficiently protect against data loss without duplicating data. In a parity protection scheme, a parity value constitutes the extra data and is computed across multiple data blocks (e.g., disk blocks storing data segments). For example, a parity value may be computed by an exclusive-OR (XOR) operation across data blocks of disks of the array and stored in another disk block, such as a parity block. The set of data blocks and related parity block constitute a stripe, and data on a failed disk may be reconstructed, for example, by computing an XOR across the surviving disks in the stripe. In RAID 4, the parity values are stored on a separate parity disk of the array that does not contain data. In RAID 5, the parity values are typically distributed across all the disks of the array.
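The XOR computation and reconstruction described above can be sketched in a few lines. The block values and the helper name xor_blocks are illustrative assumptions; the XOR property itself (a lost block equals the XOR of the survivors and the parity) is exactly as stated in the text.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR across equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Data blocks of one stripe, one per disk.
data = [b"\x0f\x0f", b"\xf0\x00", b"\x55\xaa"]
parity = xor_blocks(data)                 # parity block for the stripe

# Simulate failure of the second disk: its block is recomputed by
# XOR-ing the surviving data blocks with the parity block.
reconstructed = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, XOR-ing any stripe member out of the parity recovers it, which is why a single disk failure is survivable.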
In other RAID schemes, such as that of RAID DP, two dedicated disks serve as parity disks. A first parity disk stores parity values from data computed across a single row stripe, whereas a second parity disk stores parity values from data computed across staggered blocks (including a parity block from the first parity disk) in different row stripes (otherwise referred to as a diagonal stripe). Using this parity protection scheme, an array may recover from a two-disk failure by computing data across a row stripe to reconstruct data on the first failed disk, and computing data across a diagonal stripe to reconstruct data on the second failed disk.
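A toy sketch of the row/diagonal idea follows. The 3x4 geometry, the diagonal indexing, and the variable names are assumptions for illustration only; actual RAID DP uses a prime-based layout and omits one diagonal, details not covered by the text. The sketch shows the alternating principle: one lost block is recovered along its diagonal, which then lets row parity recover a second lost block in the same row.

```python
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Toy geometry: 3 row stripes over 3 data disks (cols 0-2) plus a row
# parity disk (col 3).
grid = [[bytes([10 * r + c]) for c in range(3)] for r in range(3)]
full = [row + [xor_blocks(row)] for row in grid]   # col 3 = row parity

# Diagonal d collects one block per row, at col (d - r) % 4, so each
# diagonal staggers across row stripes and may include a row-parity block.
diag_parity = [xor_blocks([full[r][(d - r) % 4] for r in range(3)])
               for d in range(4)]

# Two-disk failure (cols 0 and 1): block (0,0) sits on diagonal 0, whose
# other members (1,3) and (2,2) survive, so it is recovered diagonally;
# block (0,1) is then recovered from the row parity of row 0.
rec_00 = xor_blocks([diag_parity[0], full[1][3], full[2][2]])
rec_01 = xor_blocks([rec_00, full[0][2], full[0][3]])
```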
Yet other RAID schemes are further possible where every predetermined block (e.g., every 8th block) of a particular data structure, such as a file, is a parity block. In these cases, the availability of the parity block protects against loss of the file constituted by the data and parity blocks. Here, if a disk storing one of the data blocks of a file fails, the file is still accessible since the lost data can be computed from the surviving data blocks and the parity block.
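An interleaved layout of this kind might look like the following sketch, where a parity block is appended after every seven data blocks so that every 8th block is parity. The group size and helper names are assumptions for illustration.

```python
from functools import reduce

GROUP = 7   # data blocks per group; every 8th block is then parity

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def with_parity(data_blocks):
    """Interleave a parity block after every GROUP data blocks."""
    out = []
    for i in range(0, len(data_blocks), GROUP):
        group = data_blocks[i:i + GROUP]
        out.extend(group)
        out.append(xor_blocks(group))
    return out

data = [bytes([i]) for i in range(7)]
blocks = with_parity(data)             # 8 blocks: 7 data + 1 parity
lost = blocks[3]                       # pretend this disk block failed
recovered = xor_blocks(blocks[:3] + blocks[4:])   # XOR of the other 7
```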
When a disk failure is detected by a storage server, the storage server may immediately switch the array to a degraded mode of operation. In degraded mode, data remains available (including the data of the failed disk) and data services can still be maintained; however, storage performance is greatly reduced since constant calculation is required to derive the data of the failed disk from the surviving disks. To restore the array to a normal operating state, data is reconstructed (e.g. using parity values) and stored to a replacement disk in the array. Whether servicing client requests or supplying data in reconstruction, the surviving disks are limited in performance to the input/output (I/O) bandwidth of each respective disk. Furthermore, some disks may perform more I/O tasks than other disks depending on the distribution of data across the disks.
To improve storage performance during failure recovery and reduce the time the array spends in degraded mode, a RAID group may be configured across a set of “logical drives” and implemented with a greater number of physical drives (e.g., disks). During configuration, storage spaces on each of the logical drives are divided into data units formed by a contiguous set of data blocks, for example, a disk “chunk.” A RAID group is then created by selecting chunks across a set of logical drives, and grouping the selected chunks as a “parity group.” An array can be configured with multiple parity groups, each of which contains a number of chunks allocated to a number of logical drives, and further configured on disks in the array. The array can then be presented as a single storage drive to external systems, and each of the parity groups can be seen as a contiguous storage unit. Since extra disks can be used to offload some of the I/O traffic from disks participating in the reconstruction of a parity group, the read and write bandwidth bottlenecks commonly associated with traditional RAID implementations may be reduced.
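One way to sketch this chunk-and-parity-group configuration is below. The rotation policy, chunk sizes, and function name are hypothetical; the point is only that parity groups are formed by selecting one chunk from each of several logical drives, drawing chunks evenly from all drives.

```python
CHUNK_BLOCKS = 64   # blocks per chunk; illustrative value only

def make_parity_groups(n_drives, chunks_per_drive, group_width):
    """Carve each logical drive into chunks, then build parity groups by
    selecting one chunk from each of group_width drives, rotating the
    starting drive so chunks are drawn evenly from all drives."""
    free = {d: list(range(chunks_per_drive)) for d in range(n_drives)}
    groups, start = [], 0
    while True:
        drives = [(start + i) % n_drives for i in range(group_width)]
        if any(not free[d] for d in drives):
            break
        # each group member is a (logical drive, chunk id) pair
        groups.append([(d, free[d].pop(0)) for d in drives])
        start = (start + group_width) % n_drives
    return groups

groups = make_parity_groups(n_drives=5, chunks_per_drive=3, group_width=3)
```

Here five logical drives with three chunks each yield five width-3 parity groups, and every drive contributes exactly three chunks, so the two drives outside any given group remain free to absorb other I/O traffic.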
Parity declustering may also be implemented in the array to further improve degraded mode performance and improve recovery times. With parity declustering, parity groups are distributed across disks to produce a balanced I/O load on surviving disks. However, several challenges exist with conventional techniques for balancing I/O load across disks during reconstruction. In particular, conventional techniques for generating a declustered layout use a static approach which enforces a restriction of the same stripe width and RAID scheme on parity groups in the array to ensure a balanced distribution. Declustering parity groups with different RAID schemes or different stripe widths to facilitate particular storage requirements is not viable.
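One classical way to obtain a balanced declustered layout, offered here purely as an illustration (the text above does not prescribe this construction), is a complete block design: one parity group per combination of disks. The sketch below builds such a layout and counts the reconstruction reads each surviving disk would serve after a failure.

```python
from itertools import combinations
from collections import Counter

n_disks, width = 5, 3
failed = 2

# One parity group per combination of `width` disks: a complete block
# design, which spreads every pair of disks over the same number of groups.
layout = list(combinations(range(n_disks), width))

# When a disk fails, every surviving disk that shares a parity group with
# it must be read during reconstruction; count that load per survivor.
load = Counter()
for group in layout:
    if failed in group:
        for d in group:
            if d != failed:
                load[d] += 1
```

In this layout every one of the four surviving disks serves exactly three reconstruction reads, the balanced I/O load that declustering aims for; a non-declustered layout would concentrate that load on the failed disk's own RAID group.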
Difficulty in maintaining a balanced reconstruction load using conventional techniques is further evident when an array is modified. Such modifications may include adding a disk to the array, logically partitioning disk space into variously sized “containers” constituting parity groups, resizing containers, manually rebalancing storage resources to service more frequently accessed data (“hot data”), etc. In these instances, the uniform characteristics of the parity groups are affected, thereby changing the distribution of I/O traffic, including reconstruction load, offloaded to the surviving disks.