A Redundant Array of Inexpensive Disks (RAID) array distributes data across several physical disks and uses parity bits to protect data from corruption. Conventionally, a RAID disk array uses a single parity disk to provide data protection against a single event, which can be either a complete failure of one of the constituent disks or a bit error during a read operation. In either event, data can be re-created using both the parity and the data remaining on unaffected disks in the array.
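In the common single-parity case, the parity block is the byte-wise XOR of the data blocks in a stripe. Any one lost block therefore equals the XOR of the parity with the surviving blocks. The following is a minimal Python sketch of that idea. It assumes XOR parity over equal-length blocks, and the helper names are illustrative only, not taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple per byte offset; XOR the bytes at that offset.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Single parity block for one stripe (hypothetical helper)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


# Usage: a three-disk stripe loses its middle block.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = compute_parity(data)
rebuilt = reconstruct([data[0], data[2]], parity)
print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, one parity block can recover exactly one missing block per stripe. Surviving two simultaneous losses requires a second, independent parity.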
Advances in disk drive technology have produced a new generation of disk drives with large capacity, high mean-time-to-failure (MTTF), and high bit error rates. Serial Advanced Technology Attachment (SATA) drives are one example. The widespread adoption of SATA drives has led to the development and use of double-parity RAID (RAID-DP). RAID-DP adds a second parity disk to each RAID group to protect data against the failure of two disks in the same RAID group. The fault tolerance level required for data (e.g., RAID-DP vs. RAID-4) is often based on the criticality of that data. For example, a system administrator may determine the fault tolerance level based on the mean-time-to-data-loss (MTTDL) requirement in the system specification.
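One common way to relate an MTTDL requirement to a fault tolerance level is the textbook first-order approximation for a group of N disks with p parity disks. It assumes independent, exponentially distributed disk failures and a repair time (MTTR) much shorter than MTTF/N: MTTDL ≈ MTTF^(p+1) / (N(N-1)...(N-p) · MTTR^p). For p = 1 this reduces to the familiar MTTF^2 / (N(N-1) · MTTR). The sketch below uses that formula and picks the smallest parity count that meets a target. It ignores bit errors during reconstruction, which are the risk that motivates RAID-DP, so it overstates real-world MTTDL. The function names and the disk count, MTTF, MTTR, and target values are hypothetical.

```python
import math


def mttdl_hours(n_disks: int, parity_disks: int, mttf_h: float, mttr_h: float) -> float:
    """First-order MTTDL estimate for one RAID group.

    Assumes independent exponential disk failures, MTTR much smaller than
    MTTF / n_disks, and no unrecoverable read errors during rebuild.
    """
    if not 0 < parity_disks < n_disks:
        raise ValueError("need at least one parity disk and one data disk")
    # Data is lost only if p more disks fail while earlier failures are being repaired;
    # N(N-1)...(N-p) counts the ordered ways that failure sequence can occur.
    failure_paths = math.prod(range(n_disks - parity_disks, n_disks + 1))
    return mttf_h ** (parity_disks + 1) / (failure_paths * mttr_h ** parity_disks)


def min_parity_for_target(n_disks, mttf_h, mttr_h, target_h, max_parity=2):
    """Smallest parity count whose estimated MTTDL meets target_h, else None."""
    for p in range(1, max_parity + 1):
        if mttdl_hours(n_disks, p, mttf_h, mttr_h) >= target_h:
            return p
    return None


# Hypothetical 14-disk group, 1,000,000 h MTTF, 24 h rebuild time.
print(min_parity_for_target(14, 1e6, 24, 1e8))   # 1
print(min_parity_for_target(14, 1e6, 24, 1e10))  # 2
```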
In determining which fault tolerance level to use for stored data, a system administrator has to strike a delicate balance between the fault tolerance characteristics and the performance and capacity overhead of each RAID type. The RAID types include mirrored RAID types (e.g., RAID-41, RAID-51, RAID-DP1, RAID-01), unmirrored RAID types (e.g., RAID-4, RAID-5, RAID-DP, RAID-0), and other variants. Each RAID type protects data against a fixed number of faults using a fixed number of parity bits. However, storing parity bits incurs capacity overhead, and updating them incurs performance overhead.
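The trade-off can be made concrete with a simplified cost model for unmirrored parity groups. With p parity disks in a group of N disks, a fraction p/N of the raw capacity holds parity. A read-modify-write small write reads the old data block and the p old parity blocks and then writes all of them back, for 2(1 + p) disk I/Os. The sketch below encodes that model. The class and method names are hypothetical, and the model deliberately ignores mirroring (which roughly halves usable capacity and doubles writes), caching, and full-stripe writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidGroup:
    """Simplified, hypothetical model of one unmirrored RAID group."""

    n_disks: int       # total disks in the group, data plus parity
    parity_disks: int  # 0 for RAID-0, 1 for RAID-4/RAID-5, 2 for RAID-DP

    def capacity_overhead(self) -> float:
        """Fraction of raw capacity consumed by parity."""
        return self.parity_disks / self.n_disks

    def small_write_ios(self) -> int:
        """Disk I/Os for a single-block write that does not fill a stripe."""
        if self.parity_disks == 0:
            return 1  # no parity to maintain: just write the block
        # Read old data and each old parity block, then write all of them back.
        return 2 * (1 + self.parity_disks)


for name, group in [("RAID-4", RaidGroup(8, 1)), ("RAID-DP", RaidGroup(16, 2))]:
    print(name, group.capacity_overhead(), group.small_write_ios())
# RAID-4 0.125 4
# RAID-DP 0.125 6
```

In this model both groups spend the same fraction of capacity on parity, yet the double-parity group pays half again as many I/Os per small write. This is the performance side of the balance described above.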
After a RAID type is chosen for a storage system, the characteristics of the data and the storage system may change over time. In one scenario, the data may no longer be critical enough to warrant a RAID type with a high level of fault tolerance. Since higher fault tolerance typically implies larger RAID groups, simply removing a parity disk is generally insufficient. Rather, the data-to-parity ratio needs to be rebalanced to keep the MTTDL within acceptable bounds. In another scenario, additional disks may be installed in the storage system to provide storage for more parity data. In yet another scenario, an increase in small-write operations may warrant a decrease in the number of disks in each RAID group. A small-write operation writes less data than a full stripe across all disks in a RAID group. Instead of writing a full stripe of data and parity, a small-write operation must read the old data and the old parity, write the new data, and write the updated parity, and therefore incurs additional performance overhead. Decreasing the number of disks in each RAID group reduces the stripe size, which in turn reduces the occurrence of small-write operations.
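For XOR parity, the read-modify-write path relies on the identity new_parity = old_parity XOR old_data XOR new_data. A small write therefore touches only the target block and the parity rather than the whole stripe. The Python sketch below assumes single XOR parity and equal-length blocks, and its helper names are hypothetical. It checks that identity against a full-stripe recomputation and shows why a narrower RAID group turns more writes into full-stripe writes.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(blocks: list[bytes]) -> bytes:
    """Parity computed from every data block in the stripe."""
    parity = bytes(len(blocks[0]))  # all-zero block is the XOR identity
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: cancel the old data out of the parity, fold in the new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def is_small_write(blocks_written: int, data_disks: int) -> bool:
    """A write is 'small' when it covers fewer blocks than one full stripe.

    Simplification: ignores alignment to stripe boundaries.
    """
    return blocks_written < data_disks


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
old_parity = full_stripe_parity(stripe)
new_block = b"\xff\x00"
updated = small_write_parity(old_parity, stripe[1], new_block)
print(updated == full_stripe_parity([stripe[0], new_block, stripe[2]]))  # True

# A hypothetical 4-block write is a small write on 12 data disks,
# but a full-stripe write on 4 data disks.
print(is_small_write(4, 12), is_small_write(4, 4))  # True False
```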
When data or system characteristics change, existing RAID groups cannot easily be reconfigured to adapt to the change. Reconfiguration operations often incur system downtime and degrade system performance.