In modern data storage systems, the technique known as “RAID” (for “redundant array of inexpensive disks”) can be employed to provide high levels of reliability from groups of relatively low-cost and less reliable disk drives. There are a number of different types or “levels” of RAID, which vary in the degree of redundancy they provide as well as their complexity. With certain types of RAID, such as RAID-4 or RAID-DP for example, a “RAID group” includes multiple drives dedicated for storing data and one or more additional drives dedicated for storing parity information relating to the data on the data drives. Other forms of RAID, such as RAID-5, distribute the parity information across the data drives instead of using dedicated parity drives. In the event of a failure of a particular drive, the information on the remaining drives can be read and used to compute and reconstruct the data from the failed drive.
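As an illustration of the reconstruction described above, the following is a minimal sketch that assumes a single-parity scheme (as in RAID-4 or RAID-5) in which the parity block is the byte-wise XOR of the data blocks in a stripe. The function names and the sample block contents are hypothetical and chosen only for illustration; double-parity schemes such as RAID-DP use additional computation that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_lost_block(surviving_blocks):
    """Rebuild the one missing block of a stripe.

    Assumes single parity: XOR-ing every surviving block (data and
    parity alike) yields the contents of the missing block.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe of three data blocks (illustrative values only).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate the failure of the drive holding data block 1.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index] + [parity]
rebuilt = reconstruct_lost_block(survivors)
```

In this sketch, every surviving drive must be read in full, while all of the reconstructed blocks are written to a single target drive.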
During RAID reconstruction, the data from the failed drive is typically reconstructed on a new replacement drive, or alternatively on a “hot spare” drive dedicated for use in RAID reconstruction. One common problem, however, is that RAID reconstruction can take many hours to complete, depending upon the size of the affected RAID group, and the ever-increasing size of hard drives proportionally increases the amount of time needed to complete a RAID reconstruction. It is desirable to complete a RAID reconstruction as quickly as possible, since during the reconstruction process the system operates in a state called “degraded mode”, in which it has a lower resiliency to failure. One of the factors that can lead to slow reconstruction is the limited rate at which data can be written to the reconstructing drive, which cannot be greater than the bandwidth of a single hard drive.
Two known techniques for addressing this problem are “distributed hot sparing” and “drive slicing”. Both of these techniques distribute the data and the hot spare space across multiple hard drives in some uniform manner. Distributed hot sparing involves pre-allocating one or more drives in a dedicated sparing relationship for a specific associated RAID group. In drive slicing, the data and hot spare space for multiple RAID groups are distributed across a single set of drives. In both of these techniques, however, one or more drives are pre-allocated to provide hot spare storage space. Such pre-allocation of drives is inflexible and often leads to a large amount of available storage space in the system going unused. Storage space is a valuable resource, and it is undesirable for it to be wasted. Furthermore, the characteristics of a given storage system (e.g., topology, drive types and capacity) may change over time, such that a given allocation of hot spare space may become sub-optimal. The inflexibility of current techniques requires the storage system to be taken off-line and physically and/or logically reconfigured in order to change its hot spare space allocation. In a large-scale storage system, it may be undesirable to take the system off-line for even a short time.