In conventional storage systems, RAID (redundant array of independent disks) based data protection can be provided by individual RAID groups. However, actual physical disks must be present and selected to act as a RAID group before the corresponding storage space can be made available for use. When a disk fails in a conventional RAID system, the failed disk must be replaced quickly, either by a hot spare or by manual replacement. Once the failed disk is swapped with a replacement disk, a period of high-frequency I/O is directed to the replacement disk to reconstruct the data that had been stored on the failed disk, in order to restore ongoing data protection. The RAID group remains in a degraded state until the missing data from the failed disk has been reconstructed on the replacement disk, and during this rebuild period the RAID group is vulnerable to subsequent disk failures.
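As an illustration only, the reconstruction and the degraded-state vulnerability described above can be sketched for a single-parity, XOR-based stripe. The RAID level is not specified here, so a RAID 5-style layout is assumed, and the helper names (`xor_blocks`, `make_stripe`, `reconstruct`) are hypothetical. A missing block is marked as `None`; with one parity block, exactly one lost block can be rebuilt, and a second failure before the rebuild completes makes the stripe unrecoverable.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (assumed RAID 5-style)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe):
    """Rebuild a single missing block (None) from the surviving blocks of the stripe."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        # Degraded group hit by a second failure: single parity cannot recover it.
        raise ValueError("single parity cannot recover more than one lost block")
    if not missing:
        return list(stripe)
    survivors = [blk for blk in stripe if blk is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)  # XOR of survivors equals the lost block
    return rebuilt


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
degraded = [stripe[0], None, stripe[2], stripe[3]]  # disk 1 has failed
restored = reconstruct(degraded)
```

Every surviving block must be read to regenerate each lost block, which is one reason the rebuild generates the heavy I/O noted above.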
Another problem exists in conventional storage systems that rely on RAID-based data protection. In the event of a power failure, a partially written RAID stripe cannot be recovered reliably, because some blocks of the stripe may have been updated while others, such as the parity, have not, leaving the stripe internally inconsistent. Conventionally, this problem has been addressed by providing uninterruptible power supplies or memory areas with battery backup protection.
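The effect of a partially written stripe can be illustrated with the same assumed single-parity XOR layout. In this hypothetical sketch, power is lost after the new data block reaches disk 0 but before the matching parity update is written. The stripe is then inconsistent, and if disk 1 later fails, rebuilding it from the stale parity silently produces wrong data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data, parity):
    """A single-parity stripe is consistent when parity equals the XOR of its data."""
    return xor_blocks(data) == parity


# A consistent 3+1 stripe (single XOR parity, assumed for illustration).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# A stripe update writes new data and then new parity. Simulate a power loss
# after the data write lands on disk 0 but before the parity write does.
data[0] = b"ZZZZ"
# parity = xor_blocks(data)  # <- this write never reached the disk

# Later, disk 1 fails and is rebuilt from the survivors and the stale parity.
rebuilt_disk1 = xor_blocks([data[0], data[2], parity])
```

Because nothing in the stripe records which write was lost, the inconsistency cannot be detected or repaired from the disks alone, which is why extra protection such as battery-backed memory has been used.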
In addition, conventional RAID-based storage systems are inflexible, since all disks in a RAID group are dedicated to a single level of protection regardless of how much, or how little, of the storage in those disks is actually utilized. Formatting a RAID group is a time-consuming process that can further delay use of the storage space therein. While a hot spare provides a ready replacement disk, such configurations require that one or more disks in the storage system remain idle and unused until a failure occurs. On the other hand, if no hot spare is provided, the RAID group must be monitored carefully to ensure that a failed disk is replaced promptly when a failure does occur.
While a failed disk is being recovered by restoring its data to the replacement disk, all of the reconstructed data is written to that single replacement disk in order to restore the protection level of the RAID group. As explained above, the RAID group is susceptible to additional disk failures during this time, and the time required to restore its protection level is generally limited by the write bandwidth of the head assembly of the replacement disk.
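Because every reconstructed block must pass through the one replacement disk, the length of the vulnerable window is bounded below by that disk's capacity divided by its sustained write bandwidth. The following back-of-the-envelope sketch uses hypothetical figures (a 4 TB disk at 150 MB/s) and a hypothetical `rebuild_share` parameter for the fraction of bandwidth left after foreground I/O; it ignores read-side and controller limits.

```python
def min_rebuild_hours(capacity_bytes, write_bytes_per_sec, rebuild_share=1.0):
    """Lower bound on rebuild time when one replacement disk absorbs all rebuild writes.

    rebuild_share is the (assumed) fraction of the disk's write bandwidth
    available to the rebuild after foreground I/O is served.
    """
    if not 0.0 < rebuild_share <= 1.0:
        raise ValueError("rebuild_share must be in (0, 1]")
    seconds = capacity_bytes / (write_bytes_per_sec * rebuild_share)
    return seconds / 3600.0


# Hypothetical example: 4 TB replacement disk, 150 MB/s sustained write.
idle_hours = min_rebuild_hours(4e12, 150e6)                     # about 7.4 hours
busy_hours = min_rebuild_hours(4e12, 150e6, rebuild_share=0.5)  # twice as long
```

The bound grows linearly with disk capacity, so larger disks lengthen the period during which the degraded RAID group is exposed to a second failure.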