Various storage systems are available that use multiple storage devices to provide data storage with better performance and reliability than an individual storage device. For example, a Redundant Array of Independent Disks (RAID) system includes multiple disks that store data. RAID systems and other storage systems using multiple storage devices are able to provide improved reliability by using parity data. Parity data allows a system to reconstruct lost data if one of the storage devices fails or is disconnected from the storage system. A variety of parity methods are available that permit the reconstruction of data from a failed storage device.
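As an illustration of how parity data permits reconstruction, the following minimal sketch uses byte-wise XOR parity, which is only one of the available parity methods. The helper name `xor_blocks`, the block contents, and the choice of three data devices plus one parity device are assumptions made for this example and are not taken from any particular storage system.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One block per data device (illustrative contents only).
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# The parity block would be stored on a further device.
parity = xor_blocks(data)

# Simulate the loss of the device holding data[1], then rebuild its block
# from the surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block; this scheme tolerates the loss of only one block per parity group.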
After the lost data is reconstructed, it is typically stored on one or more storage devices in the storage system. Different techniques can be used to store the reconstructed data in the storage system. One technique reserves one or more storage devices in the storage system for future use if one of the active storage devices fails. This technique is referred to herein as a “rebuild in place” technique. The reserved storage devices are commonly referred to as “hot spares”. The reserved storage devices remain idle and are not used for data storage unless one of the active storage devices fails. If an active storage device fails, the missing data from the failed device is reconstructed onto one of the reserved storage devices.
A disadvantage of the rebuild in place technique is that one or more storage devices are unused unless there is a failure of an active storage device. Thus, the overall performance of the storage system is reduced because available resources (the reserved storage devices) are not being utilized. Further, if one of the reserved storage devices fails, the failure may not be detected until one of the active storage devices fails and the reserved storage device is needed. Another problem with this technique occurs when all of the reserved storage devices have been used. If another failure occurs, data reconstruction is not possible because there are no unused storage devices available. Thus, the storage system remains in a degraded condition until an unused storage device is added to the storage system or a failed storage device is replaced by a system administrator.
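The rebuild in place behavior, including the degraded condition that results once the reserved devices are exhausted, can be sketched as a toy model. The class `HotSpareArray`, its method `fail_device`, the one-block-per-device layout, and the use of XOR parity are all assumptions made for illustration and do not describe a specific product.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

class HotSpareArray:
    """Toy array: each active device holds one block; the last one holds parity."""

    def __init__(self, data_blocks, spare_count):
        self.active = list(data_blocks) + [xor_blocks(data_blocks)]
        self.spares = spare_count  # reserved devices, idle until a failure

    def fail_device(self, index):
        """Fail one active device; rebuild onto a spare if one is available.

        Returns True if the block was rebuilt, False if the array is left
        degraded because no reserved device remains.
        """
        self.active[index] = None  # contents of the failed device are lost
        if self.spares == 0:
            return False
        survivors = [block for block in self.active if block is not None]
        self.active[index] = xor_blocks(survivors)  # the spare takes over the slot
        self.spares -= 1
        return True

array = HotSpareArray([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"], spare_count=1)
original = list(array.active)
first = array.fail_device(1)   # a spare is available, so the block is rebuilt
second = array.fail_device(0)  # no spare remains, so the array stays degraded
```

In this sketch the single reserved device contributes nothing until the first failure, and the second failure cannot be repaired until a device is added or replaced.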
Another technique for reconstructing lost data uses all storage devices to store data, but reserves space on each storage device in the event that a storage device fails. This technique is referred to herein as a “migrating rebuild” technique. Using this technique, the storage system typically realizes improved performance by utilizing all of the storage devices while maintaining space for the reconstruction of data if a storage device fails. In this type of storage system, data is typically striped across the storage devices. This data striping process spreads data over multiple storage devices to improve performance of the storage system. The data striping process is used in conjunction with other methods (e.g., involving the use of parity information) to provide fault tolerance and/or error checking. The parity data provides a logical connection that relates the data spread across the multiple storage devices.
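A striped layout with reserved space on every device can be sketched as follows. The function `stripe_layout`, the block labels, the rotating placement of parity (in the style of RAID 5), and the placement of the reserved space at the end of each device are assumptions chosen for illustration; an actual storage system may arrange data, parity, and reserved space differently.

```python
def stripe_layout(num_devices, num_stripes, reserved_rows):
    """Return one list of slots per device (hypothetical layout).

    A slot holds "D<n>" for a data block, "P<s>" for the parity block of
    stripe s, or None for reserved space that is kept free for a rebuild.
    """
    devices = [[] for _ in range(num_devices)]
    block = 0
    for stripe in range(num_stripes):
        parity_dev = stripe % num_devices  # rotate parity across devices (assumed)
        for dev in range(num_devices):
            if dev == parity_dev:
                devices[dev].append(f"P{stripe}")
            else:
                devices[dev].append(f"D{block}")
                block += 1
    for column in devices:
        column.extend([None] * reserved_rows)  # reserved space on every device
    return devices

layout = stripe_layout(num_devices=4, num_stripes=3, reserved_rows=1)
for dev, column in enumerate(layout):
    print(f"device {dev}: {column}")
```

Every device in this layout holds live data or parity, so all devices serve input/output requests, while the reserved slots remain available if a device fails.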
A problem with the migrating rebuild technique arises from the logical manner in which data is striped across the storage devices. To reconstruct data from a failed storage device and store that data in the unused space on the remaining storage devices, the storage system relocates all of the data on all of the storage devices (i.e., not just the data from the failed storage device). Relocation of all data in a data stripe is time-consuming and uses a significant amount of processing resources. Additionally, input/output requests by host equipment coupled to the storage system are typically delayed during this relocation of data, which is disruptive to the normal operation of the host equipment.
Accordingly, there exists a need for an improved system and method for data reconstruction in a storage system that uses multiple storage devices.