The present invention relates generally to data recovery in storage systems and, more particularly, to methods and apparatus for fast data recovery from storage device failure such as HDD (hard disk drive) failure. The invention improves the speed of storage data recovery and the ease of disk maintenance in the event of disk failure.
Currently, RAID (Redundant Array of Independent Disks) architecture is generally used to protect data from disk failure. For example, RAID5 and RAID6 each make it possible to recover from a disk failure within the RAID group (RAID5 tolerates one failed disk, RAID6 two). RAID5 and RAID6 are each more capacity-efficient than RAID1 or RAID10. When a disk failure occurs, the storage system recovers the data to a reserved “spare disk.” To do so, it needs to read the entire area of the healthy disks. The data recovery time therefore depends on disk capacity and disk throughput performance. Generally, disk capacity grows faster than disk throughput from one technology generation to the next. As a result, the RAID approach is slow to rebuild from disk failure and will become slower each year. A long data rebuild can cause prolonged performance degradation due to contention between rebuild disk I/O and normal disk I/O. A long data rebuild also increases the possibility of encountering a further disk failure during data recovery.
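The dependence of rebuild time on capacity and throughput can be illustrated with a minimal sketch. It assumes that a rebuild requires at least one full sequential pass over a disk at its sustained throughput. The function name and the disk figures below are hypothetical and chosen only for illustration; they are not measurements.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Idealized lower bound on rebuild time: one full pass over a disk.

    Assumes sustained sequential throughput and no competing host I/O.
    """
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / throughput_mb_s / 3600  # seconds -> hours


# Hypothetical disk generations as (capacity in TB, throughput in MB/s).
# Capacity grows 16x across the list while throughput grows only 2.5x.
generations = [(1.0, 100.0), (4.0, 150.0), (16.0, 250.0)]

for capacity_tb, throughput_mb_s in generations:
    hours = rebuild_hours(capacity_tb, throughput_mb_s)
    print(f"{capacity_tb:5.1f} TB at {throughput_mb_s:5.1f} MB/s: {hours:5.1f} h")
```

Under these assumed figures the estimated rebuild time rises with each generation, which is the trend described above.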
Under another approach based on RAIN (Redundant Array of Independent Nodes), the storage system includes a plurality of nodes (disks, storage subsystems, and so on). The storage system stores data to two or more suitably chosen nodes. When a node failure occurs, the storage system copies the data from the redundant data to one or more other nodes. Because this copying can be performed as a parallel process, it is conducive to better rebuild performance. Because the RAID approach must rebuild onto one or more reserved spare disks, the rebuild time under the RAIN approach will be shorter than that under the RAID approach. The RAIN approach does not need a reserved spare disk because it automatically stores redundant data to free space (self-recovery). On the other hand, the capacity efficiency under the RAIN approach is lower than that under the RAID approach.
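The trade-off between the two approaches can be sketched as follows. The sketch assumes that RAIN keeps whole copies of the data, so that its usable fraction is one over the number of copies, and that the rebuild work is shared evenly among the surviving nodes. Both are idealizations, and all function names and figures are hypothetical.

```python
def raid_efficiency(disks: int, parity_disks: int) -> float:
    """Usable fraction of a parity RAID group (1 parity disk for RAID5, 2 for RAID6)."""
    return (disks - parity_disks) / disks


def rain_efficiency(copies: int) -> float:
    """Usable fraction when every piece of data is stored as `copies` full copies."""
    return 1 / copies


def parallel_rebuild_hours(capacity_tb: float, throughput_mb_s: float,
                           workers: int) -> float:
    """Idealized rebuild time when `workers` nodes each copy an equal share."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / (throughput_mb_s * workers) / 3600


# Hypothetical example: an 8-disk RAID5 group versus two-copy RAIN.
print(raid_efficiency(8, 1))   # usable fraction under RAID5
print(rain_efficiency(2))      # usable fraction under two-copy RAIN
# A single spare disk (workers=1) versus ten nodes rebuilding in parallel.
print(parallel_rebuild_hours(1.0, 100.0, 1))
print(parallel_rebuild_hours(1.0, 100.0, 10))
```

In this simplified model the RAIN rebuild time shrinks in proportion to the number of participating nodes, while its usable capacity fraction remains below that of the parity RAID group.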