1. Field
The disclosure relates to a method, system, and article of manufacture for the management of redundancy in data arrays.
2. Background
Certain information technology storage systems may provide high availability and reliability via implementations that provide redundancy. Fault tolerance may be achieved in such storage systems via redundant fault tolerant hardware designs in which user data may be stored in arrays of storage devices configured in a Redundant Array of Inexpensive Disks (RAID) scheme. Certain RAID schemes (e.g., RAID levels 1, 2, 3, 4, 5, 10 [0+1, 1+0]) provide a single level of redundant protection and are tolerant of a single device failure, but the failure of an additional device may cause data to be lost.
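By way of illustration only, the single level of redundant protection described above may be sketched with byte-wise XOR parity, as used in parity-based schemes such as RAID 5. The block contents, the three-device layout, and the helper name `xor_blocks` are assumptions made for this sketch and are not taken from the disclosure.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per storage device, plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Single failure: device 1 is lost. XOR of the survivors and the parity
# block reproduces the missing block exactly.
rebuilt = xor_blocks([data[0], data[2], parity])

# Double failure: devices 1 and 2 are lost. The survivors yield only the
# XOR of the two missing blocks, which does not determine either block.
combined_missing = xor_blocks([data[0], parity])
```

In this sketch `rebuilt` equals the lost block, whereas `combined_missing` is merely the XOR of the two lost blocks, which is why a second failure may cause data to be lost under a single-parity scheme.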
Online RAID array repair, also known as hot sparing, may be used to restore RAID array redundancy following a failure of a storage device. During the online array repair, the RAID array may be in a rebuilding state and may remain susceptible to additional failures that result in unrecoverable data loss. The pace of storage device capacity growth may have caused the amount of data at risk in any single storage array to reach levels where the statistical probability of data loss events makes it difficult to attain the desired fault tolerance in high availability storage systems.
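By way of illustration only, the relationship between device capacity and the risk of an additional failure during a rebuild may be estimated as follows. The sketch assumes independent, exponentially distributed device lifetimes and a rebuild time proportional to device capacity. The function names and all numeric values (capacities, rebuild rate, mean time to failure, device count) are hypothetical and are not taken from the disclosure.

```python
import math

def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours needed to rebuild one device at a constant rebuild rate."""
    return capacity_gb * 1000.0 / rebuild_mb_per_s / 3600.0

def p_additional_failure(surviving_devices, hours, mttf_hours):
    """Probability that at least one surviving device fails within `hours`,
    assuming independent exponential lifetimes with mean `mttf_hours`."""
    return 1.0 - math.exp(-surviving_devices * hours / mttf_hours)

# Hypothetical array: 7 surviving devices, 50 MB/s rebuild rate,
# 1,000,000 hour mean time to failure per device.
p_small = p_additional_failure(7, rebuild_hours(500, 50), 1_000_000)
p_large = p_additional_failure(7, rebuild_hours(4000, 50), 1_000_000)

print(f"500 GB devices:  {p_small:.2e}")
print(f"4000 GB devices: {p_large:.2e}")
```

Under these assumptions, a larger device capacity lengthens the rebuilding state and therefore raises the probability that an additional failure occurs before redundancy is restored.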
Additionally, certain storage system solutions provide high capacity low cost storage devices [e.g., Serial ATA (SATA), Fibre Attached Technology Adapted (FATA)], which typically have larger capacity per device and lower reliability characteristics than server class devices [e.g., Fibre Channel-Arbitrated Loop (FC-AL), Small Computer Systems Interface (SCSI), Serial Attached SCSI (SAS), Serial Storage Architecture (SSA)]. Such high capacity low cost storage devices may further exacerbate the exposure to faults in single redundant RAID levels. “Advanced RAID levels” (e.g., RAID 51, RAID 6, RAID 3+3, RAID N+3) are designed to tolerate multiple storage device failures in order to restore a balance between fault tolerance and RAID system data protection.