A storage system is a processing system adapted to store and retrieve data on behalf of one or more client processing systems (“clients”) in response to external input/output (I/O) requests received from clients. A storage system can provide clients with file-level access to data stored in a set of mass storage devices, such as magnetic or optical storage disks or tapes. Alternatively, a storage system can provide clients with block-level access to stored data, either instead of or in addition to file-level access.
Data storage space is organized as one or more storage “volumes,” each comprising a cluster of physical storage disks and defining an overall logical arrangement of storage space. The disks within a volume/file system are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability and integrity of data storage through the redundant writing of data stripes across a given number of physical disks in the RAID group. When data can be read from the disks and written to the disks, the storage system is said to operate in a normal mode.
In a storage system, data gets lost or corrupted from time to time, for example, upon the failure of one of the mass storage devices. Consequently, virtually all modern storage systems implement techniques for protecting the stored data. Currently, these techniques involve calculating a data protection value (e.g., parity) and storing the parity in various locations. Parity may be computed as an exclusive-OR (XOR) of data blocks in a stripe spread across multiple disks in a disk array. In a single parity scheme, e.g., RAID-4 or RAID-5, an error can be corrected in any one block in the stripe using a single parity block (also called “row parity”). In a dual parity scheme, e.g., RAID Double Parity (RAID-DP), a technique invented by Network Appliance Inc. of Sunnyvale, Calif., errors resulting from a two-disk failure can be corrected using two parity blocks. The first is a row parity block, which is computed as the XOR of the data blocks in a stripe. The second is a diagonal parity block, which may be computed as the XOR of the data blocks and a parity block in a diagonal set.
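The row-parity case described above can be illustrated with a minimal sketch. The helper name xor_blocks and the three sample blocks are assumptions made for illustration only; diagonal parity is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks.

    Hypothetical helper, not taken from the text above.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (sample values, assumed).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# Row parity is the XOR of the data blocks in the stripe.
row_parity = xor_blocks(data)

# Suppose the second block is lost. XOR-ing the surviving data blocks
# with the row parity yields the missing block.
recovered = xor_blocks([data[0], data[2], row_parity])
```

Because XOR is its own inverse, any single missing block of the stripe, data or parity, equals the XOR of all the others, which is why one parity block suffices for a single failure.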
FIGS. 1A and 1B show arrangements of data blocks on storage devices using parity blocks. In FIGS. 1A and 1B, data sent to a storage system from one or more clients for storage as part of a write operation may first be divided up into fixed-size, e.g., four-kilobyte, blocks (e.g., D0, D1, etc.), which are then formed into groups that are stored as physical data blocks in a “stripe” (e.g., Stripe I, Stripe II, etc.) spread across multiple devices (e.g., disks) in an array. Row parity, e.g., an exclusive-OR (XOR) of the data in the stripe, is computed and may be stored in a parity protection block on Disk D. The row parity, e.g., P(0-2), may be used to reconstruct a single lost or corrupted data block in Stripe I. The location of the row parity depends on the type of protection scheme or protocol implemented. FIG. 1A shows a RAID-4 scheme in which the row parity blocks, e.g., P(0-2), P(3-5), P(6-8), and P(9-11), are all stored on Disk D. FIG. 1B shows a RAID-5 scheme in which the row parity is distributed across the disks in the array. For example, P(0-2) is stored on Disk D, P(3-5) is stored on Disk C, P(6-8) is stored on Disk B, and P(9-11) is stored on Disk A.
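The two parity placements can be sketched as follows. The function names parity_disk and layout are hypothetical, and the order in which data blocks fill the non-parity disks in the RAID-5 case is an assumption, since only the parity positions are stated in the text.

```python
def parity_disk(stripe, num_disks, scheme):
    """Return the index of the disk holding row parity for a stripe.

    Disk indices 0..3 stand for Disks A..D in a four-disk array.
    """
    if scheme == "raid4":
        # Dedicated parity disk: always the last disk (Disk D).
        return num_disks - 1
    if scheme == "raid5":
        # Parity rotates D, C, B, A, ... from one stripe to the next.
        return (num_disks - 1 - stripe) % num_disks
    raise ValueError(f"unknown scheme: {scheme}")


def layout(num_stripes, num_disks, scheme):
    """Build a table of block labels, one row per stripe."""
    rows = []
    next_block = 0
    data_per_stripe = num_disks - 1
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks, scheme)
        first = next_block
        row = []
        for disk in range(num_disks):
            if disk == p:
                row.append(f"P({first}-{first + data_per_stripe - 1})")
            else:
                # Assumed fill order: left to right, skipping the parity disk.
                row.append(f"D{next_block}")
                next_block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for scheme in ("raid4", "raid5"):
        print(scheme)
        for row in layout(4, 4, scheme):
            print("  " + "  ".join(f"{label:8}" for label in row))
```

With four disks and four stripes, the RAID-4 table keeps every parity label in the last column, while the RAID-5 table moves it one column to the left per stripe.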
Occasionally, a disk experiences an operational problem that either degrades its read-write performance or causes the disk to fail. When a disk failure is detected, the storage system operates in a degraded mode. While operating in a degraded mode, the storage system services external I/O requests received from the clients as follows: if the requested data are stored on a failed disk, the storage system reads data from the other disks (including the parity disk) and performs an exclusive-OR (XOR) operation on the data being read to recover the data that was stored on the failed disk. The result of the XOR operation is provided to the client. To service a write request to a failed disk while operating in a degraded mode, the storage system reads data from all data disks other than the failed disk, XORs the data read with the data being written, and writes the result of the XOR operation to the parity disk. While operating in a degraded mode, the storage system is vulnerable to data loss from another failure. In RAID-4, for example, if there is a two-disk failure, data cannot be recovered.
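The degraded-mode read and write paths described above can be sketched for a single stripe. The class DegradedStripe and its methods are hypothetical names, and the sketch covers only requests that target the failed disk, which is the case the text describes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class DegradedStripe:
    """One stripe of a single-parity group with one failed data disk."""

    def __init__(self, data_blocks, failed):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)
        self.failed = failed
        self.data[failed] = None  # contents of the failed disk are unreadable

    def _surviving_data(self):
        return [b for i, b in enumerate(self.data) if i != self.failed]

    def read(self, index):
        if index != self.failed:
            return self.data[index]
        # Read every surviving disk, including parity, and XOR the results.
        return xor_blocks(self._surviving_data() + [self.parity])

    def write(self, index, new_block):
        if index != self.failed:
            raise NotImplementedError("only writes to the failed disk are sketched")
        # XOR the surviving data with the data being written and store the
        # result as the new parity; the failed disk itself is not touched.
        self.parity = xor_blocks(self._surviving_data() + [new_block])


stripe = DegradedStripe([b"\x01\x02", b"\x03\x04", b"\x05\x06"], failed=1)
before = stripe.read(1)      # recovered from the survivors and parity
stripe.write(1, b"\xff\x00")
after = stripe.read(1)       # the new data is now encoded in the parity
```

Note that the written block exists only implicitly, as the XOR of the parity and the surviving data, which is why a second failure in this state loses data.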
Conventionally, spare disks are used to replace failed disks. A spare disk is a disk held in reserve to which data and parity are not written during normal I/O operations. Spare disks can be global spare disks that can be used by any RAID group. Alternatively, spare disks can be dedicated, i.e., assigned to a particular RAID group. To choose an appropriate spare disk, various factors are taken into account, such as the type of the spare disk (e.g., Fibre Channel (FC), Advanced Technology Attachment (ATA), etc.), the disk's rotational speed in revolutions per minute (RPM), the disk checksum type, and the disk capacity. A person of ordinary skill in the art would understand that other parameters can be used by a storage system to pick a spare disk.
Once the storage system replaces the failed disk with the spare disk, it begins a process of reconstructing data from the failed disk. The reconstruction process can be interleaved with normal I/O operations. The process of data reconstruction involves reading data from all disks other than the failed disk, XORing the data being read, and writing the result of the XOR operation to the spare disk, which becomes a reconstructing disk once it replaces the failed disk. Thus, to reconstruct data in a stripe, multiple read operations and at least one write operation have to be performed. Performing multiple read operations during the reconstruction process incurs rotational latency, thereby increasing the overall time required to complete the reconstruction. In addition, performance of the storage system suffers during reconstruction because regular I/O operations are performed at the same time. Furthermore, if a second disk fails or has a media error during the reconstruction process, it is impossible to reconstruct the data in RAID-4 or RAID-5. In a dual parity RAID array with two failed drives, a third drive failure or a media error on a third disk would leave some or all data unrecoverable.
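The I/O cost of this conventional reconstruction can be made concrete with a small sketch that counts reads and writes. The function reconstruct, the in-memory list-of-lists model of the disks, and the sample data are assumptions for illustration only.

```python
def xor_pair(a, b):
    """Byte-wise XOR of two equally sized blocks (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct(disks, failed):
    """Rebuild the failed disk's blocks stripe by stripe.

    disks[d][s] is the block of disk d in stripe s; the failed disk's own
    entries are never read. Returns (spare_blocks, reads, writes).
    """
    num_stripes = len(disks[0])
    spare, reads, writes = [], 0, 0
    for s in range(num_stripes):
        acc = None
        for d, disk in enumerate(disks):
            if d == failed:
                continue
            block = disk[s]  # one read per surviving disk per stripe
            reads += 1
            acc = block if acc is None else xor_pair(acc, block)
        spare.append(acc)    # one write to the spare disk per stripe
        writes += 1
    return spare, reads, writes


# Build a four-disk, three-stripe RAID-4 style array (sample data, assumed):
# disks 0..2 hold data and disk 3 holds row parity.
disks = [[] for _ in range(4)]
for s in range(3):
    blocks = [bytes([s * 3 + d] * 4) for d in range(3)]
    parity = blocks[0]
    for b in blocks[1:]:
        parity = xor_pair(parity, b)
    for d in range(3):
        disks[d].append(blocks[d])
    disks[3].append(parity)

original = list(disks[1])
spare, reads, writes = reconstruct(disks, failed=1)
```

In this model, rebuilding one disk of an N-disk group with S stripes costs (N - 1) * S reads and S writes, here 9 reads and 3 writes, which is the overhead the following paragraph seeks to reduce.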
Accordingly, what is needed is a method and system that reduce the number of I/O operations required to reconstruct a failed disk to a spare disk, thereby reducing the time for reconstructing the failed disk.