1. Field of the Invention
This invention relates to computer systems and, more particularly, to servicing asynchronous write requests and repairing a failed component within data storage subsystems without interruption of service.
2. Description of the Related Art
Computer systems frequently include data storage subsystems for storing data. In particular, computer systems that include multiple clients interconnected by a network increasingly share one or more data storage subsystems via the network. The data storage subsystems may include or be further coupled to storage consisting of one or more disk storage devices, tape drives, or other storage media. A computer system may also include one or more servers in which metadata describing the contents of the included storage devices is maintained.
Data storage subsystems may store data with some redundancy to allow for recovery from storage errors. There are a variety of techniques to store data redundantly, including erasure coding techniques such as Reed-Solomon encodings and RAID (Redundant Array of Independent Disks) using a variety of layouts, such as RAID-1, RAID-5, or RAID-6. These RAID layouts may be implemented within an object-based file system in which each independent storage device is treated as a disk. Each client device may convey data to the storage devices via a network.
Unfortunately, some way of arbitrating write access requests from multiple clients may be needed to avoid introducing inconsistencies into the redundant data. One approach may include performing all of the functions involved in sequencing writes using a lock mechanism. For example, in the case of RAID-5 or RAID-6, these functions may include reading old data and old parity, computing new parity, logging the new data and new parity, and writing the new data and new parity to their respective storage locations that together constitute a part of or the whole of a row in the RAID layout. In addition, information may be retrieved from a Meta Data Server (MDS) for each write to an individual location in the RAID layout. The performance of these functions increases write latency and adds complexity and significant computational and storage overhead to each client.
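By way of illustration only, the lock-based sequence described above may be sketched as follows for a single-column write in a RAID-5 row. The class name LockedRow, the in-memory lists standing in for storage devices and for the log, and the use of a single lock per row are all hypothetical and are not taken from any particular system. The sketch also assumes that new parity is derived as the exclusive-or of old parity, old data, and new data, which is a common way to perform this step.

```python
import threading


class LockedRow:
    """Hypothetical in-memory stand-in for one RAID-5 row guarded by a lock."""

    def __init__(self, data_units):
        self.lock = threading.Lock()
        self.data = list(data_units)          # one unit per data column
        self.parity = self._xor(*data_units)  # parity column for this row
        self.log = []                         # stand-in for a persistent log

    @staticmethod
    def _xor(*blocks):
        """Bytewise exclusive-or of equally sized byte strings."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    def write(self, col, new_data):
        """Sequence one write: read old, compute parity, log, then write."""
        with self.lock:                        # arbitrate among writers
            old_data = self.data[col]          # read old data
            old_parity = self.parity           # read old parity
            new_parity = self._xor(old_parity, old_data, new_data)
            self.log.append((col, new_data, new_parity))  # log before writing
            self.data[col] = new_data          # write new data
            self.parity = new_parity           # write new parity


row = LockedRow([b"\x01", b"\x02", b"\x04"])
row.write(1, b"\x08")
```

Every step runs while the lock is held, so each write pays for two reads, a parity computation, a log append, and two writes before another writer can proceed. This is the latency and overhead noted above.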
Taking a RAID-5 layout as an example, user data may be divided into fixed-size units called stripe units. Each stripe unit may be stored on a separate disk, and all of the disks may be physically co-located. The number of such devices may be configurable and, once chosen, may remain fixed. The disks may generally be referred to as columns. Data may then be striped in rows across these columns. In each row, one column may hold a binary sum, called parity, of the remaining columns. The column holding the parity may rotate with each successive row. It is customary to speak of a RAID-5 layout as RAID n+1, since data is in n columns and parity is in 1 column. If any device fails, lost data may be reconstructed by summing the remaining columns, such as with a binary exclusive-or function. For data writes that span fewer than n columns, called partial stripe writes, parity can be computed using a technique referred to as read-modify-write. In this technique, all columns are read, the new data is overlaid on top of the read data, and parity is recomputed. One problem with this approach is the high use of input/output (I/O) bandwidth for reading all columns, even if only one byte is written. Also, latency is increased by the extra read operations, even when the write affects only a subset of the columns, perhaps only one. Therefore, I/O performance suffers.
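As a minimal sketch of the RAID n+1 arithmetic described above, the following example computes parity for one row, reconstructs the contents of a failed column, and performs a partial stripe write by reading all columns. The function names and the representation of a row as a list of byte strings are assumptions made for illustration. The rotation rule in parity_column is only one possible choice, since the direction of rotation is not specified above.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise exclusive-or (binary sum) of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_column(row_index, num_columns):
    """One possible rotation: parity moves one column per successive row."""
    return row_index % num_columns


def make_row(data_units, parity_col):
    """Place n data units and their parity into a row of n+1 columns."""
    row = list(data_units)
    row.insert(parity_col, xor_blocks(data_units))
    return row


def reconstruct(row, failed_col):
    """Rebuild the unit of a failed column by summing the surviving columns."""
    survivors = [unit for i, unit in enumerate(row) if i != failed_col]
    return xor_blocks(survivors)


def partial_write(row, parity_col, col, new_unit):
    """Read all columns, overlay the new data, and recompute parity.

    col must name a data column, not the parity column.
    """
    data = [unit for i, unit in enumerate(row) if i != parity_col]
    data_index = col if col < parity_col else col - 1
    data[data_index] = new_unit
    return make_row(data, parity_col)


units = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]   # n = 3 data columns
row = make_row(units, parity_column(1, 4))        # parity in column 1
lost = reconstruct(row, 2)                        # column 2 has failed
updated = partial_write(row, 1, 3, b"\xff\x00")   # overwrite one column
```

Here partial_write touches every column of the row even though only one stripe unit changes, which illustrates the extra read traffic described above.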
In view of the above, systems and methods for supporting asynchronous write operations within data storage systems and repairing a failed component within data storage subsystems without interruption of service are desired.