A typical data storage system includes one or more arrays of magnetic disk drives or similar non-volatile storage devices, and a controller that controls the manner and locations in which data is stored in and retrieved from the devices. It is important that a host system be able to reliably access all of the data in the data storage system. However, a potential problem that affects data storage systems is that one or more of the devices can fail or malfunction in a manner that prevents the host system from accessing some or all of the data stored on that device.
A redundant array of inexpensive (or independent) disks (RAID) is a common type of data storage system that addresses the above-referenced reliability problem by enabling recovery from the failure of one or more storage devices. Various RAID schemes are known and are commonly referred to by a “level” number, such as “RAID-0,” “RAID-1,” “RAID-2,” etc. For example, as illustrated in FIG. 1, a conventional RAID-5 system 10 can have four storage devices 12, 14, 16 and 18 (e.g., arrays of disk drives) holding four logical volumes A, B, C and D, where the data of each logical volume is divided among three storage areas and the parity information for that volume occupies a fourth storage area. In RAID-5 storage system 10, data is distributed across storage devices 12, 14, 16 and 18, and the parity information for the data is distributed among the same devices. Distributing logically sequential data segments across multiple storage devices is known as striping. For example, in response to a request received from a host system 20 to write data to volume A, RAID-5 system 10 distributes logically sequential data segments 22, 24, 26, etc., across corresponding storage areas A1, A2 and A3 in storage devices 12, 14 and 16, respectively, then computes parity information for data segments 22, 24 and 26 and stores the resulting parity information 28 in a corresponding storage area AP in storage device 18. Striping is typically performed at a relatively fine granularity, so that in the vast majority of write operations the data to be written is divided into many data segments (typically only a few kilobytes each) and written across many corresponding storage devices.
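The striping and parity steps described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular controller: the function names `stripe` and `xor_bytes`, the 4-byte segment size (chosen for readability; real segments are typically a few kilobytes), and zero-padding of the final layer are all assumptions made for the example. Each returned layer holds one data segment per data device together with the XOR parity of those segments.

```python
from functools import reduce

SEGMENT_SIZE = 4  # hypothetical segment size in bytes; real systems use a few kilobytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, num_data_devices: int, segment_size: int = SEGMENT_SIZE):
    """Split data into segments and group them into layers.

    Each layer holds one segment per data device plus one parity segment.
    Assumption: the last segment and last layer are zero-padded so that
    every segment has the same length and every layer is complete.
    """
    segments = [data[i:i + segment_size].ljust(segment_size, b"\x00")
                for i in range(0, len(data), segment_size)]
    # Pad with all-zero segments so the final layer is complete.
    while len(segments) % num_data_devices:
        segments.append(bytes(segment_size))
    layers = []
    for i in range(0, len(segments), num_data_devices):
        data_segments = segments[i:i + num_data_devices]
        parity = reduce(xor_bytes, data_segments)  # e.g. AP = A1 XOR A2 XOR A3
        layers.append((data_segments, parity))
    return layers
```

For example, `stripe(b"HELLOWORLD!!", 3)` yields a single layer whose data segments `b"HELL"`, `b"OWOR"` and `b"LD!!"` would go to storage areas A1, A2 and A3, with their XOR as the parity segment for AP.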
System 10 is depicted in FIG. 1 using a conventional RAID notation or symbology in which each storage area is represented by a disk-like symbol, but it is understood that the symbol represents a logical element and not a physical disk. Also, the corresponding storage areas across which logically sequential data segments are striped are collectively referred to as a “layer,” which is another logical construct. For purposes of clarity, each logical volume is shown in FIG. 1 as having only a single layer. In practice, however, each logical volume commonly has many layers.
Note that each of the four storage devices 12, 14, 16 and 18 includes a portion of only three of the four logical volumes, as well as the parity information for the fourth logical volume. The parity information for volume A, denoted AP, is the exclusive-OR (XOR) of the data stored in storage areas A1, A2 and A3, i.e., AP = A1 XOR A2 XOR A3. The parity information for volume B, denoted BP, is the exclusive-OR of the data stored in storage areas B1, B2 and B3, i.e., BP = B1 XOR B2 XOR B3. The parity information for volume C, denoted CP, is the exclusive-OR of the data stored in storage areas C1, C2 and C3, i.e., CP = C1 XOR C2 XOR C3. The parity information for volume D, denoted DP, is the exclusive-OR of the data stored in storage areas D1, D2 and D3, i.e., DP = D1 XOR D2 XOR D3. Thus, if any one of storage devices 12, 14, 16 and 18 fails, the data or parity information that was stored on the failed device can be reconstructed from the corresponding data and parity information stored on the other three devices. For example, if storage device 14 fails, the data in storage area A2 can be reconstructed by computing the exclusive-OR of A1, A3 and AP.
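The reconstruction of A2 after a failure of storage device 14 follows from the fact that XOR is its own inverse: A1 XOR A3 XOR AP = A1 XOR A3 XOR (A1 XOR A2 XOR A3) = A2. The sketch below checks this numerically. The helper name `xor_blocks` and the 4-byte contents of A1, A2 and A3 are hypothetical values chosen only for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Exclusive-OR any number of equal-length blocks together."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical contents of the three data areas of volume A.
A1 = b"\x10\x20\x30\x40"
A2 = b"\x0f\x0e\x0d\x0c"
A3 = b"\xaa\xbb\xcc\xdd"

# AP = A1 XOR A2 XOR A3, stored on storage device 18.
AP = xor_blocks(A1, A2, A3)

# If storage device 14 fails, A2 is lost; XOR of the surviving areas recovers it,
# because A1 XOR A3 XOR AP = A1 XOR A3 XOR (A1 XOR A2 XOR A3) = A2.
recovered_A2 = xor_blocks(A1, A3, AP)
assert recovered_A2 == A2
```

The same call with any three of A1, A2, A3 and AP recovers the fourth, which is why a single failed device of any position can be rebuilt.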
It should be noted that the storage areas in each of storage devices 12, 14, 16 and 18 are typically physically contiguous. That is, storage areas A1, B1, C1 and DP are physically contiguous on storage device 12; storage areas A2, B2, CP and D1 are physically contiguous on storage device 14; storage areas A3, BP, C2 and D2 are physically contiguous on storage device 16; and storage areas AP, B3, C3 and D3 are physically contiguous on storage device 18. Physically contiguous storage areas commonly comprise physically adjacent sectors on a disk.
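The arrangement listed above follows a rotating-parity pattern. The sketch below regenerates it; the rule it encodes (parity for the v-th volume on the (n − 1 − v)-th device, with that volume's data areas filling the remaining devices in ascending order) is inferred from the listing for FIG. 1 and is an assumption, not a rule stated in the text. The function name `fig1_layout` is hypothetical. Areas are appended in volume order, which mirrors their physically contiguous order on each device.

```python
DEVICES = [12, 14, 16, 18]   # reference numerals of the storage devices in FIG. 1
VOLUMES = ["A", "B", "C", "D"]


def fig1_layout(devices=DEVICES, volumes=VOLUMES):
    """Return {device: [storage area names]} in physically contiguous order.

    Assumption: parity for the v-th volume goes on device index (n - 1 - v),
    and that volume's data areas fill the other devices in ascending order.
    """
    n = len(devices)
    layout = {d: [] for d in devices}
    for v, vol in enumerate(volumes):
        parity_index = (n - 1 - v) % n
        data_number = 1
        for i, dev in enumerate(devices):
            if i == parity_index:
                layout[dev].append(f"{vol}P")
            else:
                layout[dev].append(f"{vol}{data_number}")
                data_number += 1
    return layout
```

Calling `fig1_layout()` reproduces the four device listings given above, e.g. `["A1", "B1", "C1", "DP"]` for storage device 12.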
Mirroring is another technique for addressing the above-referenced reliability problem. In mirroring, data is duplicated across two or more storage devices; that is, the same data (and parity) are stored on each of those devices. In some instances, mirroring has been combined with RAID principles. For example, RAID-1 uses mirroring instead of striping. Hybrid RAID schemes known as RAID-0+1 and RAID-1+0 combine mirroring and striping techniques.
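For contrast with the parity approach, the sketch below models mirroring: every write is applied to all copies, and a read can be served by any surviving copy without reconstruction. The class `MirroredStore`, the use of Python dictionaries to stand in for storage devices, and the omission of resynchronizing a replaced device are all simplifying assumptions for illustration.

```python
class MirroredStore:
    """Minimal sketch of mirroring: every write goes to all working copies."""

    def __init__(self, num_copies: int = 2):
        if num_copies < 2:
            raise ValueError("mirroring needs at least two copies")
        # Each dict stands in for one storage device: block number -> bytes.
        self.devices = [dict() for _ in range(num_copies)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # Duplicate the same data onto every device that has not failed.
        for i, dev in enumerate(self.devices):
            if i not in self.failed:
                dev[block] = data

    def fail(self, index: int) -> None:
        self.failed.add(index)

    def read(self, block: int) -> bytes:
        # Any surviving copy holds identical data, so no reconstruction is needed.
        for i, dev in enumerate(self.devices):
            if i not in self.failed and block in dev:
                return dev[block]
        raise OSError(f"block {block} unavailable on every copy")
```

The trade-off is capacity: with two copies, only half of the raw storage holds distinct data, whereas the four-device RAID-5 arrangement of FIG. 1 uses three quarters of it for data.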
A disadvantage of the above-described RAID-5 scheme and similar schemes in which data is striped across multiple storage devices is that if one of the storage devices fails, the failure affects all of the volumes. Thus, in the event of failure of even one storage device, it is generally necessary to reconstruct at least some of the data before a read request from the host system can be properly performed.
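Using the per-device listing of storage areas given for FIG. 1, a short check shows why a single device failure touches every volume: each device holds one area of each of the four volumes, so three volumes lose data (requiring reconstruction on read) and the fourth loses its parity. The table is transcribed from the text; the function name `affected_volumes` is hypothetical.

```python
# Storage areas per device, transcribed from the FIG. 1 layout given earlier in the text.
LAYOUT = {
    12: ["A1", "B1", "C1", "DP"],
    14: ["A2", "B2", "CP", "D1"],
    16: ["A3", "BP", "C2", "D2"],
    18: ["AP", "B3", "C3", "D3"],
}


def affected_volumes(failed_device: int) -> dict:
    """Map each volume to what it loses when failed_device fails: 'data' or 'parity'."""
    lost = {}
    for area in LAYOUT[failed_device]:
        volume, suffix = area[0], area[1:]
        lost[volume] = "parity" if suffix == "P" else "data"
    return lost
```

For instance, `affected_volumes(14)` returns `{"A": "data", "B": "data", "C": "parity", "D": "data"}`, so reads from volumes A, B and D must reconstruct the missing areas until device 14 is restored.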
Another disadvantage of the above-described RAID-5 scheme is that all of the storage devices of the data storage system must have the same capacity, because data is striped across all of them. Still another disadvantage of the above-described RAID-5 scheme is that, to increase storage capacity by adding new storage devices, data and parity information must be redistributed among the enlarged set of devices, since parity information is embedded within the otherwise physically contiguous data. Also, although RAID-5 is capable of recovering from the failure of a single storage device, and RAID-6 is capable of recovering from the failure of two storage devices, recovery from the failure of more than two storage devices is generally not possible with conventional RAID schemes.
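The equal-capacity constraint can be expressed as simple arithmetic. Because each layer places one equal-sized storage area on every device, each device effectively contributes only as much space as the smallest device, and one device's worth of that space holds parity. The sketch below assumes this common accounting; the function name and the gigabyte units are illustrative only.

```python
def raid5_usable_capacity(capacities_gb):
    """Return (usable, stranded) capacity for a RAID-5 set of possibly unequal devices.

    Assumption: every layer uses one equal-sized area per device, so each device
    contributes only the capacity of the smallest device, and one device's worth
    of that space is consumed by parity.
    """
    if len(capacities_gb) < 3:
        raise ValueError("RAID-5 needs at least three devices")
    smallest = min(capacities_gb)
    usable = (len(capacities_gb) - 1) * smallest
    stranded = sum(c - smallest for c in capacities_gb)  # space striping cannot use
    return usable, stranded
```

With devices of 1000, 1000, 1000 and 2000 GB, the call returns `(3000, 1000)`: the extra 1000 GB on the larger device goes unused.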
A redundant array of independent servers (RAIS), also known as a redundant array of independent nodes (RAIN), is another type of data storage system that addresses the reliability problem. In a RAIS system, each storage device resides on an independent server computer, and the servers are interconnected by a network. Striping and other RAID methodologies are typically employed in a RAIS system. A disadvantage of the RAIS scheme is that system performance is adversely affected while any one or more of the servers is rebooting, which can be a relatively frequent occurrence in a large RAIS system.