1. Field of the Invention
This invention relates to storage systems employing redundant arrays of inexpensive disks (RAID) techniques and, more particularly, to the dynamic configuration of RAID sets within the storage systems.
2. Description of the Related Art
Many computer systems employ mass storage systems that may include multiple storage subsystems. These mass storage systems may use arrays of storage devices, such as hard disks, for example. The use of storage arrays grew, in part, from the need to increase storage system performance. In such systems, storage performance may be increased by performing multiple I/O operations in parallel over an array of disks. By reading and writing multiple disks simultaneously, the storage system performance may be greatly improved.
However, using arrays of multiple disks comes with the disadvantage of increased failure rates. For example, in a small array containing four disks, the mean time between failures (MTBF) for the array will generally be one-fourth that of a single disk. It is not uncommon for storage device arrays to include many more than four disks, shortening the mean time between failures from years to months or even weeks. Modern systems address this reliability issue by employing fault protection, or “redundancy”, so that data lost from a device failure may be recovered.
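The one-fourth figure above follows from a simple approximation, which may be sketched as below. The sketch assumes that disks fail independently at a constant rate and that the failure of any one disk counts as a failure of the array (no redundancy). The function name and the 500,000-hour disk MTBF are illustrative assumptions, not values from any particular device.

```python
def array_mtbf_hours(disk_mtbf_hours: float, num_disks: int) -> float:
    """Approximate MTBF of a non-redundant array of identical disks.

    Assumes independent failures at a constant rate, so the failure
    rates add and the array MTBF is the disk MTBF divided by the
    number of disks.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mtbf_hours / num_disks


# Hypothetical single-disk MTBF, chosen only for illustration.
HYPOTHETICAL_DISK_MTBF_HOURS = 500_000.0

if __name__ == "__main__":
    for n in (1, 4, 100):
        hours = array_mtbf_hours(HYPOTHETICAL_DISK_MTBF_HOURS, n)
        print(f"{n:>3} disks: about {hours:,.0f} hours between failures")
```

Under these assumptions the array MTBF shrinks in direct proportion to the number of disks, which is why larger arrays may see failures far more often than a single disk would.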
One common approach to providing fault protection in arrays of disks is to use one or more of the array architectures defined in redundancy schemes known as Redundant Arrays of Inexpensive Disks (RAID). A fundamental technique in RAID schemes is “striping”, the partitioning of each drive in the array into multiple sections which are referred to as stripe units. Depending on the RAID implementation, a stripe unit may be as small as a single sector or as large as multiple blocks. Common RAID schemes include RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5 and RAID 6. In addition, there are several variants or hybrid RAID schemes such as RAID 0+1, RAID 10, and RAID 53. Each of the RAID schemes has advantages and disadvantages. Some of the schemes may target increased performance, while others target increased availability.
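As an illustration of striping, the following minimal sketch maps a logical block number to a drive and a block on that drive, in the round-robin manner commonly associated with RAID 0. The names `Location` and `locate`, and the parameters `num_disks` and `unit_blocks`, are hypothetical; real implementations may lay out stripe units differently.

```python
from typing import NamedTuple


class Location(NamedTuple):
    disk: int   # index of the drive holding the block
    block: int  # block offset within that drive


def locate(logical_block: int, num_disks: int, unit_blocks: int) -> Location:
    """Map a logical block to (disk, block) with round-robin stripe units.

    Each stripe unit holds `unit_blocks` consecutive logical blocks;
    consecutive stripe units are placed on consecutive drives.
    """
    if logical_block < 0 or num_disks < 1 or unit_blocks < 1:
        raise ValueError("invalid striping parameters")
    # Which stripe unit the block falls in, and its offset inside the unit.
    unit_index, within_unit = divmod(logical_block, unit_blocks)
    # Which stripe (row) the unit belongs to, and which drive holds it.
    stripe, disk = divmod(unit_index, num_disks)
    return Location(disk=disk, block=stripe * unit_blocks + within_unit)


if __name__ == "__main__":
    # Four drives, two blocks per stripe unit.
    for lba in range(10):
        print(lba, locate(lba, num_disks=4, unit_blocks=2))
```

Because consecutive stripe units land on different drives, a large sequential transfer may be serviced by several drives at once.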
In a typical data center, a given storage sub-system array or group of arrays may have common components whose failure may take down the entire system or sub-system. Such components are commonly referred to as Single-Points-of-Failure (SPoF). Depending on specific storage implementations, an array or arrays that have common components may define a failure group. More generally, any group of components or storage devices that share a common SPoF may be referred to as a failure group. These SPoFs often exist in storage subsystems, rack-mounted arrays comprising multiple disk drives, control logic, power systems and signal and power distribution schemes. One such SPoF is often a system component referred to as the midplane, which provides the interconnection of signals and power between the various components in a storage sub-system. An SPoF may negate any precautions that system administrators may take, such as traditional RAID striping across drives in an array. For example, if a midplane fails, attached hosts are unable to access any drive in an array, regardless of any local striping.
One common technique used to mitigate an array SPoF is vertical striping. One conventional vertical striping technique uses a RAID 5 stripe that is composed of drives that physically reside in distinct arrays, such that no two drives in a stripe reside in the same array. Since conventional vertical striping using a RAID 5 scheme may only be successful if each drive resides in a distinct array, this scheme is difficult if not impossible to use efficiently in small systems. In addition, conventional vertical striping is sometimes limited in its flexibility.
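The placement constraint of conventional vertical striping may be sketched as a simple check. The drive and array identifiers, and the function name `is_vertical_stripe`, are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_vertical_stripe(drive_to_array: Mapping[str, str],
                       stripe: Sequence[str]) -> bool:
    """Return True if no two drives in the stripe share an array."""
    drives_per_array = Counter(drive_to_array[drive] for drive in stripe)
    return all(count == 1 for count in drives_per_array.values())


# Hypothetical layout: three arrays, array A holding two drives.
DRIVE_TO_ARRAY = {"a0": "A", "a1": "A", "b0": "B", "c0": "C"}

if __name__ == "__main__":
    # One drive per array: losing any single array costs at most one drive.
    print(is_vertical_stripe(DRIVE_TO_ARRAY, ["a0", "b0", "c0"]))
    # Two drives in array A: losing array A costs two drives of the stripe.
    print(is_vertical_stripe(DRIVE_TO_ARRAY, ["a0", "a1", "b0"]))
```

The check also shows why the constraint is hard to meet in small systems: a stripe of a given width requires at least that many distinct arrays.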
The RAID 5 scheme is also known more generally as an N+K scheme, where N drives are augmented with K=1 extra drive to create a single parity group. Such a single parity group is tolerant of single drive failures, but intolerant of multiple drive failures. Several schemes have been proposed that define other RAID protection schemes involving K drives' worth of redundant information, where K≧2 extra drives are used in conjunction with logic and control to statically protect a data store from up to K drive failures.
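For the K=1 case, the redundant information is commonly a bytewise XOR parity of the N data blocks. The following minimal sketch illustrates that general idea only; it is not a description of any particular K≧2 scheme, and the helper name `xor_blocks` and the sample data are assumptions made for illustration.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


if __name__ == "__main__":
    # N = 3 data blocks plus K = 1 parity block form one parity group.
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)

    # Simulate the loss of the second data drive and rebuild its block
    # from the surviving data blocks and the parity block.
    survivors = [data[0], data[2], parity]
    rebuilt = xor_blocks(survivors)
    print(rebuilt == data[1])
```

A single XOR parity block yields one equation per byte position, so it can recover one missing block per parity group; if two blocks of the same group are lost, that one equation is not enough to recover both.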
However, modern storage systems are typically a dynamic environment in which configurations may change rapidly due to the dynamic and high-growth nature of application and storage volume administration, changing global sparing schemes, the desire to be able to expand storage in a storage subsystem without complete volume reconfiguration (i.e., online expansion and concatenation while maintaining application availability), the desire to maximize efficient use of storage and minimize storage costs, and so on. Thus, it may be desirable to have flexibility in storage system configuration, where system administrators may wish to dynamically configure a volume with a certain capacity without particular attention to where the physical drives reside in the storage system while still maintaining a given level of protection and/or availability.