1. Technical Field
This application relates to computing devices, and more particularly to the field of managing storage for computing devices.
2. Description of Related Art
Host processor systems may store and retrieve data using storage devices containing a plurality of host interface units (host adapters), disk drives, and disk interface units (disk adapters). Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels of the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical volumes. The logical volumes may or may not correspond to the actual disk drives.
In some cases, it may be desirable to use erasure encoding to protect logical volumes in case one or more of the disk drives fail. Some types of erasure encoding, such as RAID encoding, provide for having multiple members on different physical devices. Depending on the type of RAID encoding, data may be protected from one or more physical drive faults. For example, RAID 1 provides for two members, where each member is a mirror of the other. If the members are located on different physical devices, then, when one of the physical devices fails, the other may be used to access the data. In addition, the RAID 1 configuration may be reconstructed by copying the data from the remaining, non-failing device to a new device. After all of the data has been copied to the new device, the data is once again protected by the RAID 1 configuration.
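The RAID 1 behavior described above may be illustrated with a minimal sketch. The class name `Raid1Group` and its methods are hypothetical and are used for illustration only; the sketch models each member as an in-memory buffer and is not intended to represent an actual storage device implementation.

```python
class Raid1Group:
    """Toy RAID 1 group: two members that mirror each other (hypothetical model)."""

    def __init__(self, size):
        # Each member stands in for a section on a different physical device.
        self.members = [bytearray(size), bytearray(size)]

    def write(self, offset, data):
        # Every write goes to each working member so the copies stay identical.
        for member in self.members:
            if member is not None:
                member[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Any working member can serve the read.
        for member in self.members:
            if member is not None:
                return bytes(member[offset:offset + length])
        raise IOError("no working member")

    def fail(self, index):
        # Simulate failure of the physical device holding this member.
        self.members[index] = None

    def rebuild(self, index):
        # Copy the surviving member onto a new device to restore protection.
        survivor = self.members[1 - index]
        if survivor is None:
            raise IOError("no working member to copy from")
        self.members[index] = bytearray(survivor)

    def is_protected(self):
        return all(member is not None for member in self.members)


group = Raid1Group(8)
group.write(0, b"data")
group.fail(0)  # data remains readable from the other member
```

In this sketch, the data remains readable after the failure, but the group is unprotected until `rebuild` has finished copying to the replacement member.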
Physical devices of a storage system may be subdivided into multiple sections where each section is used for storing a member of a RAID group. For example, a first physical device may be subdivided into three sections, the first section containing a RAID 1 member of RAID group A, the second section containing a RAID 1 member of RAID group B, and the third section containing a RAID 1 member of RAID group C. A second physical device may be similarly subdivided to provide a corresponding other member of each of the RAID 1 groups. Note, however, that if one of the physical devices fails, the three RAID groups will need to be reconstructed by accessing the remaining, non-failed, physical device, which may significantly slow down the reconstruction process. To address this, RAID groups may be distributed among a plurality of physical devices in a way that minimizes the number of occurrences of members of different RAID groups on the same physical device. For example, members of the RAID groups may be distributed so that only one physical device, at most, contains both a member of the RAID group A and a member of the RAID group B. Note also that, to facilitate protection from device failure, no more than one member of a particular RAID group may be provided on the same physical device.
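The two placement considerations described above may be checked with a short sketch. The layout representation (a mapping from a device name to the RAID groups having a member on that device) and the function names are assumptions made for illustration. The sketch only measures a given layout and does not show how a layout is chosen.

```python
from collections import Counter
from itertools import combinations


def has_duplicate_members(layout):
    """Return True if any device holds two members of the same RAID group."""
    return any(len(groups) != len(set(groups)) for groups in layout.values())


def shared_device_counts(layout):
    """Count, for each pair of RAID groups, the devices holding a member of both."""
    counts = Counter()
    for groups in layout.values():
        for pair in combinations(sorted(set(groups)), 2):
            counts[pair] += 1
    return counts


# The two-device example from the text: every pair of groups shares both devices.
stacked = {"dev0": ["A", "B", "C"], "dev1": ["A", "B", "C"]}

# The same three RAID 1 groups spread over three devices with two sections each:
# every pair of groups shares at most one device.
spread = {"dev0": ["A", "B"], "dev1": ["A", "C"], "dev2": ["B", "C"]}
```

For the `stacked` layout, each pair of groups shares two devices, so a single failure forces all three groups to be reconstructed from one surviving device. For the `spread` layout, each pair shares one device, so a failure affects only two groups and the reconstruction reads come from two different surviving devices.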
In addition to considerations for distributing RAID group members among different sections of physical devices, it is also necessary to consider the number and placement of spare sections that may be used to reconstruct RAID group(s) following failure of a physical device. If a physical device having Q RAID group members fails, it is desirable to have available at least Q spare sections for reconstructing the RAID groups to restore RAID protection for the groups. One way to do this is to provide an extra physical device having only spare sections that may be used for reconstructing RAID groups when another physical device fails. However, with this approach, all of the RAID groups affected by the failure would be reconstructed onto the extra physical device at the same time following the failure, which may be less than optimal. Accordingly, the spare sections may be distributed among the physical devices, which avoids reconstructing all of the RAID groups to the same physical device, but may add complexity in determining the number of spare sections needed to provide coverage for all of the RAID groups. Note that simply providing Q spare sections may not be sufficient because of other constraints, such as not having more than one member from the same RAID group on the same physical device, and possibly other constraints/criteria. Of course, additional criteria/constraints may be addressed by significantly overprovisioning spare sections, but this may not be an acceptable solution where it is desirable to have a maximum number of usable RAID groups while still maintaining complete spare coverage for the groups to protect against failure of any of the physical devices.
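The observation that Q spare sections may not be sufficient can be illustrated with a minimal sketch. The sketch assumes a layout given as a mapping from a device name to a list of sections, where each section holds a RAID group name or the hypothetical marker `SPARE`. It considers only the constraint that no device may hold two members of the same RAID group and tests, using a standard augmenting-path bipartite matching, whether every member lost with a failed device can be assigned its own permissible spare section. The function names are illustrative and the sketch is not the mechanism described herein.

```python
SPARE = None  # hypothetical marker for an unused (spare) section


def can_rebuild(layout, failed):
    """Return True if every RAID member on `failed` can move to a spare section
    on a surviving device that holds no other member of the same group."""
    lost = [g for g in layout[failed] if g is not SPARE]
    # Every spare section on a surviving device, identified as (device, index).
    spares = [(dev, i)
              for dev, sections in layout.items() if dev != failed
              for i, g in enumerate(sections) if g is SPARE]
    # A lost member of group g may use a spare only on a device without g.
    allowed = [[s for s in spares if g not in layout[s[0]]] for g in lost]

    owner = {}  # spare section -> index of the lost member assigned to it

    def place(m, seen):
        # Try to give member m a spare, moving earlier members if needed.
        for s in allowed[m]:
            if s in seen:
                continue
            seen.add(s)
            if s not in owner or place(owner[s], seen):
                owner[s] = m
                return True
        return False

    return all(place(m, set()) for m in range(len(lost)))


def fully_covered(layout):
    """Return True if the layout can be rebuilt after failure of any one device."""
    return all(can_rebuild(layout, dev) for dev in layout)


# One surviving spare for one lost member (Q = 1), yet the rebuild is blocked:
# the only spare is on the device that already holds the other member of A.
blocked = {"dev0": ["A", SPARE], "dev1": ["A", SPARE]}

# Two spares placed so that any single device failure can be rebuilt.
covered = {"dev0": ["A", "B"], "dev1": ["A", SPARE], "dev2": ["B", SPARE]}
```

In the `blocked` layout, a count of spare sections alone suggests coverage, but `can_rebuild(blocked, "dev0")` is false because of the placement constraint. In the `covered` layout, `fully_covered(covered)` is true with the same number of spare sections, which shows that placement matters as well as count.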
Accordingly, it is desirable to provide a mechanism for provisioning spare sections for RAID groups in a way that allows all of the RAID groups to be reconstructed according to criteria for RAID group member placement without overprovisioning spare sections.