The present invention relates to data storage systems, and more specifically, this invention relates to configuring a data storage system such that it tolerates an increased number of storage element failures.
Maintaining data access is an important requirement in cloud systems, as is minimizing acquisition cost and ownership cost. To ensure data access, a cloud system may implement storage using a two-dimensional array in which every column is a set, such as a JBOD (Just a Bunch of Disks), with a common failure mechanism. Further, the sets may be protected using a Redundant Array of Independent Disks (RAID) architecture, such as RAID-5 or RAID-6. Whenever an individual disk in the array fails, the failed disk may be replaced by a spare. However, a service call may be required when the number of available spares runs low.
Additionally, autonomic parity exchange is a technique for increasing the failure tolerance of a storage system by converting a parity disk to a data disk. However, in cloud-class systems, it is also important to protect against failures that result in the loss of an entire set of storage elements (set loss), such as an entire JBOD. Although some prior art systems can recover from the failure of individual disks, of whole JBODs, and even of a combination of the two, these systems have reduced recovery capability with respect to a combination of set loss and element loss. In particular, each set of an array may form a failure boundary, whereby a set of elements can be made unavailable or lost as a result of a single event. For example, the network attachment or power to a set may fail, or the set may be inadvertently removed, misconfigured, etc. When such an event occurs, a significant number of elements will be taken off-line or lost, and those elements will correspond to a specific physical configuration. However, when parity exchange is used, the logical configuration of the array will, over time, deviate from the initial physical configuration of the array. Accordingly, the physical failure boundaries of the array will differ from the logical failure boundaries of the array.
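By way of illustration only, the divergence between physical and logical failure boundaries may be sketched with a small model. The sketch assumes a hypothetical layout in which each logical row (a RAID group) initially holds exactly one element from every set, and it models a parity exchange in a simplified manner as one row adopting an element donated by another row. The function names, the `(set_id, slot)` element labels, and the array dimensions are assumptions made for the example and are not taken from any particular system.

```python
def initial_layout(num_rows, num_sets):
    """Each logical row initially holds one element, (set_id, slot), from every set."""
    return {row: [(s, row) for s in range(num_sets)] for row in range(num_rows)}


def exchange(layout, failed, donor_row, donated):
    """Simplified stand-in for a parity exchange (an assumption of this sketch):
    the row that owns `failed` drops it and adopts `donated` from `donor_row`."""
    target_row = next(r for r, members in layout.items() if failed in members)
    layout[target_row].remove(failed)
    layout[donor_row].remove(donated)
    layout[target_row].append(donated)


def losses_on_set_failure(layout, lost_set):
    """Count, per logical row, how many members reside in the lost set."""
    return {
        row: sum(1 for set_id, _ in members if set_id == lost_set)
        for row, members in layout.items()
    }


layout = initial_layout(num_rows=3, num_sets=4)
before = losses_on_set_failure(layout, lost_set=0)

# Element (2, 0) fails; row 0 adopts element (0, 1), which was donated by row 1.
exchange(layout, failed=(2, 0), donor_row=1, donated=(0, 1))
after = losses_on_set_failure(layout, lost_set=0)

print(before)
print(after)
```

In this model, the loss of set 0 initially removes one element from every logical row, whereas after the exchange the same physical event removes two elements from row 0, which may exceed the tolerance of a single-parity row.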