A virtualized cluster is a cluster of different storage nodes that together expose a single storage device. Input/output operations (“I/Os”) sent to the cluster are internally re-routed to read and write data to the appropriate locations. In this regard, a virtualized cluster of storage nodes can be considered analogous to a collection of disks in a Redundant Array of Inexpensive Disks (“RAID”) configuration, since a virtualized cluster hides the internal details of the cluster's operation from initiators and instead presents a unified device.
The order in which data is laid out among the different nodes within a cluster determines the cluster's configuration. Normally, data is laid out with two considerations in mind: performance and redundancy. Analogous to a RAID configuration, data in a cluster can be either striped across all the nodes or mirrored so that each byte of data is stored in at least two nodes. The former method is useful for high performance and maximal disk capacity utilization. The latter method is useful for protecting data when a node fails. Because mirrored configurations use twice the amount of physical space to store the same amount of data, the cost of such configurations per unit of storage is twice that of striped systems. Mirrored configurations that support both odd and even numbers of nodes in the cluster are called chained declustered configurations.
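The difference between the two layouts can be illustrated with a minimal sketch. The function names and the zone-to-node mapping below are assumptions made for illustration: the striped layout uses simple round-robin placement, and the mirrored layout uses the conventional chained declustered arrangement in which each zone's mirror is placed on the next node in the chain. Actual clusters may use a different mapping.

```python
def striped_node(zone: int, num_nodes: int) -> int:
    """Striped layout (assumed round-robin): each zone lives on exactly one node."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return zone % num_nodes


def chained_declustered_nodes(zone: int, num_nodes: int) -> tuple[int, int]:
    """Chained declustered layout (assumed mapping): the primary copy is placed
    round-robin and the mirror goes on the next node in the chain, wrapping
    around at the end. This works for odd and even node counts alike."""
    if num_nodes < 2:
        raise ValueError("mirroring needs at least two nodes")
    primary = zone % num_nodes
    mirror = (primary + 1) % num_nodes
    return primary, mirror


# Placement of the first six zones on a three-node (odd-sized) cluster.
for zone in range(6):
    print(zone, striped_node(zone, 3), chained_declustered_nodes(zone, 3))
```

In this sketch every zone in the mirrored layout occupies two nodes, which is the doubling of physical space described above, while the striped layout stores each zone once and offers no protection if its node fails.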
In a non-networked RAID configuration, a trade-off between cost, performance, and redundancy is achieved through the use of two RAID levels called RAID-5 and RAID-6. These two RAID levels do not provide redundancy through mirroring; rather, they create error-correcting parity blocks out of the data and store the parity information. By utilizing parity blocks, it is possible to restore the data after the failure of one RAID device (or of two devices, in the case of RAID-6).
It is extremely difficult, however, to extend parity-based RAID to a networked clustered environment. This is because the algorithms needed for synchronizing operations between the nodes become complex and, as a result, there is often a need for sophisticated distributed locking mechanisms. For instance, when a write occurs to a network RAID-5 configuration, the data zone and the parity zone both need to be updated. This is a compound I/O operation, requiring that the new parity be computed and that the two blocks be written. To compute the new parity, it is necessary to first read the old data and the old parity, and then to perform an exclusive-or (“XOR”) operation on the old data, the old parity, and the new data. The entire update operation must also be atomic in order to maintain the integrity of the stripe's parity in the event of a power failure. As a result, multiple networked read and write operations must be performed and a complex distributed locking mechanism must be utilized. This slows the operation of the cluster considerably.
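The parity computation in this read-modify-write sequence can be sketched as follows. This is a minimal, single-process illustration: the helper names `xor_blocks` and `updated_parity` are hypothetical, and the networked reads, writes, and distributed locking that make the operation expensive in a cluster are deliberately left out.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity = old parity XOR old data XOR new data.

    XOR-ing the old data out of the parity and the new data in avoids
    re-reading the other data blocks of the stripe."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# A stripe of three data blocks and its parity block (illustrative values).
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(xor_blocks(stripe[0], stripe[1]), stripe[2])

# Overwrite block 1: two reads (old data, old parity), then two writes.
new_block = b"\xff\x00"
parity = updated_parity(stripe[1], parity, new_block)
stripe[1] = new_block
```

In a networked configuration, the last two assignments correspond to writes on two different nodes. If only one of them completes, the parity no longer matches the stripe, which is why the update has to be atomic.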
As another example, when a network RAID-5 configuration is degraded due to the failure of a single node, the data on the failed node is reconstructed by reading the corresponding data and parity from all of the other nodes and performing an XOR operation to regenerate the lost data. This requires network read operations to be performed across all of the surviving nodes in the cluster and a distributed lock to be applied to the cluster. These operations significantly degrade the performance of a cluster that utilizes a network RAID-5 configuration. These considerations have made previous network RAID-5 and RAID-6 implementations in clustered storage infeasible for commercial use.
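A degraded read of this kind can be sketched as below. The example is a simplified, in-memory illustration under stated assumptions: the stripe is modeled as a list with one block per node, the last entry holds the parity, and `reconstruct_block` is a hypothetical helper. Each surviving block it consumes stands in for one network read from another node.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_block(stripe: list[bytes], failed_node: int) -> tuple[bytes, int]:
    """Regenerate the block held by the failed node.

    Returns the rebuilt block and the number of surviving blocks that had
    to be read (one read per remaining node in this model)."""
    survivors = [blk for node, blk in enumerate(stripe) if node != failed_node]
    return reduce(xor_blocks, survivors), len(survivors)


# Three data blocks plus one parity block, one block per node (illustrative).
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = data + [reduce(xor_blocks, data)]

# Node 1 has failed; its block is rebuilt from the other three nodes.
rebuilt, reads = reconstruct_block(stripe, failed_node=1)
```

Even in this small four-node model, recovering a single block takes three reads, so the cost of a degraded read grows with the number of nodes in the cluster.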
It is with respect to these considerations and others that the disclosure made herein is presented.