Clustered storage systems allow multiple storage computers, or “nodes,” to work together as a coherent storage system. Clustered storage systems utilize various configurations of multiple processors, controllers, memory, and other resources to increase the performance of the storage system and to provide redundancy and high availability.
One such configuration is a high availability cluster with two nodes: a primary node and a secondary node, each having its own physical storage devices (disks). In an Active-Passive mode configuration, write I/Os may be served by the primary node while reads may be served by both of the nodes. Every write I/O operation to the primary node may be mirrored to the secondary node before the operation is acknowledged as complete to the initiator of the I/O. In the event of a failure of the primary node, the secondary node, having the mirrored data from the failed node, can continue to service all I/Os. Technologies such as multipath I/O (“MPIO”) may make such node failovers transparent to the initiators. However, in such a mirrored configuration, only half of the actual physical storage space is available to the initiators.
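The mirrored write path described above may be illustrated by the following minimal sketch. It assumes an in-memory model in which each node's disks are a dictionary keyed by block address; the names `Node`, `MirroredCluster`, `write`, `read`, and `failover` are hypothetical and do not correspond to any particular product or API.

```python
class Node:
    """Hypothetical storage node with its own private disks (a dict here)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block address -> data
        self.alive = True

    def write_local(self, lba, data):
        if not self.alive:
            raise IOError(f"{self.name} is down")
        self.blocks[lba] = data

    def read_local(self, lba):
        if not self.alive:
            raise IOError(f"{self.name} is down")
        return self.blocks.get(lba)


class MirroredCluster:
    """Two-node Active-Passive cluster: writes go to the primary and are mirrored."""

    def __init__(self):
        self.primary = Node("primary")
        self.secondary = Node("secondary")

    def write(self, lba, data):
        # Write on the primary, then mirror to the secondary (if it is up)
        # BEFORE acknowledging the I/O to the initiator.
        self.primary.write_local(lba, data)
        if self.secondary.alive:
            self.secondary.write_local(lba, data)
        return "ACK"

    def read(self, lba):
        # Simplification: reads are served by whichever node is currently primary.
        return self.primary.read_local(lba)

    def failover(self):
        # The primary fails; the secondary, holding the mirrored data, takes over.
        self.primary.alive = False
        self.primary, self.secondary = self.secondary, self.primary


cluster = MirroredCluster()
cluster.write(7, b"payload")
cluster.failover()
data_after_failover = cluster.read(7)
```

Because the acknowledgement is returned only after both copies exist, a write that the initiator has seen complete remains readable after the failover in this model.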
Another configuration involves the sharing of the physical storage devices, such as a redundant array of inexpensive disks (“RAID”), by the clustered nodes. In this scenario, the RAID array is exclusively owned by the primary node, which services all I/O operations, while the secondary node acts as a “hot spare” and takes control of the disks in the event of a failure of the primary node. The failover to the secondary node may be made transparent to the initiators of I/Os, since the primary node has informed the secondary node of all write I/O operations that have been transacted on the primary. Since the RAID array in this configuration is shared, it may be configured in a less redundant RAID level, such as RAID 5, in order to provide more usable storage capacity than the mirrored configuration.
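The capacity difference between the two configurations may be illustrated with a short calculation. The sketch below assumes equally sized disks, a mirrored configuration in which the disks are split evenly between the two nodes, and a single shared RAID 5 array that gives up the equivalent of one disk to parity; the function names are hypothetical.

```python
def mirrored_usable(total_disks, disk_size):
    """Usable capacity when every block is stored once on each of two nodes."""
    if total_disks < 2 or total_disks % 2:
        raise ValueError("mirroring assumes an even split of disks across two nodes")
    return total_disks * disk_size / 2


def raid5_usable(total_disks, disk_size):
    """Usable capacity of one shared RAID 5 array (one disk's worth of parity)."""
    if total_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (total_disks - 1) * disk_size


# Example: eight 1 TB disks in total.
mirrored_tb = mirrored_usable(8, 1.0)   # 4.0 TB usable
shared_tb = raid5_usable(8, 1.0)        # 7.0 TB usable
```

Under these assumptions, the same eight disks yield 4 TB in the mirrored configuration and 7 TB in the shared RAID 5 configuration, at the cost of tolerating only a single disk failure within the array.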
When the primary node fails, the secondary node may replay any outstanding writes that were not completed by the primary node to ensure that no data is lost. However, in the case where the RAID array is in a degraded mode, such as after the failure of a disk drive in the array, the data stored in the RAID array may not be in a consistent state, and consistency may not be recoverable due to the degraded state.
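One way such an inconsistency may arise can be sketched as follows. The example assumes a single RAID 5 stripe of three data blocks protected by XOR parity, and assumes the primary node failed after updating a data block but before updating the parity block. These details, and the helper name `xor_blocks`, are illustrative assumptions and are not taken from any particular implementation.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: three data blocks and their parity.
d0, d1, d2 = b"\x01\x01", b"\x02\x02", b"\x04\x04"
parity = xor_blocks(d0, d1, d2)

# Interrupted write: d0 reaches the disk, but the parity update does not.
new_d0 = b"\x09\x09"
stale_parity = parity

# Optimal array: every data block is readable, so parity can simply be
# recomputed and the stripe becomes consistent again.
repaired_parity = xor_blocks(new_d0, d1, d2)

# Degraded array: the disk holding d2 has failed, so d2 must be rebuilt
# from the surviving blocks and the (stale) parity.
rebuilt_d2 = xor_blocks(new_d0, d1, stale_parity)

d2_is_corrupted = rebuilt_d2 != d2   # True: the rebuilt block is wrong
```

In the degraded case, the block on the failed disk can only be derived from parity, so a parity block that does not match the data on the surviving disks produces incorrect data for a block that the interrupted write never touched.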
It is with respect to these considerations and others that the disclosure made herein is presented.