Data storage systems, such as redundant array of independent disks (RAID) systems, typically provide protection against disk failures. However, direct attached storage (DAS) RAID controllers have little to no defense against server failure because they are typically embedded within a server. Two or more nodes (i.e., servers) are therefore often used in high availability storage clusters to mitigate the consequences of a server failure.
In multiple-node storage clusters, cache is frequently maintained on a local server. This local cache, often ranging from gigabytes to terabytes in size, enables low-latency, high-performance completion of data transfers from regions of the storage cluster experiencing high activity or “hot” input/output (IO) data transfer requests. The local READ cache of a temporarily disabled node can become stale or invalid because other nodes continue to actively transfer data to both cached and non-cached regions of the storage cluster. Thus, when the node is rebooted, the old cache data is typically purged and a new local cache is built for the rebooted node, which can be very time consuming and can degrade node performance.
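The staleness problem described above can be illustrated with a minimal sketch. It assumes a simplified two-node cluster in which each node keeps a private read cache in front of a shared store. The names SharedStorage, Node, and demo are hypothetical and chosen only for illustration; they do not correspond to any particular RAID controller or cluster product.

```python
class SharedStorage:
    """Hypothetical stand-in for the storage shared by all cluster nodes."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks.get(lba)


class Node:
    """Hypothetical cluster node with a private (local) read cache."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.read_cache = {}

    def read(self, lba):
        # Serve hot regions from the local cache; fill the cache on a miss.
        if lba not in self.read_cache:
            self.read_cache[lba] = self.storage.read(lba)
        return self.read_cache[lba]

    def write(self, lba, data):
        # Only the writing node's own cache is updated (assumption).
        self.storage.write(lba, data)
        self.read_cache[lba] = data

    def reboot(self):
        # Conventional recovery: purge everything and rebuild from scratch.
        self.read_cache.clear()


def demo():
    storage = SharedStorage()
    node_a = Node("A", storage)
    node_b = Node("B", storage)

    node_a.write(7, "v1")   # node A caches block 7
    # Node A is now assumed to be temporarily disabled.
    node_b.write(7, "v2")   # node B keeps writing to the same region

    # Node A's cached copy no longer matches the shared storage.
    stale = node_a.read_cache[7] != storage.read(7)

    node_a.reboot()         # purge the stale cache on reboot
    cache_size_after_reboot = len(node_a.read_cache)
    value_after_rebuild = node_a.read(7)  # first read misses and refills
    return stale, cache_size_after_reboot, value_after_rebuild
```

In this sketch the purge is what restores correctness, but it also discards every entry that was still valid, so each subsequent read must miss and refill the cache. With a cache of gigabytes to terabytes, that rebuild is the time-consuming step the passage refers to.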