Clusters are groups of computers that use redundant computing resources to provide continued service when individual system components fail. More specifically, clusters eliminate single points of failure by providing multiple servers, multiple network connections, redundant data storage, etc. Absent clustering, if a server running a particular application fails, the application is unavailable until the server is restored. In a clustering system, the failure of a server (or of a specific computing resource it uses, such as a network adapter or storage device) is detected, and the application that was running on the failed server is automatically restarted on another computing system (i.e., another node of the cluster). This process is called “failover.” Note that virtual machines (VMs), as well as individual applications, can be failed over between computing systems.
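The detect-and-restart sequence can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of any particular clustering product. The text does not say how failures are detected, so heartbeat timeouts are assumed here. The "least-loaded surviving node" placement rule is likewise an assumption. `ClusterNode`, `FailoverManager`, `heartbeat`, and `timeout` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    name: str
    last_heartbeat: float                     # time of the most recent heartbeat
    apps: set = field(default_factory=set)    # applications currently hosted here


class FailoverManager:
    """Toy failover manager (hypothetical): a node is presumed failed when its
    last heartbeat is older than `timeout` seconds (an assumed detection rule)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.nodes: dict[str, ClusterNode] = {}

    def add_node(self, name: str, now: float) -> None:
        self.nodes[name] = ClusterNode(name, last_heartbeat=now)

    def heartbeat(self, name: str, now: float) -> None:
        self.nodes[name].last_heartbeat = now

    def check(self, now: float) -> list[tuple[str, str, str]]:
        """Detect failed nodes and restart their applications on surviving nodes.

        Returns a list of (application, failed_node, target_node) moves."""
        alive = [n for n in self.nodes.values() if now - n.last_heartbeat <= self.timeout]
        failed = [n for n in self.nodes.values() if now - n.last_heartbeat > self.timeout]
        moves = []
        for node in failed:
            for app in sorted(node.apps):
                if not alive:
                    raise RuntimeError(f"no surviving node can host {app!r}")
                # Assumed placement policy: pick the least-loaded surviving node.
                target = min(alive, key=lambda n: len(n.apps))
                target.apps.add(app)
                moves.append((app, node.name, target.name))
            node.apps.clear()
        return moves


manager = FailoverManager(timeout=5.0)
manager.add_node("node1", now=0.0)
manager.add_node("node2", now=0.0)
manager.nodes["node1"].apps.add("db")
manager.heartbeat("node2", now=8.0)   # node1 falls silent
print(manager.check(now=8.0))         # [('db', 'node1', 'node2')]
```

Timestamps are passed explicitly rather than read from a clock so the example is deterministic. A real cluster would also have to guard against a silent but still-running node (split brain) before restarting its applications elsewhere, which this sketch ignores.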
Clustering systems are often combined with storage management products that provide additional useful features, such as journaling file systems, logical volume management, multi-path input/output (I/O) functionality, etc. Where a cluster is implemented in conjunction with a storage management environment, the computer systems (nodes) of the cluster can access shared storage. The shared storage is typically implemented with multiple underlying physical storage devices, which are managed by the clustering and storage system so as to appear as a single storage device to computer systems accessing the shared storage.
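One simple way several physical devices can be made to look like one device is linear concatenation. In this layout, logical block numbers run across the devices end to end. This is a minimal sketch of that single layout only. Real volume managers also offer striping, mirroring, and other schemes, and the text does not say which one is used. `ConcatenatedVolume` and `locate` are hypothetical names.

```python
from bisect import bisect_right


class ConcatenatedVolume:
    """Hypothetical sketch: present several physical devices as one logical
    device by laying their blocks end to end (linear concatenation)."""

    def __init__(self, device_sizes: dict[str, int]):
        self.names = list(device_sizes)       # devices in concatenation order
        self.starts = []                      # first logical block of each device
        offset = 0
        for name in self.names:
            self.starts.append(offset)
            offset += device_sizes[name]
        self.size = offset                    # total logical blocks

    def locate(self, logical_block: int) -> tuple[str, int]:
        """Map a logical block number to (device name, block offset on that device)."""
        if not 0 <= logical_block < self.size:
            raise IndexError(f"block {logical_block} outside volume of {self.size} blocks")
        # Find the last device whose starting logical block is <= logical_block.
        i = bisect_right(self.starts, logical_block) - 1
        return self.names[i], logical_block - self.starts[i]


volume = ConcatenatedVolume({"disk0": 100, "disk1": 50})
print(volume.size)           # 150
print(volume.locate(99))     # ('disk0', 99)
print(volume.locate(120))    # ('disk1', 20)
```

Nodes that access the shared storage see only logical block numbers 0 through 149. The mapping to individual devices stays hidden inside the volume layer.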
An individual node of a cluster can use a non-shared, local cache. For example, the local cache can be a solid state drive (SSD) built from fast integrated-circuit-based memory. The node can use its local cache to cache shared storage content, which can significantly decrease access latency. However, each such cache is local to its node and not shared with other nodes in the cluster, whereas the shared storage is global to the cluster and shared among multiple nodes. Therefore, if another node of the cluster modifies cached blocks of shared storage, a node can erroneously read stale data from its local cache after a cluster-based event affecting shared storage, such as a failover. Even absent that scenario, after a failover or any other event that causes a file on shared storage to be accessed from a different node of the cluster, the local cache of the accessing node is cold and is warmed only slowly as access proceeds.
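Both problems can be reproduced with a toy model: the stale read after blocks are modified elsewhere, and the cold cache on the node that takes over. This is a minimal sketch under simplifying assumptions, not a proposed solution. Shared storage is modeled as a dictionary of blocks. Each node has a private read cache, and writes go through to shared storage but update only the writer's own cache. Nothing invalidates other nodes' caches, and that omission is intentional because it is the gap being illustrated. `SharedStorage` and `CachingNode` are hypothetical names.

```python
class SharedStorage:
    """Shared storage visible to every node, modeled as block number -> data."""

    def __init__(self):
        self.blocks: dict[int, bytes] = {}


class CachingNode:
    """Hypothetical node with a private, non-shared read cache in front of shared storage."""

    def __init__(self, name: str, storage: SharedStorage):
        self.name = name
        self.storage = storage
        self.cache: dict[int, bytes] = {}     # local cache (e.g., an SSD), never shared
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self.cache:               # served locally, even if storage has changed
            self.hits += 1
            return self.cache[block]
        self.misses += 1                      # cold miss: fetch from shared storage
        data = self.storage.blocks[block]
        self.cache[block] = data
        return data

    def write(self, block: int, data: bytes) -> None:
        # Write-through to shared storage; only this node's cache is updated.
        self.storage.blocks[block] = data
        self.cache[block] = data


storage = SharedStorage()
storage.blocks[7] = b"v1"
node_a = CachingNode("A", storage)
node_b = CachingNode("B", storage)

node_a.read(7)                  # A warms its cache with v1
# Failover: the application moves to B, whose cache is cold.
first = node_b.read(7)          # miss on B, served from shared storage
node_b.write(7, b"v2")          # B modifies the shared block
# Failback: the application returns to A, which still holds v1.
stale = node_a.read(7)
print(first, stale, storage.blocks[7])  # b'v1' b'v1' b'v2'
```

Node A returns `v1` even though shared storage now holds `v2`, which is the stale-read case. Node B's first access is a miss, which is the cold-cache case.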
It would be desirable to address these issues.