A storage server is a computer system, and a form of storage controller, that stores and retrieves data on behalf of one or more clients on a network. It stores and manages that data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes, or solid-state storage such as flash memory. A storage server may be configured to service file-level requests from clients, as in the case of file servers used in a Network Attached Storage (NAS) environment. Alternatively, a storage server may be configured to service block-level requests from clients, as is done by storage servers used in a Storage Area Network (SAN) environment. Further, some storage servers are capable of servicing both file-level and block-level requests, such as certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif.
To improve storage availability and performance, multiple individual storage servers can be integrated into a clustered storage system. A computer cluster typically consists of a set of loosely connected computers that work together so that, in many respects, the computers can be viewed as a single system. The components of a computer cluster are usually connected to each other through fast local area networks, each node (e.g., computer used as a server) running its own instance of an operating system. In this manner, each storage server in a clustered storage system can be used to access and store critical data, among other purposes. Additionally, a clustered storage system is able to provide load-balancing and/or failover capabilities.
To provide data backup in a storage system, including a clustered storage system, a data backup technique known as “aggregate mirroring” can be implemented. Aggregate mirroring involves backing up or replicating data stored in mass storage devices (also referred to as aggregates) at a primary site with an exact duplicate (i.e., mirror image) of the data in mass storage devices at a remote or secondary site. Thus, if data or connectivity is ever lost at the primary site, the data can be recovered from the secondary site or remains accessible to a client through it.
In a storage server that handles large volumes of client requests, it may be impractical to save data modifications to the mass storage devices every time a write request is received from a client. This is primarily because disk accesses tend to take a relatively long time to complete in comparison to other operations. Accordingly, a storage server can store or cache write requests received from clients in a temporary data cache and periodically write the data in the data cache out to the mass storage devices or disks. The event of saving or writing the data in the data cache out to the mass storage devices or disks is called a “consistency point.” At a consistency point, the storage system saves any data that is modified by the write requests received from clients to its mass storage devices and triggers a process of updating the data stored at the mirror site to reflect the updated primary volume.
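The write-caching behavior described above can be illustrated with a minimal sketch. It is a toy model and not an actual storage server implementation: the class name `StorageServer`, the in-memory dictionaries standing in for the mass storage devices and the mirror site, and the count-based trigger `flush_threshold` are all assumptions introduced for illustration (the text above says only that the cache is written out periodically).

```python
class StorageServer:
    """Toy model: client writes are cached and flushed at consistency points."""

    def __init__(self, flush_threshold=3):
        self.cache = {}    # pending writes held in a temporary (volatile) cache
        self.disk = {}     # stands in for the primary mass storage devices
        self.mirror = {}   # stands in for the data stored at the mirror site
        # Hypothetical trigger: flush after this many cached blocks.
        self.flush_threshold = flush_threshold

    def write(self, block_id, data):
        # Cache the write instead of going to disk on every request.
        self.cache[block_id] = data
        if len(self.cache) >= self.flush_threshold:
            self.consistency_point()

    def read(self, block_id):
        # The most recent data may still be in the cache.
        if block_id in self.cache:
            return self.cache[block_id]
        return self.disk.get(block_id)

    def consistency_point(self):
        # Save all modified data to disk, then bring the mirror up to date.
        self.disk.update(self.cache)
        self.mirror.update(self.cache)
        self.cache.clear()
```

In this sketch, a write that has been cached but not yet flushed is readable through `read` but absent from both `disk` and `mirror`, which is the window between consistency points.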
In prior art solutions, if a failure occurs in a storage system or in a storage cluster between consistency points (i.e., before the data in the data cache has been written out to disk), the clustered storage environment can fail over to another node or cluster, but the recovery process at the failover storage system is still asynchronous because the failover node does not have access to the cache data of the node that failed. That is, the data stored in the data cache of the failed node is lost or temporarily unavailable to client systems.
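The failure window described above can be made concrete with a short sketch. It assumes, for illustration only, that the failover node can reach the data already persisted at the last consistency point but not the failed node's cache; the function name `failover_view` and the dictionary representation of disk and cache contents are hypothetical.

```python
def failover_view(disk, failed_node_cache):
    """Return (data visible to the failover node, writes stranded in the failed cache).

    Assumption: the failover node sees only what was persisted at the last
    consistency point; the failed node's cache is unreachable.
    """
    visible = dict(disk)
    # Any cached write that differs from the persisted state never reached disk.
    stranded = {
        block_id: data
        for block_id, data in failed_node_cache.items()
        if disk.get(block_id) != data
    }
    return visible, stranded


# State at the moment of failure, between two consistency points.
disk = {"blk0": "v1"}
cache = {"blk0": "v2", "blk1": "v1"}  # client writes not yet flushed
visible, stranded = failover_view(disk, cache)
```

Here the failover node would serve the older contents of `blk0` and would have no record of `blk1`, while both client writes remain stranded in the failed node's cache.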