Technical Field
The present disclosure relates to storage systems and, more specifically, to high availability of data in a cluster of storage systems.
Background Information
A storage system typically includes one or more storage devices, such as disks, into which information (i.e., data) may be entered, and from which data may be obtained, as desired. The storage system (i.e., node) may logically organize the data stored on the devices as storage containers, such as files, logical units (luns), and/or aggregates having one or more volumes that hold files and/or luns. To improve the availability of the data contained in the storage containers, a plurality of nodes may be coupled together as a cluster with the property that, when one node fails, another node may service data access requests directed to the failed node's containers.
In such a cluster, two nodes may be interconnected as a high availability (HA) pair configured to operate as "shared nothing" until one of the nodes fails. That is, each node services the data access requests directed to its own storage containers, and services data access requests directed to the storage containers of the other node (i.e., the partner node) only after a failure of that node, which triggers a takeover sequence on the surviving node (i.e., the local node). Data availability is typically ensured by mirroring user (e.g., client) operations that are logged and serviced at the local node to the HA partner node. Such mirroring typically occurs over a high-speed connection between non-volatile random access memory (NVRAM) hardware on both nodes. However, the HA pair configuration is typically determined at a pre-setup phase between the nodes and, once set up, typically may not be changed. Furthermore, after a failure of the local node, the HA partner node becomes a single point of failure (SPOF) for data availability until the failed node is operational again, because mirroring cannot be redirected to another node, even though other nodes in the cluster may be available.
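By way of illustration only, the fixed pairing and the resulting SPOF may be modeled in a short Python sketch. The model is a simplification under stated assumptions: the `Node` class, the `pair()`, `takeover()` and `log_operation()` functions, and the representation of NVRAM as an in-memory list are all hypothetical and do not describe any particular storage operating system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical storage node; NVRAM is modeled as an in-memory list."""
    name: str
    up: bool = True
    containers: List[str] = field(default_factory=list)
    nvram_log: List[str] = field(default_factory=list)
    partner: Optional["Node"] = None  # fixed at pre-setup; never re-assigned


def pair(a: Node, b: Node) -> None:
    """Pre-setup phase: bind two nodes as an HA pair."""
    a.partner, b.partner = b, a


def takeover(survivor: Node, failed: Node) -> None:
    """Shared nothing until failure: the survivor then serves the partner's containers."""
    failed.up = False
    survivor.containers.extend(failed.containers)


def log_operation(local: Node, op: str) -> bool:
    """Log a client operation locally and mirror it to the HA partner.

    Returns True only if a second copy of the operation exists.
    """
    local.nvram_log.append(op)
    partner = local.partner
    if partner is not None and partner.up:
        partner.nvram_log.append(op)
        return True
    # The partner is down and the pairing cannot be redirected to another
    # node, so the local node is now a single point of failure.
    return False


if __name__ == "__main__":
    a = Node("A", containers=["vol1"])
    b = Node("B", containers=["vol2"])
    c = Node("C")
    pair(a, b)
    print(log_operation(a, "write-1"))  # True: mirrored to B
    takeover(a, b)                      # B fails; A now serves vol1 and vol2
    print(log_operation(a, "write-2"))  # False: C is up but is never used
    print(a.containers)                 # ['vol1', 'vol2']
```

In this model, node C remains available yet never receives a mirrored copy, because the pairing established by `pair()` is never revisited after setup; that is the limitation identified above.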
A possible solution to reduce dependency on the SPOF is to physically relocate some of the data to another node in a different HA pair of the cluster. Yet, this solution may be infeasible because the storage containers may be too large to move. Moreover, relocation of data is both disk and network intensive, as such an operation may involve reading (retrieving) the data from one or more disks of a source node, transferring the retrieved data over a network to a destination node, and writing (storing) the transferred data to one or more disks of the destination node. Another possible solution may use an aggregate relocation (ARL) approach that transfers an entire aggregate to the different HA pair. While this approach does not involve physical movement of the data, it is unfriendly from the standpoint of storage management, since it requires a user to intervene to identify the aggregate. Furthermore, only the relocated aggregate would have HA support, which renders ARL impractical.