Technical Field
The present disclosure relates to storage environments and, more specifically, to switchover between nodes of clusters of a peered cluster storage environment.
Background Information
A storage system typically includes one or more storage devices, such as disks, into which data may be entered, and from which data may be obtained, as desired. The storage system may logically organize the data stored on the storage devices as storage containers, such as files, directories, logical units (luns), etc. The data may be accessed via nodes of the storage system which provide storage services to clients. Certain nodes may be interconnected as a cluster, and configured to provide redundancy within the cluster, such that when one node of the cluster fails another node of the cluster may perform a takeover and service operations (e.g., service data access requests) directed to the failed node's storage containers. Likewise, clusters themselves may be peered to provide further redundancy, such that when one cluster fails another cluster may perform a switchover and its nodes may service operations (e.g., service data access requests) directed to the failed cluster's storage containers.
However, a switchover may sometimes be interrupted, for example, by a reboot or panic that occurs while the switchover is in progress. Upon resumption of normal operation (e.g., on reboot or clearing of the panic), a node may desire to complete the switchover sequence or perform an early switchback (i.e., an operation where the interrupted switchover is aborted and any storage devices that may have been switched over are switched back to their original owners). The node, however, may have difficulty determining how far the prior switchover progressed before the interruption, and from which of a plurality of potential sources to replay logged operations (e.g., data access requests) to ensure consistency. This issue may be exacerbated by the potential for storage devices to become temporarily inaccessible during a switchover, or in the interim between an interrupted switchover and a retried switchover or early switchback. With existing techniques, a node may have little indication of which potential source to utilize to replay logged operations (e.g., data access requests) in response to a particular interruption (e.g., error) scenario.