In current storage networks, and particularly in storage networks that include geographically remote directors (or nodes) and storage resources, it is highly desirable to preserve or reduce bandwidth between resources and directors while providing optimized data availability and access. Data access may be localized, in part, to improve access speed to pages requested by host devices. Caching pages at directors provides localization; however, the cached data should be kept coherent with modifications made at other directors that may be caching the same data. An example of a system for providing distributed cache coherence is described in U.S. Patent App. Pub. No. 2006/0031450 to Unrau et al., entitled “Systems and Methods for Providing Distributed Cache Coherency,” which is incorporated herein by reference. Other systems and techniques for managing and sharing storage array functions among multiple storage groups in a storage network are described, for example, in U.S. Pat. No. 7,266,706 to Brown et al., entitled “Methods and Systems for Implementing Shared Disk Array Management Functions,” which is incorporated herein by reference.
Data transfer among storage devices, including transfers for data replication or mirroring functions, may involve various data synchronization processes and techniques to provide reliable protection copies of data between a source site and a destination site. In synchronous transfers, data may be transmitted to a remote site, and an acknowledgement of a successful write is transmitted synchronously with the completion of that write.
In an active/active storage system, if there are multiple interfaces to a storage device, each of the interfaces may provide equal access to the storage device. With active/active storage access, hosts in different locations may have simultaneous read/write access via respective interfaces to the same storage device. Various failures in an active/active system may adversely impact synchronization and hinder the ability of the system to recover. Especially problematic are failure scenarios in active/active storage systems involving asynchronous data transmissions.
Specifically, in active-active data storage environments, a witness must be designated to resolve split-brain situations. A split-brain situation can occur when communication between the storage nodes is lost. In this situation, the witness acts as a mediator by choosing one of the storage nodes as the winner and designating the other as the loser. The winning storage node remains available, while the losing storage node suspends its availability for I/O requests.
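The arbitration described above can be sketched for two nodes as follows. This is a minimal illustration under stated assumptions, not any vendor's witness implementation: `Node`, `arbitrate`, and the `preferred` parameter are hypothetical names, a node that cannot reach the witness is assumed ineligible to win, and a statically configured preferred node is assumed as the tie-breaker when both nodes can still reach the witness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    reaches_witness: bool   # can this node still contact the witness?
    available: bool = True  # True while the node is serving I/O requests


def arbitrate(a: Node, b: Node, preferred: str) -> Optional[Node]:
    """Resolve a split brain between nodes a and b after their link is lost.

    Hypothetical rules: only nodes that can still reach the witness are
    candidates; if both qualify, the statically preferred node wins.
    """
    candidates = [n for n in (a, b) if n.reaches_witness]
    if not candidates:
        winner = None  # neither node can reach the witness: both suspend
    else:
        # Fall back to the first candidate if the preferred node is not a candidate.
        winner = next((n for n in candidates if n.name == preferred), candidates[0])
    for n in (a, b):
        n.available = n is winner  # the loser suspends I/O
    return winner


site_a = Node("site-a", reaches_witness=True)
site_b = Node("site-b", reaches_witness=True)
winner = arbitrate(site_a, site_b, preferred="site-a")
print(winner.name, site_b.available)  # site-a False
```

Note that the only inputs here are reachability and a static preference; nothing about either node's capabilities enters the decision.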
At the moment of failure, it is important to choose the best storage node as the winner, because storage nodes may differ in configuration and state at that moment. In current implementations, the witness relies solely on periodic state-exchange messages to decide which node should take over in the event of communication loss between active-active nodes.
Witness technology available today or implemented by storage array vendors does not take into account the overall availability criteria of one node versus another when determining which node should be the winner. Current implementations of witness technology focus only on the health of the local active-active arrays and their ability to communicate with the witness itself and with the remote node in the event of a system or network failure.
Existing witness technology therefore fails to account for the “overall characteristics” of one node compared with another. For example, one node may have a valid data replication leg, more CPU capacity, more memory banks, and the like. There is thus a need for witness technology that makes more robust decisions when choosing the winning node to be used in failover mode.
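To make the contrast with reachability-only arbitration concrete, the following is a hypothetical illustration of comparing two nodes on the kinds of characteristics listed above. It is not the selection method of this disclosure: `NodeProfile`, `pick_winner`, the field names, the priority order (replication leg, then CPU cores, then memory), and the rule that a full tie goes to the first node are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    name: str
    replication_leg_valid: bool  # is the node's data replication leg healthy?
    cpu_cores: int
    memory_gb: int


def pick_winner(a: NodeProfile, b: NodeProfile) -> NodeProfile:
    """Return the node with the better overall characteristics.

    Assumed ordering: a valid replication leg first, then CPU cores, then
    memory. Tuples compare lexicographically and True > False; a full tie
    goes to `a`.
    """
    def rank(n: NodeProfile) -> tuple:
        return (n.replication_leg_valid, n.cpu_cores, n.memory_gb)

    return a if rank(a) >= rank(b) else b


site_a = NodeProfile("site-a", replication_leg_valid=False, cpu_cores=64, memory_gb=512)
site_b = NodeProfile("site-b", replication_leg_valid=True, cpu_cores=32, memory_gb=256)
print(pick_winner(site_a, site_b).name)  # site-b
```

With a strict lexicographic order, a lower-priority attribute matters only when every higher-priority attribute ties; a weighted score would be an alternative if the attributes should trade off against one another.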