1. Field of the Invention
The present invention relates to virtual interfaces on storage system clusters, and more particularly to failover of virtual interfaces in a clustered environment.
2. Background Information
In a clustered data storage system, data access via the well-known Network File System (NFS) and Common Internet File System (CIFS) protocols relies on the availability of client-facing virtual interfaces (VIFs) hosted on the data ports of storage systems, or nodes, in the cluster. A VIF consists of an IP address, a virtual server ID, a virtual interface ID, and various associated information (netmask, routes, etc.). Continuity of file system service requires that VIFs remain available and that they fail over in response to network port failure, network interface failure, cable failure, switch port failure, or storage system failure. VIF failover typically requires relocating a VIF to a different hosting port, either on the local node or on a remote node.
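The VIF record and the local-versus-remote relocation choice described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class names `Vif` and `HostingPort`, the function `fail_over`, and the policy of preferring a port on the same node before trying other nodes are all hypothetical illustrations, not the method of any particular product.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class HostingPort:
    node: str  # hypothetical node name, e.g. "node-a"
    port: str  # hypothetical data port name, e.g. "e0a"

@dataclass(frozen=True)
class Vif:
    ip: str
    vserver_id: int
    vif_id: int
    netmask: str
    routes: Tuple[str, ...] = ()
    home: Optional[HostingPort] = None  # port currently hosting this VIF

def fail_over(vif, candidates, failed):
    """Return a copy of `vif` moved to a healthy candidate port.

    Assumed policy: ports on the VIF's current node (local failover) are tried
    before ports on other nodes (remote failover). `failed` is a set of
    HostingPort values known to be down.
    """
    healthy = [p for p in candidates if p not in failed and p != vif.home]
    local = [p for p in healthy if vif.home is not None and p.node == vif.home.node]
    remote = [p for p in healthy if p not in local]
    ordered = local + remote
    if not ordered:
        raise RuntimeError(f"no healthy hosting port for VIF {vif.vif_id}")
    return replace(vif, home=ordered[0])
```

For example, a VIF homed on `node-a/e0a` moves to `node-a/e0b` when only its port fails, and to `node-b/e0a` when both ports on `node-a` are down. The record is frozen so that a relocation produces a new value rather than mutating shared state.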
A VIF manager attempts to maintain one-and-only-one active instance of each VIF in the cluster at all times. To prevent disruption of CIFS and NFS service carried over Transmission Control Protocol (TCP) connections, VIFs should not fail over too quickly in response to transient errors. Furthermore, a VIF should never be active on two different ports at the same time. Therefore, the system must ensure that all potential hosting nodes for a particular VIF agree on the current hosting rules for that VIF, at least to the extent that two hosting nodes never claim simultaneous responsibility for hosting it.
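The one-and-only-one rule can be stated as a check over what each port claims to host. The sketch below assumes a hypothetical snapshot `claims` mapping each `(node, port)` pair to the VIF IDs it reports as active; gathering such a consistent snapshot in a live cluster is itself non-trivial, so this is illustrative only.

```python
from collections import defaultdict

def check_vif_invariant(claims, all_vif_ids):
    """Check the one-and-only-one active instance rule.

    claims maps a (node, port) pair to the set of VIF IDs that port reports as
    active. Returns (duplicates, missing): VIF IDs active on more than one port
    (the condition that must never occur) and VIF IDs active on no port.
    """
    hosts = defaultdict(list)
    for port, vif_ids in claims.items():
        for vid in vif_ids:
            hosts[vid].append(port)
    duplicates = {vid: sorted(ports) for vid, ports in hosts.items() if len(ports) > 1}
    missing = set(all_vif_ids) - set(hosts)
    return duplicates, missing
```

A duplicate is a correctness violation, while a missing VIF is an availability gap; the reluctance to fail over on transient errors trades a short gap against the risk of a duplicate.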
A previously known method for providing virtual interface failover uses a quorum mechanism such as a replication database (RDB) service. This method designates one VIF manager as a coordinator on the RDB quorum master node. The coordinator monitors VIF manager health based on the RDB quorum. Each VIF manager instance monitors its local network port status based on a network link status reported by a network module. When node failure or network link failure is detected, the new VIF hosting port/node is published in the VIF configuration database. A message is then written to the hosting nodes' event record through one RDB read-write (RW) transaction.
In response to the event record update notification delivered by the RDB service, the new hosting nodes wake up and initiate read-only (RO) transactions to read their messages. Each new hosting node then opens a RW transaction to read the VIF configuration database for newly assigned VIFs, and brings up the newly assigned VIFs. This method therefore uses a number of RW transactions that is proportional to the number of nodes involved in the VIF failover when activating the newly failed over VIFs. It is desirable to minimize the number of RW transactions because the RDB service can perform only one RW transaction at a time across the cluster. The use of RW transactions thus serializes the failover operations and limits concurrency in the cluster. RO transactions, by comparison, can be performed by accessing a local copy of the replicated database.
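The transaction cost of the previously known method can be tallied directly from the steps above. This minimal sketch assumes exactly one coordinator RW transaction per failover event and exactly one RO plus one RW transaction per new hosting node; the names `TxnCount`, `prior_failover_txns`, and `serialized_rw_latency`, and the uniform per-transaction cost, are hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class TxnCount:
    rw: int = 0
    ro: int = 0

def prior_failover_txns(new_hosting_nodes):
    """Count RDB transactions for one failover event under the prior method.

    One RW by the coordinator publishes the configuration and writes the event
    records; each new hosting node then does one RO (read its message) and one
    RW (read its newly assigned VIFs).
    """
    count = TxnCount(rw=1)
    for _node in new_hosting_nodes:
        count.ro += 1
        count.rw += 1
    return count

def serialized_rw_latency(count, rw_cost):
    """RW transactions run one at a time cluster-wide, so their costs add up."""
    return count.rw * rw_cost
```

With three new hosting nodes this gives four RW and three RO transactions; at an assumed cost of 2.0 time units per RW transaction, the RW work alone takes 8.0 units end to end, whereas the RO reads can proceed in parallel against local replicas.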
The ability to perform failover transactions in a quorum-based system has been limited because, upon a large-scale failover, many storage systems or nodes in a cluster simultaneously require the same scarce resource, namely access to the RDB RW transaction. VIF managers have therefore been required to retry the failover process whenever an RW transaction is unavailable.
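The contention and retry behavior can be simulated with threads competing for a single slot. This is a sketch under loud assumptions: a `threading.Lock` acquired non-blockingly stands in for the cluster-wide RDB RW transaction, and the class `RwTransactionSlot`, the functions, the fixed backoff, and the timing values are all hypothetical.

```python
import threading
import time

class RwTransactionSlot:
    """Hypothetical stand-in for the cluster-wide RDB RW transaction:
    at most one holder at a time across the whole cluster."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_begin(self):
        # Non-blocking: fails immediately if another manager holds the slot.
        return self._lock.acquire(blocking=False)

    def commit(self):
        self._lock.release()

def vif_manager_failover(slot, name, results, retries,
                         work_s=0.005, backoff_s=0.001, max_tries=10_000):
    """One VIF manager's failover attempt: retry until the RW slot is free."""
    tries = 0
    while not slot.try_begin():
        tries += 1
        if tries >= max_tries:
            results[name], retries[name] = False, tries
            return
        time.sleep(backoff_s)
    try:
        time.sleep(work_s)  # simulated body of the RW transaction
        results[name] = True
    finally:
        slot.commit()
    retries[name] = tries

def run_large_scale_failover(n_managers):
    """Start n managers at once, as in a large-scale failover, and wait for all."""
    slot = RwTransactionSlot()
    results, retries = {}, {}
    threads = [
        threading.Thread(target=vif_manager_failover,
                         args=(slot, f"vifmgr-{i}", results, retries))
        for i in range(n_managers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, retries
```

Running `run_large_scale_failover(8)` typically shows most managers retrying several times, and the total wall time grows roughly linearly with the number of managers because only one holds the slot at any moment.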
A previously known VIF failover algorithm employs two tables: a VIF configuration table and an event table. The VIF configuration table contains the current cluster-wide VIF hosting rules. The event table contains a list of affected VIF managers and is employed as a signaling mechanism, whereby each involved secondary VIF manager may be notified (via an RDB record update callback) that it has been assigned a task. The event table and configuration table are updated together, under the same single RW transaction that assigns VIF hosting responsibilities. In this implementation, VIF managers that are not affected by a failover event ignore the notification and do not read the new configuration. This method saves a small amount of local processing on the unaffected nodes. However, these VIF managers also miss the opportunity to perform error correction by confirming that they are acting in conformance with the published configuration. The small savings in processing steps realized by this implementation do not represent a worthwhile tradeoff against the enhanced robustness of the overall system.
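The two-table update and the cost of ignoring notifications can be illustrated with a toy model. In this sketch, the classes `Rdb` and `VifManager` and the `reconcile_always` flag are hypothetical: the flag simply contrasts the prior behavior (unaffected managers ignore the callback) with managers that always re-read the configuration table. It illustrates the tradeoff described in the paragraph, not a specific implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Rdb:
    """Toy replicated database with a VIF configuration table and an event table."""
    config: dict = field(default_factory=dict)  # vif_id -> hosting node
    events: dict = field(default_factory=dict)  # node -> event counter

    def assign(self, new_hosting):
        """Single RW transaction: update both tables together.

        Returns the set of nodes whose event record changed (the affected managers).
        """
        affected = set(new_hosting.values())
        self.config.update(new_hosting)
        for node in affected:
            self.events[node] = self.events.get(node, 0) + 1
        return affected

class VifManager:
    def __init__(self, node, active=()):
        self.node = node
        self.active = set(active)  # VIF IDs this manager currently has up

    def on_event(self, rdb, affected, reconcile_always):
        """RDB record-update callback. Returns (brought_up, brought_down)."""
        if not reconcile_always and self.node not in affected:
            return set(), set()  # prior behavior: unaffected managers ignore it
        desired = {vid for vid, node in rdb.config.items() if node == self.node}
        up, down = desired - self.active, self.active - desired
        self.active = desired
        return up, down
```

Suppose the configuration assigns VIF 7 to `node-b`, but a stale manager on `node-c` still has VIF 7 up, and a failover then assigns VIF 1 to `node-b`. With `reconcile_always=False`, `node-c` ignores the callback and VIF 7 stays active on two nodes; with `reconcile_always=True`, `node-c` re-reads the configuration and brings VIF 7 down, at the cost of one extra local read.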