1. Field of the Invention
The present invention relates to virtual interfaces on storage system clusters, and more particularly to failover of virtual interfaces in a clustered environment.
2. Background Information
In a clustered data storage system, data access via the well-known Network File System (NFS) and Common Internet File System (CIFS) protocols relies on the availability of client-facing virtual interfaces (VIFs) hosted on the data ports of nodes in the cluster. A VIF consists of an IP address, a virtual server ID, a virtual interface ID, and various associated information (netmask, routes, etc.). Continuity of file system service requires that VIFs remain available and that they fail over in response to network port failure, network interface failure, cable failure, switch port failure, or storage system failure. VIF failover typically requires relocating a VIF to a different hosting port, either on the local node or on a remote node.
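The VIF record and the relocation step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `Vif`, its field names, the "node:port" naming, and the helper `relocate` are all hypothetical. Candidate ports are tried in caller-supplied order, so local-node ports can be listed ahead of remote-node ports.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Vif:
    """A client-facing virtual interface (hypothetical field names)."""
    ip_address: str
    virtual_server_id: int
    vif_id: int
    netmask: str
    routes: Tuple[str, ...] = ()
    current_port: Optional[str] = None  # hypothetical "node:port" now hosting the VIF


def relocate(vif: Vif, candidate_ports: Iterable[str],
             healthy: Set[str]) -> Optional[Vif]:
    """Return a copy of `vif` moved to the first healthy candidate port.

    Candidates are tried in the given order (e.g. local-node ports first,
    then remote-node ports). Returns None if no candidate is usable.
    """
    for port in candidate_ports:
        if port != vif.current_port and port in healthy:
            return replace(vif, current_port=port)
    return None


if __name__ == "__main__":
    vif = Vif("192.0.2.10", 1, 7, "255.255.255.0",
              ("default via 192.0.2.1",), "node1:e0a")
    moved = relocate(vif, ["node1:e0b", "node2:e0a"], healthy={"node2:e0a"})
    print(moved.current_port)  # node2:e0a
```

Here the local port `node1:e0b` is skipped because it is not healthy, so the VIF lands on a remote node's port. Keeping the record immutable means a relocation produces a new record rather than mutating shared state.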
A VIF manager attempts to maintain one-and-only-one active instance of each VIF in the cluster at all times. To prevent discontinuities in CIFS and NFS service over Transmission Control Protocol (TCP) connections, VIFs should not fail over too quickly in response to transient errors. Furthermore, a VIF should never be active on two different ports at the same time. Therefore, the system must ensure that all potential hosting nodes for a particular VIF agree on the current hosting rules for that VIF, at least to the extent that no two hosting nodes ever claim simultaneous responsibility for hosting it.
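The two requirements above, not reacting to transient errors and never hosting a VIF on two ports at once, can be illustrated with a small sketch. The hold-down timer in `FailoverDamper` is one common way to damp transient link flaps and is an assumption here, not a mechanism the text specifies; `FailoverDamper`, `hold_secs`, and `duplicate_vifs` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Set


class FailoverDamper:
    """Hypothetical hold-down: trigger failover only after a port has been
    continuously down for at least `hold_secs` seconds."""

    def __init__(self, hold_secs: float):
        self.hold_secs = hold_secs
        self._down_since: Dict[str, float] = {}

    def report(self, port: str, link_up: bool, now: float) -> bool:
        """Record a link-status sample; return True if failover should start."""
        if link_up:
            self._down_since.pop(port, None)  # transient error cleared, reset timer
            return False
        first_down = self._down_since.setdefault(port, now)
        return now - first_down >= self.hold_secs


def duplicate_vifs(active: Dict[str, Set[int]]) -> Set[int]:
    """Return VIF IDs claimed by more than one port.

    `active` maps a "node:port" name to the set of VIF IDs that port claims
    to host; any ID returned violates the one-active-instance rule.
    """
    counts = Counter(vif_id for vifs in active.values() for vif_id in vifs)
    return {vif_id for vif_id, n in counts.items() if n > 1}
```

With `hold_secs=5.0`, a link that drops at t=100, recovers at t=102, and drops again at t=103 only triggers failover once it is still down at t=108, because the recovery resets the timer.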
A previously known method for providing virtual interface failover uses a quorum mechanism such as a replicated database (RDB) service. This method designates one VIF manager, on the RDB quorum master node, as the coordinator. The coordinator monitors the health of the VIF managers based on the RDB quorum. Each VIF manager instance monitors the status of its local network ports based on the link status reported by a network module. When a node failure or network link failure is detected, the new hosting port and node for each affected VIF are published in the VIF configuration database. A message is then written to the event record of each new hosting node through a single read-write RDB transaction.
In response to the event record update notification delivered by the RDB service, the new hosting nodes wake up and initiate read-only (RO) transactions to read their messages. Each new hosting node then opens a read-write (RW) transaction to read the VIF configuration database for its newly assigned VIFs, and brings up those VIFs. This method therefore uses a number of RW transactions proportional to the number of nodes involved in the failover. Minimizing RW transactions is desirable because the RDB service can perform only one RW transaction at a time across the entire cluster, so every RW transaction serializes failover operations and limits concurrency in the cluster. RO transactions, by comparison, can be performed against a local copy of the replicated database.
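The transaction cost of the prior-art flow can be made concrete with a toy model that only counts transactions. Everything here is hypothetical: `ToyRdb` is a single-process stand-in for the RDB service, and the sketch assumes that publishing the new hosts and writing the event-record messages together cost one RW transaction. Failure detection and quorum monitoring are out of scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRdb:
    """Toy stand-in for the replicated database; it only counts transactions."""
    vif_config: Dict[int, str] = field(default_factory=dict)          # VIF ID -> node
    event_records: Dict[str, List[str]] = field(default_factory=dict)  # node -> messages
    rw_count: int = 0
    ro_count: int = 0


def prior_art_failover(rdb: ToyRdb,
                       reassignments: Dict[int, str]) -> Dict[str, List[int]]:
    """Simulate the prior-art flow; return the VIF IDs each new host brings up."""
    # Coordinator: one RW transaction publishes new hosts and writes messages.
    rdb.rw_count += 1
    for vif_id, node in reassignments.items():
        rdb.vif_config[vif_id] = node
        rdb.event_records.setdefault(node, []).append(f"vif {vif_id} assigned")

    brought_up: Dict[str, List[int]] = {}
    for node in sorted(set(reassignments.values())):
        # New hosting node: RO transaction to read its messages from a local copy.
        rdb.ro_count += 1
        if not rdb.event_records.get(node):
            continue
        # Then an RW transaction to read the configuration for newly assigned VIFs.
        rdb.rw_count += 1
        brought_up[node] = sorted(v for v, n in rdb.vif_config.items() if n == node)
    return brought_up
```

Under these assumptions, reassigning VIFs 1 and 2 to `n2` and VIF 3 to `n3` costs 1 + 2 = 3 RW transactions: one for the coordinator plus one per new hosting node, which is the proportionality the text describes.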
The scalability of failover in a quorum-based system has therefore been limited: during a large-scale failover, many servers in the cluster simultaneously require the same scarce resource, namely access to an RDB RW transaction. When no RW transaction is available, VIF managers have had to retry the failover process.
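The cost of that contention can be estimated with a deliberately simple round-based model. It assumes (hypothetically) that every node needs exactly one RW transaction, that all nodes request it at the same moment, that exactly one request succeeds per round, and that every loser retries in the next round. Real retry timing and backoff would differ; the function name `simulate_rw_contention` is illustrative.

```python
from typing import Tuple


def simulate_rw_contention(num_nodes: int) -> Tuple[int, int]:
    """Toy model of nodes competing for the single cluster-wide RW transaction.

    Returns (rounds, total_attempts) under the assumptions that all nodes ask
    at once, one request succeeds per round, and every loser retries.
    """
    pending = num_nodes
    rounds = 0
    attempts = 0
    while pending > 0:
        rounds += 1
        attempts += pending  # every node still waiting tries again this round
        pending -= 1         # only one of them obtains the RW transaction
    return rounds, attempts


if __name__ == "__main__":
    print(simulate_rw_contention(4))  # (4, 10)
```

In this model the failover completes only after N serial rounds, and the total number of attempts grows as N(N+1)/2, so retry traffic rises quadratically with the number of nodes involved.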