Storage systems may include a plurality of solid state drives (SSDs) or other disk drives and may enable one or more clients to access and store data, e.g., via Network File System (NFS) or other distributed file system calls.
A storage system may expose an Internet Protocol (IP) address to be used by clients to connect to the storage system. An entity such as a storage “controller” may be provided to manage remote access to data storage resources of the storage system.
Redundancy may be used to ensure high availability. For example, a storage system may include an “active” controller that is currently engaged in providing access to storage resources, e.g., in response to NFS or other storage operation requests received from remote clients. Another controller may be configured as a “standby” controller. A standby controller may monitor the active controller and other storage system state information. The standby controller may be configured to detect a failure of the active controller and to take over the role of active controller in the event a failure is detected.
Known techniques to monitor for and detect failure of an active controller include periodically sending a ping to the active controller and monitoring for a response. Pings may be sent and responses received via an internal network connection. However, if the internal network connection fails or becomes slow, the ping and/or response may not be received.
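The ping-based check described above can be illustrated with a minimal sketch. It is not taken from any particular storage system: `send_ping` is a hypothetical callable assumed to return True when a response arrives before its timeout, and the default of three attempts is an arbitrary illustrative value.

```python
from typing import Callable


def active_appears_failed(send_ping: Callable[[], bool], attempts: int = 3) -> bool:
    """Send up to `attempts` pings; report failure only if none is answered.

    `send_ping` is a hypothetical callable (an assumption for this sketch)
    that returns True when a response arrives in time and False otherwise.
    """
    for _ in range(attempts):
        if send_ping():
            return False  # any response means the active controller is alive
    return True  # every ping in the run went unanswered
```

Note that the function cannot distinguish a dead controller from a failed or slow internal network connection: in both cases `send_ping` returns False, which is the weakness the paragraph above points out.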
Some prior art storage systems fall back on a secondary technique to verify the status of the active controller when no response is received from the active controller after a prescribed number of pings. In one approach, the active controller may be configured to refresh SCSI keys if it stops receiving pings. The standby controller can check whether the keys have been refreshed, e.g., since last checked and/or within a prescribed interval. If so, the active controller will be determined to still be alive, despite the failure to receive responses to pings. However, internal network or other communication failures may be common, and SCSI key refreshes are expensive (e.g., time-consuming) operations.
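The two-stage decision described above might be sketched as follows. This is an illustration only: the function and parameter names, and the use of a timestamp to represent the most recent key refresh, are assumptions made for the example. How a standby controller actually reads key state from the drives is not shown.

```python
def keys_recently_refreshed(last_refresh_ts: float, now: float, max_age_s: float) -> bool:
    """True if the (hypothetical) last key-refresh time falls within the interval."""
    return (now - last_refresh_ts) <= max_age_s


def should_take_over(
    missed_pings: int,
    ping_threshold: int,
    last_refresh_ts: float,
    now: float,
    max_age_s: float,
) -> bool:
    """Decide whether the standby controller should assume the active role.

    Stage 1: do nothing until the prescribed number of pings has gone unanswered.
    Stage 2: fall back on the key-refresh check; a recent refresh means the
    active controller is still alive even though pings are failing.
    """
    if missed_pings < ping_threshold:
        return False
    return not keys_recently_refreshed(last_refresh_ts, now, max_age_s)
```

For example, with a threshold of 3 missed pings and a 10-second interval, a refresh observed 4 seconds ago keeps the standby from taking over, while a refresh last seen 30 seconds ago does not.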