1. Technical Field
The present invention relates to controlling failover in storage apparatus, and, more particularly, to controlling failover in clustered storage apparatus networks.
2. Description of Related Art
The concept of clustering of computer systems is well-known in the art. Nevertheless, a brief summary of the background may be helpful in understanding the present invention in its preferred embodiments.
A cluster consists of a group of computer systems (henceforth known as ‘nodes’) that operate together to provide a service to one or more clients or applications. One of the benefits of clustered systems is the ability to continue operation in the face of failure of one or more nodes within the cluster: if some nodes within the cluster fail, the work being performed by those nodes is redistributed to the surviving members of the cluster. Even with node failures, the cluster continues to offer a service to its clients, although typically with reduced performance.
With most clustered systems it is necessary to prevent a cluster which has split into two groups of nodes from allowing both groups to continue operating as independent clusters. This problem is normally solved by introducing the concept of a quorum, that is, a minimal set of nodes required for the cluster to continue operation. When a cluster of nodes is partitioned into two groups, one group will maintain a quorum and will continue operating, while the other group will be inquorate and will cease to participate in the cluster. To achieve this, each node in the cluster needs to check that it is still part of the quorum as it processes service requests, so that as soon as it determines that it is in an inquorate group it stops participating in the cluster. This is typically achieved by using either heartbeats or a lease. The concepts of heartbeats and leases as means for controlling connected systems are well-known in the art, but, for better understanding of the present disclosure, a brief introduction to the relevant concepts related to leases is offered here.
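Before turning to leases, the quorum rule described above may be illustrated by a minimal sketch. It assumes, purely for illustration, that the quorum is a strict majority of the configured nodes; the text above requires only a minimal set of nodes, and the function name `has_quorum` and the node names are hypothetical.

```python
def has_quorum(reachable_nodes, cluster_size):
    """Return True if the group is a strict majority of the configured cluster.

    Assumption: quorum is defined as a strict majority. Other definitions
    of a 'minimal set of nodes' are possible.
    """
    return len(reachable_nodes) > cluster_size // 2


# A five-node cluster partitioned into a group of three and a group of two.
cluster = {"n1", "n2", "n3", "n4", "n5"}
group_a = {"n1", "n2", "n3"}
group_b = cluster - group_a

quorate_a = has_quorum(group_a, len(cluster))  # group of three keeps operating
quorate_b = has_quorum(group_b, len(cluster))  # group of two must stop
```

Under this majority assumption, at most one of the two groups can be quorate, so the two groups can never both continue as independent clusters. An even split leaves neither group quorate, which a practical system would need to resolve by some further means.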
A lease permits a node to offer a service on behalf of the cluster without having to refer to its cluster peers to service each request. The lease defines a time-limited period during which the node can offer the service without further reference to the peers. An infrequent message can be used to extend the lease, so that the node can continue to offer the service for a long period. In the event of a loss of communications with a node that has been granted a lease, the peer nodes of the prior art typically wait for a period of time not less than the lease time before they can be assured that the node has stopped participating in the cluster, and only then allow the transfer of work from the failing node to surviving nodes within the cluster.
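The lease behaviour described above may be sketched as follows. This is an illustrative model only, not an implementation taken from any particular system: the class `Lease`, the function `earliest_safe_failover` and the explicit `now` arguments are hypothetical, times are plain numbers in seconds, and clock skew between nodes is ignored.

```python
class Lease:
    """Time-limited permission for a node to serve without consulting peers."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def extend(self, now):
        # An infrequent extension message restarts the lease period.
        self.expires_at = now + self.duration

    def is_valid(self, now):
        # The node may serve requests only while the lease has not expired.
        return now < self.expires_at


def earliest_safe_failover(last_grant_time, lease_duration):
    """Earliest time at which peers may assume the silent node has stopped.

    Assumption: peers measure from the last time they granted or extended
    the lease, and wait at least one full lease period.
    """
    return last_grant_time + lease_duration


lease = Lease(duration=10.0, now=0.0)
lease.extend(now=8.0)                # lease now runs until t = 18.0
still_serving = lease.is_valid(17.0)
stopped = not lease.is_valid(18.0)
failover_at = earliest_safe_failover(last_grant_time=8.0, lease_duration=10.0)
```

In this sketch the peers take over at t = 18.0, which is exactly when the node itself ceases to regard its lease as valid, so the node and its peers never offer the service at the same time.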
The concept of a lease is particularly valuable in clustered systems which must present a coherent image of some changing information, and in which requests to view that information must be serviced with minimal cost, certainly less than that required to correspond with other nodes.
The lease time defines the minimum period during which a service is unavailable following a failure (henceforth ‘failover time’). Even short periods of unavailability will appear as glitches in system operation, which will decrease customer satisfaction. Minimising this time improves the quality of the system. The shorter the lease time used by the cluster, the shorter the failover time. However, the shorter the lease time, the more frequently nodes within the cluster need to extend the lease, and consequently the greater the overheads of maintaining the lease. The minimum lease time is also bounded by the speed of communications between nodes: the lease time cannot be less than the time it takes to communicate a lease extension. Therefore, while it is desirable to have a very short lease time to minimise the failover time, in practice this is often not possible.
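The trade-off described above may be made concrete with a small worked calculation. The sketch assumes, for illustration only, that a node sends an extension once a fixed fraction of the lease time has elapsed; the function `lease_tradeoff` and the parameters `extension_time` and `renew_fraction` are hypothetical and are not taken from any particular system.

```python
def lease_tradeoff(lease_time, extension_time, renew_fraction=0.5):
    """Relate lease time (seconds) to failover time and renewal overhead.

    Assumption: the node renews after `renew_fraction` of the lease time
    has elapsed, so one extension is sent every lease_time * renew_fraction.
    """
    if lease_time < extension_time:
        # The lease cannot be shorter than the time needed to extend it.
        raise ValueError("lease time is less than the time to communicate an extension")
    return {
        "min_failover_time": lease_time,
        "renewals_per_second": 1.0 / (lease_time * renew_fraction),
    }


long_lease = lease_tradeoff(lease_time=4.0, extension_time=0.01)
short_lease = lease_tradeoff(lease_time=0.25, extension_time=0.01)
```

Under these assumptions, reducing the lease time from 4 seconds to 0.25 seconds reduces the minimum failover time sixteenfold, but raises the extension traffic from 0.5 to 8 messages per second for each node holding a lease.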
The governing of systems using leases ensures correct operation in the face of almost any failure, provided that the clock on which the lease depends operates correctly. However, it is a rather conservative measure, and there is a particular class of system failure which is common and for which it would be desirable to avoid the overhead of a lease operation, namely that of software failure caused by an ‘assert’: a form of failure in which the software itself has detected some illegal or unexpected situation and has determined that it is safer to exit and restart than to continue operation.
The normal method for improving failover time in a lease-based system is to make the lease time as short as possible. The disadvantage of this method is that the more frequently a lease needs to be renewed, the higher the overheads of maintaining the lease. Moreover, the lease time cannot be less than the time it takes to communicate a lease extension. Many clustered systems therefore require dedicated hardware to allow nodes in the cluster to communicate lease extensions as quickly as possible.