1. Field of Invention
The present invention relates to storage area networks, and more particularly to using elements in the storage area network to manage cluster membership of hosts attached to the storage area network.
2. Description of the Related Art
Demand for higher performance computer systems is never ending. Increased performance is demanded at both the host processing side and the storage side. To improve the performance and flexibility of the connection between hosts and storage units, storage area networks (SANs) have been developed. SANs provide the capability to flexibly connect hosts to storage, allowing improved performance while reducing costs. The predominant SAN architecture is a fabric developed using Fibre Channel switching. Fibre Channel is a series of ANSI standards defining a high speed communication interface. One property of Fibre Channel is that links can be point to point. When the devices are interconnected by a series of switches, a fabric is formed. The fabric allows communications to be routed between the various connected devices.
In addition to high performance connections between the hosts and the storage units, a second technique used to increase system performance is clustering of the hosts. When hosts are interconnected, they can work together on the various tasks of a common program. This technique requires high speed communications between the hosts to manage the operations. These communications can occur using numerous networking protocols, such as Ethernet, Fibre Channel, InfiniBand or Myrinet.
However, several problems occur when clustering hosts, which limit the performance gains available. A first problem is cluster membership management. Every host (or node, as it is often called) needs to know the group of valid members of the cluster. There is significant processing overhead and network traffic associated with this activity, particularly as the number of nodes grows. Simplistically, each node must periodically communicate with each other node, which generates traffic and requires processing by the node, both when sending and when receiving. Then, if a node senses a problem, all of the nodes need to reach consensus on the cluster membership. This consensus process is time consuming and also generates additional network traffic. So it would be desirable to improve the membership management of a cluster to eliminate much of the processing overhead, traffic and consensus-building.
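The growth in membership traffic described above can be illustrated with a short calculation. The following is only a sketch under the simplistic assumption stated in the text, namely that each node sends one message to each other node per period; the function name is hypothetical and is not part of any described system.

```python
def heartbeat_messages_per_round(num_nodes: int) -> int:
    """Messages sent in one period if every node contacts every other node.

    Assumption (illustrative only): one message per ordered pair of nodes,
    so each of the num_nodes senders transmits to num_nodes - 1 receivers.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return num_nodes * (num_nodes - 1)


if __name__ == "__main__":
    # Show how the per-period message count grows with cluster size.
    for n in (2, 4, 8, 16, 32):
        print(n, "nodes:", heartbeat_messages_per_round(n), "messages per period")
```

Under this assumption the message count grows roughly with the square of the number of nodes, and each message also costs processing at both the sender and the receiver.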
A second problem is resource sharing. Usually the nodes will share various resources, but only one node at a time can access a given resource. This is addressed by locking the resource while a node has control. To gain control of the resource, a node performs an operation on the lock to determine if another node has control. If not, the node gains control. If another node has control, the requesting node repeats the operation until it is successful. Thus traffic over the network is generated to handle the lock operation. Usually this is traffic between nodes, because a node is used to implement the shared memory that forms the lock. Frequent accesses to that node further hinder performance, and sending and receiving the operations creates overhead. The problem becomes significant in most systems because a large number of locks must be implemented, with a large number of nodes vying for control. It would be desirable to limit the traffic and overhead required to maintain resource locks.
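The retry behavior described above can be sketched with a small deterministic simulation. This is an illustrative model only, not an implementation of any particular system: the class `RemoteLock`, the function `simulate`, and the round-based timing are all assumptions introduced for the example. Each lock attempt is counted as one request sent to the node that holds the lock word; release messages are not counted.

```python
class RemoteLock:
    """Lock word held in the memory of one node (hypothetical model)."""

    def __init__(self):
        self.owner = None
        self.requests = 0  # each attempt models one request over the network

    def test_and_set(self, node_id):
        """Try to take the lock; return True on success, False if it is held."""
        self.requests += 1
        if self.owner is None:
            self.owner = node_id
            return True
        return False

    def release(self, node_id):
        """Free the lock if node_id is the current owner."""
        if self.owner == node_id:
            self.owner = None


def simulate(num_nodes, hold_rounds):
    """Return total lock requests until every node has held the lock once.

    Assumption: in each round, every waiting node retries once, and the
    owner releases the lock hold_rounds rounds after acquiring it.
    """
    lock = RemoteLock()
    waiting = list(range(num_nodes))
    remaining = 0
    while waiting or lock.owner is not None:
        if lock.owner is not None:
            remaining -= 1
            if remaining == 0:
                lock.release(lock.owner)
        for node in list(waiting):  # copy, since waiting is modified below
            if lock.test_and_set(node):
                waiting.remove(node)
                remaining = hold_rounds
    return lock.requests


if __name__ == "__main__":
    for n in (1, 3, 8):
        print(n, "nodes:", simulate(n, hold_rounds=2), "lock requests")
```

In this model a single uncontended node needs one request, while contending nodes generate additional failed requests for every round they wait, which is the traffic and overhead the text identifies.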