Distributed computing systems are often used to provide high throughput or nearly continuous availability. A distributed computing system typically includes two or more computing devices which frequently operate somewhat autonomously and communicate with each other over a network or other communication path.
A computing device of a distributed computing system that is capable of sharing resources is often referred to as a cluster. A cluster has two or more nodes, each node having a processor or at least a processor resource and, typically, a separate operating system. One example of a distributed computing system utilizing one or more clusters is a storage area network (SAN) which includes a storage controller.
A storage area network is frequently used to couple computer storage devices such as disk arrays, tape libraries, optical jukeboxes or other storage devices, to hosts in a manner which permits the storage devices to appear to the operating systems of the hosts as locally attached to the hosts. In operation, a host may request data from a storage controller which may in turn retrieve the data from one or more storage devices. The host may also transmit data to the storage controller to be written to one or more storage devices.
Each host communicates with the storage controller through a channel path of the storage area network. Each channel path typically includes one or more physical hardware communication channels such as a digital electronic communication bus, a digital optical communication bus, or a similar communication channel. In addition, each channel path may include one or more logical control blocks, addresses, communication devices, digital switches, and the like for coordinating the transmission of digital messages between the host and the storage controller. Fibre Channel (FC) is a high-speed networking technology often used in storage area networks, in which signals may be transmitted over various transmission media such as fiber optic cable or twisted pair copper cable.
A storage controller may have multiple servers which are assigned input/output (I/O) tasks by the hosts. The servers are typically interconnected as nodes of one or more clusters in a distributed computing system, in which each node includes a server often referred to as a central electronics complex (CEC) server.
The I/O tasks may be directed to specific volumes in the storage. The storage controller may further have multiple input/output (I/O) adapters such as host adapters which enable the servers to communicate with the hosts, and device adapters which enable the servers of the storage controller to communicate with the storage devices. Switches may be used to couple selected servers to selected I/O adapters of the storage controller.
Various resources of the storage controller including the I/O adapters may be shared among the servers or other nodes of the distributed computing system. A shared resource is typically used by one node of the distributed computing system at a time. Access to a shared resource by the nodes of the distributed computing system is frequently controlled by a “lock” function. A separate embedded hardware lock device often provides the lock function and is coupled by a communication path to each node which may utilize the shared resources associated with the particular lock device.
To gain access to a shared resource, a node typically communicates a request over a communication path to the lock device controlling access to the shared resource. If the shared resource is available, the request is granted. Once the request is granted, the requesting node “holds” the lock and the shared resources associated with that lock are assigned to the node holding the lock. In this manner, the node holding the lock “owns” the lock and the shared resources associated with the lock, and may utilize those shared resources to the exclusion of the other nodes, which are blocked from accessing the shared resources by the lock device.
Upon completing the task or tasks utilizing the shared resources, the node holding the lock for the shared resources again contacts the lock device and releases the hold on the lock and thus releases the “ownership” of the lock and the associated shared resources. The released lock is then available to be held by another node to gain access to the shared resources of the lock.
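The request, hold, and release sequence described above can be illustrated with a minimal software sketch. This is only an illustration under stated assumptions: the lock function is described above as typically provided by a separate embedded hardware device, and the class name `LockDevice`, its methods, and the node and resource names are hypothetical.

```python
import threading


class LockDevice:
    """Hypothetical software stand-in for a hardware lock device."""

    def __init__(self, resources):
        self.resources = list(resources)  # shared resources tied to this lock
        self.owner = None                 # node currently holding the lock, if any
        self._guard = threading.Lock()    # protects the owner field itself

    def request(self, node):
        """Grant the lock if it is free; return True if the node now holds it."""
        with self._guard:
            if self.owner is None:
                self.owner = node
                return True
            return False

    def release(self, node):
        """Release the lock; only the holding node may do so."""
        with self._guard:
            if self.owner != node:
                raise PermissionError(f"{node} does not hold the lock")
            self.owner = None


device = LockDevice(["host_adapter_0", "device_adapter_0"])
granted_a = device.request("node_a")        # lock is free, so node_a holds it
granted_b = device.request("node_b")        # node_b is blocked while node_a holds it
device.release("node_a")                    # node_a finishes its tasks and releases
granted_b_retry = device.request("node_b")  # the released lock is available again
```

In this sketch a blocked node simply receives a refusal and must ask again; whether a real lock device queues, retries, or rejects requests is not specified above.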
In some distributed computing systems, a failover is an event in which a cluster automatically switches over to one or more backup nodes upon failure of a node. For example, in a central electronics complex (CEC) server failover, if one CEC server fails, the system performs a failover to another CEC server.
In a failover, a surviving node such as a CEC server node takes ownership of some or all of the shared resources of the distributed computing system. Access to the shared resources to be taken over as part of the failover is frequently controlled by a failover lock provided by a separate hardware lock device, often referred to as a failover lock device. Accordingly, the node performing the failover transmits a request to the failover lock device for ownership of the shared resources associated with the failover lock. If the request is granted, the requesting node holds the lock and is granted access to the shared resources associated with the failover lock. Access by any other node of the distributed computing system to the shared resources associated with the failover lock is blocked by the failover lock held by the requesting node. Accordingly, the node holding the failover lock may then complete the failover operation. In this manner, the failover lock provides the node holding it with exclusive access to the shared resources associated with the failover lock until the holding node releases the failover lock.
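The role of the failover lock can be sketched in the same way. The following is a simplified model, not a description of any particular failover lock device: `FailoverLockDevice`, `perform_failover`, the `ownership` mapping, and the node names `cec_0`, `cec_1`, and `cec_2` are all hypothetical names introduced for illustration.

```python
class FailoverLockDevice:
    """Hypothetical model of a failover lock guarding a set of shared resources."""

    def __init__(self, resources):
        self.resources = tuple(resources)
        self.holder = None  # node currently holding the failover lock

    def request(self, node):
        """Grant the lock if free; return True if the node holds it afterwards."""
        if self.holder is None:
            self.holder = node
        return self.holder == node

    def release(self, node):
        """Release the lock if the given node is the holder."""
        if self.holder == node:
            self.holder = None


def perform_failover(node, lock_device, ownership):
    """Take over the shared resources only if the failover lock is granted."""
    if not lock_device.request(node):
        return False  # another node holds the failover lock; access is blocked
    for resource in lock_device.resources:
        ownership[resource] = node  # surviving node takes ownership
    return True


# cec_0 is assumed to have failed; cec_1 and cec_2 both attempt the failover.
ownership = {"host_adapter_0": "cec_0", "device_adapter_0": "cec_0"}
lock = FailoverLockDevice(ownership)
took_over = perform_failover("cec_1", lock, ownership)
blocked = perform_failover("cec_2", lock, ownership)
```

Here the first surviving node to obtain the failover lock completes the takeover, and the second attempt is refused because the lock has not yet been released.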
Another type of lock used in data processing is a data unit lock. Before a first host accesses data through a storage controller or other storage system, the first host may request that the data be locked. For example, the first host may request that specified data such as one or more rows in a data table, one or more tracks of a hard disk drive, or the like be locked. If the first host is granted the lock, the first host may access the specified data without a second host being allowed access to the specified data. The first host has exclusive access to the specified data until the first host releases the lock. For transactions such as airline reservations, credit card transactions, or similar transactions, a lock function is designed to allow a transaction with a first host to be completed before a second host can access the transaction data.
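A data unit lock can be pictured as a table that records which host holds each locked unit. The sketch below is illustrative only: the `DataUnitLockTable` class, the row identifiers, and the all-or-nothing grant policy (a request is refused if any requested unit is held by another host) are assumptions and are not taken from the description above.

```python
class DataUnitLockTable:
    """Hypothetical table mapping locked data units (rows, tracks) to hosts."""

    def __init__(self):
        self._holders = {}  # data unit id -> host holding the lock

    def lock(self, host, units):
        """Lock all requested units for the host, or none (assumed policy)."""
        units = set(units)
        if any(self._holders.get(u, host) != host for u in units):
            return False  # at least one unit is held by a different host
        for u in units:
            self._holders[u] = host
        return True

    def can_access(self, host, unit):
        """A unit is accessible if it is unlocked or locked by this host."""
        return self._holders.get(unit, host) == host

    def unlock(self, host, units):
        """Release the units that this host actually holds."""
        for u in units:
            if self._holders.get(u) == host:
                del self._holders[u]


table = DataUnitLockTable()
first = table.lock("host_1", ["row_17", "row_18"])   # granted to host_1
second = table.lock("host_2", ["row_18", "row_19"])  # refused: row_18 is held
host_2_row_18 = table.can_access("host_2", "row_18")
table.unlock("host_1", ["row_17", "row_18"])         # host_1 completes its work
after_release = table.lock("host_2", ["row_18", "row_19"])
```

With this policy, the second host cannot touch the locked rows until the first host's transaction completes and the lock is released, which matches the reservation and credit card scenario described above.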