A cluster is a plurality of nodes physically connected to an inter-node communication network. Each cluster node is a computer system, which may include a Central Processing Unit ("CPU"), memory, an inter-node communications interface and an IO subsystem.
A storage device may be connected to the IO subsystem in a node. The storage device may be shared by a plurality of nodes by connecting it to the IO subsystem in each node. Sharing the storage device amongst a plurality of nodes provides multiple paths for accessing it. These paths provide redundancy: if one of the nodes fails, an IO request can still reach the storage device through a non-failed node.
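The redundancy argument can be illustrated with a minimal sketch. It assumes a hypothetical `Node` record with a `failed` flag and a hypothetical `route_io` helper that picks the first surviving path. Neither name comes from the text, and no real IO is issued.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A cluster node that is one path to the shared storage device (hypothetical model)."""
    name: str
    failed: bool = False


def route_io(request: str, paths: List[Node]) -> str:
    """Send an IO request through the first node on the list that has not failed.

    Every node attached to the shared device is an independent path, so a
    single node failure leaves the device reachable.
    """
    for node in paths:
        if not node.failed:
            return f"{request} via {node.name}"
    raise RuntimeError("no surviving path to the storage device")


paths = [Node("node-a", failed=True), Node("node-b"), Node("node-c")]
print(route_io("READ block 42", paths))  # READ block 42 via node-b
```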
A well-known standard interface for connecting storage devices to an IO subsystem is the American National Standards Institute ("ANSI") Small Computer System Interface ("SCSI"). ANSI SCSI defines a protocol for accessing storage devices connected to a storage network. The SCSI protocol permits a storage device connected to a storage network to be shared by a plurality of nodes. The IO subsystem in each node includes a storage network controller. The storage network controller includes logic for issuing IO commands over the storage network to the storage device. The IO commands include a command to read data from the storage device and a command to write data to the storage device.
ANSI SCSI includes a Persistent Reserve command. The Persistent Reserve command allows a storage device to be shared by more than one cluster node. Each storage network controller issues a Persistent Reserve command to the storage device to register with the storage device. A second Persistent Reserve command is issued to reserve the device by specifying the access type. The storage device stores a list of registered storage network controllers with a corresponding registration key and the type of access permitted.
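The two-step register-then-reserve sequence and the list the device keeps can be modelled as a small table. This is a simplified sketch, not the SCSI PERSISTENT RESERVE OUT wire format. `SharedDevice`, `register`, `reserve` and the controller names are hypothetical.

```python
from typing import Dict, Optional

WRITE_EXCLUSIVE_REGISTRANTS_ONLY = "write exclusive registrants only"


class SharedDevice:
    """Toy model of the registration list a storage device keeps (hypothetical)."""

    def __init__(self) -> None:
        self.registrations: Dict[str, int] = {}  # controller id -> registration key
        self.reservation: Optional[str] = None   # access type set by the reserve step

    def register(self, controller: str, key: int) -> None:
        """First command: record the controller together with its registration key."""
        self.registrations[controller] = key

    def reserve(self, controller: str, key: int, access_type: str) -> None:
        """Second command: a registered controller sets the type of access permitted."""
        if self.registrations.get(controller) != key:
            raise PermissionError(f"{controller} is not registered with key {key}")
        self.reservation = access_type


dev = SharedDevice()
dev.register("ctrl-1", 0x1111)
dev.register("ctrl-2", 0x2222)
dev.reserve("ctrl-1", 0x1111, WRITE_EXCLUSIVE_REGISTRANTS_ONLY)
print(dev.registrations, dev.reservation)
```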
The Persistent Reserve command provides security by requiring registered storage network controllers to provide their registration key before they may perform commands restricted to members of the group of registered storage network controllers. For example, if each storage network controller registers with the registration type "write exclusive registrants only", only registered storage network controllers may write to the storage device, while all storage network controllers, registered or not, may read from it.
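The "write exclusive registrants only" rule reduces to a simple access check. This is a sketch following the text's description, in which a controller presents its key, and it simplifies how a real SCSI target identifies initiators. The function name `is_permitted` and the string operation names are hypothetical.

```python
from typing import Dict


def is_permitted(op: str, controller: str, key: int, registrations: Dict[str, int]) -> bool:
    """Access check under a 'write exclusive registrants only' reservation.

    Reads are open to every controller. Writes require the caller to be
    registered and to present the registration key stored for it.
    """
    if op == "read":
        return True
    if op == "write":
        return registrations.get(controller) == key
    raise ValueError(f"unknown operation: {op}")


registrations = {"ctrl-1": 0x1111}
print(is_permitted("read", "outsider", 0, registrations))       # True
print(is_permitted("write", "ctrl-1", 0x1111, registrations))   # True
print(is_permitted("write", "outsider", 0, registrations))      # False
```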
In a cluster, a node failure is communicated to survivor nodes on the inter-node communication network. Upon detecting the node failure, access to the storage device may be provided on an alternative path through a survivor node in the cluster that is connected to the storage device. However, before access can be provided on the alternative path, all pending IO commands issued by the failed node must be completed or aborted in the storage device to guarantee that they do not interfere with future IO commands from surviving cluster members. A survivor node in the cluster therefore issues a Persistent Reserve command to the shared storage device to request the completion or abortion of all IO commands issued by the failed node.
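The survivor's request can be sketched as one device-side operation. It drops the failed node's registration and aborts that node's pending commands. The operation is loosely modelled on SCSI's PREEMPT AND ABORT service action, but the `Device` class, its `pending` list and `preempt_and_abort` are hypothetical simplifications.

```python
from typing import Dict, List, Tuple


class Device:
    """Toy shared device that tracks registrations and queued commands."""

    def __init__(self) -> None:
        self.registrations: Dict[str, int] = {}
        self.pending: List[Tuple[str, str]] = []  # (issuing node, command)

    def register(self, node: str, key: int) -> None:
        self.registrations[node] = key

    def preempt_and_abort(self, survivor: str, survivor_key: int, failed: str) -> List[str]:
        """Drop the failed node's registration and abort its pending commands.

        Only a registered survivor presenting its own key may do this.
        """
        if self.registrations.get(survivor) != survivor_key:
            raise PermissionError(f"{survivor} is not a registrant")
        self.registrations.pop(failed, None)
        aborted = [cmd for node, cmd in self.pending if node == failed]
        self.pending = [(node, cmd) for node, cmd in self.pending if node != failed]
        return aborted


dev = Device()
dev.register("node-a", 0xA)
dev.register("node-b", 0xB)
dev.pending = [("node-a", "WRITE 10"), ("node-b", "READ 11"), ("node-a", "WRITE 12")]
aborted = dev.preempt_and_abort("node-b", 0xB, failed="node-a")
print(aborted)  # ['WRITE 10', 'WRITE 12']
```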
There are two types of SCSI physical connections. A parallel SCSI physical connection provides for the connection of a maximum of sixteen devices, including storage devices and storage network controllers. A serial SCSI physical connection provides for the connection of 2^64 devices, including storage devices, storage network controllers, switches and routers. Through the SCSI physical connection, a cluster storage device may be accessed by several nodes, both cluster nodes and non-cluster nodes. Through the use of the Persistent Reserve command, write access to a cluster storage device can be limited to registered cluster nodes by registering each cluster node with the "write exclusive registrants only" registration type.
The "write exclusive registrants only" state remains in effect as long as one of the cluster nodes is registered with the storage device. However, if the persistent reservation from the last cluster node is removed, a non-cluster node or a cluster node from another cluster may write to the storage device and corrupt data stored in the storage device.
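The failure mode can be reproduced in a toy model. It follows the behaviour described in the text, where the state lapses once the last registrant is gone, and does not claim to model every SCSI reservation rule. The `Device` class and its methods are hypothetical.

```python
from typing import Dict


class Device:
    """Toy model of the data-integrity hole described above (hypothetical)."""

    def __init__(self) -> None:
        self.registrations: Dict[str, int] = {}
        self.write_exclusive = False  # is "write exclusive registrants only" in effect?

    def register(self, node: str, key: int) -> None:
        self.registrations[node] = key

    def reserve(self) -> None:
        # Only meaningful while at least one node is registered.
        self.write_exclusive = bool(self.registrations)

    def unregister(self, node: str) -> None:
        self.registrations.pop(node, None)
        if not self.registrations:
            # Last registrant gone: the write-exclusive state lapses.
            self.write_exclusive = False

    def can_write(self, node: str) -> bool:
        return not self.write_exclusive or node in self.registrations


dev = Device()
dev.register("cluster-1", 1)
dev.register("cluster-2", 2)
dev.reserve()
print(dev.can_write("outsider"))  # False
dev.unregister("cluster-1")
dev.unregister("cluster-2")
print(dev.can_write("outsider"))  # True: nothing protects the data any more
```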
The present invention provides a method for sharing a storage device amongst a plurality of computers while preserving data integrity in the storage device. A computer is registered with the storage device by storing, in the storage device, a computer identifier associated with a reserved access type. Access to the storage device is provided to the registered computer based on the registered computer's stored identifier, and the type of access provided depends on the stored access type. If the registered computer loses knowledge of the identifier stored for it in the shared storage device, that stored identifier is replaced with a new identifier for the registered computer. The registered computer may be a currently registered computer or a previously registered computer.
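A minimal sketch of identifier replacement, assuming the device lets a computer overwrite its own stored identifier without presenting the old one. SCSI's REGISTER AND IGNORE EXISTING KEY service action behaves similarly, but the text does not say which mechanism is used. `SharedDevice`, `Computer` and `ensure_registered` are hypothetical names, and random 64-bit identifiers are an illustrative choice.

```python
import secrets
from typing import Dict, Optional


class SharedDevice:
    """Stores one identifier per registered computer (hypothetical model)."""

    def __init__(self) -> None:
        self.identifiers: Dict[str, int] = {}

    def register(self, computer: str, identifier: int) -> None:
        self.identifiers[computer] = identifier

    def replace(self, computer: str, new_identifier: int) -> None:
        # Overwrites the stored identifier without checking the old one, because
        # the computer no longer knows it (assumption: the device permits this).
        self.identifiers[computer] = new_identifier

    def grants_access(self, computer: str, identifier: Optional[int]) -> bool:
        return identifier is not None and self.identifiers.get(computer) == identifier


class Computer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.known_identifier: Optional[int] = None  # e.g. lost after a crash

    def ensure_registered(self, device: SharedDevice) -> None:
        if self.known_identifier is not None:
            return  # still knows its identifier; nothing to do
        stale = device.identifiers.get(self.name)
        new_id = secrets.randbits(64)
        while new_id == stale:  # the replacement must differ from the stale value
            new_id = secrets.randbits(64)
        if stale is None:
            device.register(self.name, new_id)  # first registration
        else:
            device.replace(self.name, new_id)   # previously registered computer
        self.known_identifier = new_id


dev = SharedDevice()
node = Computer("node-a")
node.ensure_registered(dev)
old = node.known_identifier
node.known_identifier = None  # simulate losing knowledge of the stored identifier
node.ensure_registered(dev)
print(dev.grants_access("node-a", old), dev.grants_access("node-a", node.known_identifier))
```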
Upon detecting a failure in one of the registered computers, one of the surviving registered computers removes the registration of the failed computer by requesting deletion, in the shared storage device, of the identifier associated with the reserved access type for that computer. Outstanding commands from the failed computer to the shared storage device are aborted. All commands to the shared storage device are stalled until every pending command issued by the failed computer has been aborted.
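The stall-until-aborted ordering can be sketched single-threaded by splitting recovery into two steps and holding any commands submitted in between. This is a simplification: a real device would enforce the stall across concurrent initiators. `SharedStorage`, `begin_recovery` and `finish_recovery` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class SharedStorage:
    """Hypothetical model of recovery after a registered computer fails."""

    def __init__(self) -> None:
        self.registrations: Dict[str, int] = {}
        self.in_progress: List[Tuple[str, str]] = []    # (issuer, command)
        self.backlog: Deque[Tuple[str, str]] = deque()  # held while stalled
        self.stalled = False

    def submit(self, node: str, command: str) -> str:
        if self.stalled:
            self.backlog.append((node, command))
            return "stalled"
        if node not in self.registrations:
            raise PermissionError(f"{node} is not registered")
        self.in_progress.append((node, command))
        return "accepted"

    def begin_recovery(self, survivor: str, survivor_key: int, failed: str) -> None:
        """A survivor asks the device to delete the failed computer's identifier."""
        if self.registrations.get(survivor) != survivor_key:
            raise PermissionError("only a registered survivor may remove a registration")
        self.stalled = True
        self.registrations.pop(failed, None)

    def finish_recovery(self, failed: str) -> List[str]:
        """Abort the failed computer's outstanding commands, then release the stall."""
        aborted = [cmd for node, cmd in self.in_progress if node == failed]
        self.in_progress = [(n, c) for n, c in self.in_progress if n != failed]
        self.stalled = False
        while self.backlog:
            node, command = self.backlog.popleft()
            if node in self.registrations:  # held commands from unregistered nodes are dropped
                self.in_progress.append((node, command))
        return aborted


s = SharedStorage()
s.registrations = {"node-a": 1, "node-b": 2}
s.submit("node-a", "WRITE 7")
s.submit("node-b", "READ 3")
s.begin_recovery("node-b", 2, failed="node-a")
print(s.submit("node-b", "WRITE 9"))  # stalled
print(s.finish_recovery("node-a"))    # ['WRITE 7']
print(s.in_progress)                  # [('node-b', 'READ 3'), ('node-b', 'WRITE 9')]
```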
The identifier for each computer may be unique, or every identifier may be initialized to the same value. In the latter case, any identifier stored for a computer after a failed node is detected differs from all previously stored identifiers.
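One way to satisfy the shared-initial-value case is a generation counter, sketched below. This scheme is an assumption, not the text's stated method: every computer starts with the same identifier, and each detected failure advances to a value never stored before. `IdentifierGenerations` is a hypothetical name.

```python
from typing import Dict


class IdentifierGenerations:
    """Hypothetical scheme: a shared initial identifier, bumped on each detected failure.

    Because the counter only increases, an identifier stored after a failure
    can never equal one stored before it.
    """

    def __init__(self, initial: int = 1) -> None:
        self.current = initial

    def on_failure_detected(self) -> int:
        self.current += 1
        return self.current


gen = IdentifierGenerations()
stored: Dict[str, int] = {name: gen.current for name in ("node-a", "node-b", "node-c")}
before = set(stored.values())  # every computer shares the initial value
gen.on_failure_detected()      # node-c fails
del stored["node-c"]
stored["node-d"] = gen.current  # identifier stored after the failure
print(before, stored)
```

With unique identifiers, the same guarantee holds trivially as long as a failed computer's identifier is never reused.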