In a complex computing environment, an administrator can organize computing, memory, and storage resources into levels of aggregation. For example, a node represents the aggregate computing, memory, and storage resources of a physical server. A cluster of nodes represents the aggregate computing, memory, and storage resources of a group of physical servers. An administrator can manage the aggregate resources of the cluster as a single entity. For example, a cluster of nodes organized as a distributed storage system can store a storage object as components of the object, and replicas of components, on multiple storage devices within the cluster. Maintaining the distributed storage system across multiple local area network sites substantially increases fault tolerance and provides better support for site disaster recovery.
Deploying multiple object owners, or coordinators, in the distributed storage system across the multiple network sites improves performance in managing the object. For example, each object has a primary coordinator that takes ownership of the object and is in charge of processing all input/output (I/O) requests directed to the object. The primary coordinator routes or distributes the I/O to the appropriate object replicas. When the replicas are distributed across multiple network sites, each site may have a secondary coordinator to manage the replicas of the object inside that network site. The primary coordinator forwards the I/O to the secondary coordinators, each of which then routes or distributes the I/O to the replicas of the object inside the network site that it manages.
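The two-level routing described above can be sketched as follows. This is a minimal illustration rather than an actual implementation: the class names (`Replica`, `SecondaryCoordinator`, `PrimaryCoordinator`), the site names, and the choice to apply every I/O to every replica in a site are assumptions made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica of an object's component."""
    name: str
    applied: list = field(default_factory=list)

    def apply(self, io: str) -> None:
        # Record the I/O so the routing path can be inspected afterwards.
        self.applied.append(io)


@dataclass
class SecondaryCoordinator:
    """Manages the replicas of the object inside a single network site."""
    site: str
    replicas: list

    def handle(self, io: str) -> None:
        # Assumption: every I/O is distributed to all replicas in the site.
        for replica in self.replicas:
            replica.apply(io)


@dataclass
class PrimaryCoordinator:
    """Owns the object and receives all I/O requests directed to it."""
    secondaries: list

    def handle(self, io: str) -> None:
        # Forward the I/O to the secondary coordinator of each site.
        for secondary in self.secondaries:
            secondary.handle(io)


# Two hypothetical sites, each holding two replicas of the same object.
site_a = SecondaryCoordinator("site-a", [Replica("a1"), Replica("a2")])
site_b = SecondaryCoordinator("site-b", [Replica("b1"), Replica("b2")])
primary = PrimaryCoordinator([site_a, site_b])

primary.handle("write block 7")
```

After the call, each of the four replicas has recorded the same I/O, and the primary coordinator never contacted a replica directly. Every request crossed a site boundary once per site, at the secondary coordinator.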
Such a configuration, however, creates difficulty in authenticating different coordinators, each of which might hold an exclusive authorization to connect to a component replica at different times. For example, because a new primary coordinator can be elected dynamically once the previous primary coordinator is shut down, a race condition is likely when both the new primary coordinator and the old secondary coordinator attempt to connect to a replica. As a result, each coordinator may attempt to reestablish its connection to the replica(s) after the other coordinator interrupts the previously established connection.
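The contention described above can be illustrated with a small deterministic simulation. It is a simplified sketch under stated assumptions, not a model of any particular system: the `Replica` class, the rule that a new connection evicts the current one, and the strict alternation of retries are all hypothetical.

```python
class Replica:
    """Hypothetical replica that accepts only one coordinator connection."""

    def __init__(self):
        self.owner = None
        self.interruptions = 0

    def connect(self, coordinator: str) -> None:
        # Assumption: a new connection evicts whichever coordinator
        # currently holds the exclusive connection.
        if self.owner is not None and self.owner != coordinator:
            self.interruptions += 1
        self.owner = coordinator


def simulate(rounds: int):
    """Alternate connection attempts between the two coordinators."""
    replica = Replica()
    coordinators = ["new-primary", "old-secondary"]
    history = []
    for step in range(rounds):
        # The coordinator that just lost its connection retries next.
        replica.connect(coordinators[step % 2])
        history.append(replica.owner)
    return replica, history


replica, history = simulate(4)
print(history)                # which coordinator held the connection per step
print(replica.interruptions)  # how many established connections were broken
```

In this sketch every attempt after the first interrupts an established connection, so ownership of the replica alternates between the two coordinators and neither keeps it.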