The present invention, in some embodiments thereof, relates to a fault-tolerant consensus protocol and, more specifically, but not exclusively, to a leader reselection process in a fault-tolerant consensus protocol.
The replicated state machine (RSM) approach is an important tool for maintaining the integrity of distributed applications and services in failure-prone data centers and cloud computing environments. Paxos is a protocol used in RSM-based systems to reach consensus among a replication group of network nodes. In the Paxos protocol, some of the nodes act as proposers, and one of the proposers acts as a leader that handles commands received from a client. The leader is selected from among the current members of the replication group by a leader selection process. When the leader fails, it is replaced as leader by another network node through a leader change process.
The performance of the Paxos protocol depends on the availability of the leader. In particular, a leader failure may render the service managed by the RSM-based system temporarily unavailable to clients, thereby degrading the latency and throughput of the service.
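By way of a non-limiting illustration, the leader selection, the leader change, and the resulting window of unavailability may be sketched as follows. The sketch is a simplified model and not an implementation of the Paxos protocol: the `ReplicationGroup` class, its method names, and the rule of selecting the live node with the lowest identifier are all assumptions made only for the purpose of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReplicationGroup:
    # Maps node id -> liveness flag. Ids and flags are hypothetical.
    alive: Dict[int, bool] = field(default_factory=dict)
    leader: Optional[int] = None

    def select_leader(self) -> Optional[int]:
        """Select a leader among live members.

        Assumed rule for illustration only: lowest live node id wins.
        """
        candidates = [node for node, up in self.alive.items() if up]
        self.leader = min(candidates) if candidates else None
        return self.leader

    def fail(self, node: int) -> None:
        """Mark a node as failed; a failed leader leaves the group leaderless."""
        self.alive[node] = False
        if self.leader == node:
            self.leader = None

    def handle_command(self, command: str) -> str:
        """Serve a client command; only possible while a leader is in place."""
        if self.leader is None:
            raise RuntimeError("no leader: service temporarily unavailable")
        return f"node {self.leader} proposes {command!r}"


group = ReplicationGroup(alive={1: True, 2: True, 3: True})
group.select_leader()                      # initial leader selection
first = group.handle_command("set x=1")    # served by the initial leader

group.fail(1)                              # the leader fails
# Between the failure and the next selection, handle_command() raises.
group.select_leader()                      # leader change
second = group.handle_command("set x=2")   # served by the new leader
```

In this model, every client command issued between `fail()` and the next `select_leader()` call is rejected, which corresponds to the temporary unavailability described above.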