A distributed system, the subject of distributed computing, is a collection of autonomous computational entities, or nodes, connected by a network and running software that lets them act as a single integrated computing facility by exchanging messages to communicate and coordinate their actions. By pooling the resources of many nodes, a distributed system can execute tasks more efficiently than the same machines operating as stand-alone systems. Distributed systems are used in a variety of fields, such as telecommunication networks, real-time process control and parallel computation, among others. They may follow different architectures, such as client-server, three-tier, n-tier or peer-to-peer, and their components may be loosely or tightly coupled.
Consensus in the context of distributed systems is the task of getting all processes in a network to agree on a specific value, typically by voting. A value is usually proposed by at least one node of the network, and the remaining nodes vote on whether to accept it, for example on whether or not to perform some action. Consensus is useful for coordinator election, also known as leader election. A leader in a distributed system is a single process or node designated as the organizer among the nodes of the system. A leader node may, for example, handle client requests and resolve conflicts among the nodes.
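The voting step described above can be illustrated with a minimal sketch. It assumes the simplest possible rule, namely that a proposed value is accepted only when a strict majority of all nodes in the cluster votes for it; the function name `majority_value` and the example vote values are hypothetical and chosen only for illustration. Practical consensus protocols add further rounds and safety checks that are omitted here.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_value(votes: Iterable[Hashable], cluster_size: int) -> Optional[Hashable]:
    """Return the value backed by a strict majority of the cluster, or None.

    `votes` holds one vote per node that responded; nodes that failed or
    timed out simply contribute nothing.
    """
    # The quorum is computed over the whole cluster, not just the responders,
    # so silent nodes can never help a value reach a majority.
    quorum = cluster_size // 2 + 1
    counts = Counter(votes)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= quorum else None


# Five-node cluster: one node never answers, three of the four responders agree.
decision = majority_value(["commit", "commit", "commit", "abort"], cluster_size=5)
```

Computing the quorum from the cluster size rather than from the number of responses is a deliberate choice in this sketch: it ensures that two disjoint groups of nodes can never both reach a decision.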
One of the challenges of distributed systems is the handling of failures, owing to the complexity and the possibly large number of interacting components. As a result, reliability is one of the most sought-after characteristics of distributed systems. Failures in distributed systems fall into hardware and software categories. Several types of failure exist: halting failures, fail-stop failures, omission failures, network failures, network partition failures, timing failures and Byzantine failures. Other desired features of distributed systems include resource sharing, openness, concurrency, scalability, fault tolerance and transparency.
Different methods and algorithms have been devised for dealing with such failures. Well-known algorithms, such as Paxos and Raft, are used for solving the consensus problem in a network of unreliable processors. Such methods teach ways to elect a leader from a plurality of nodes. The leader node is then responsible for handling client requests and resolving conflicts between nodes. Such systems usually operate synchronously and require a leader or virtual leader.
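Leader election of this kind can be sketched as follows. This is a deliberately simplified illustration loosely modeled on term-based voting as used in Raft, not an implementation of Raft or Paxos: it assumes that each node grants at most one vote per term and that a candidate wins with a strict majority of the cluster, and it omits timeouts, log comparison and message passing. The names `Node`, `request_vote` and `run_election` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    alive: bool = True
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate_id: str) -> bool:
        """Grant at most one vote per term; crashed nodes never answer."""
        if not self.alive or term < self.current_term:
            return False
        if term > self.current_term:
            # A newer term resets the vote cast in the older term.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, cluster: List[Node]) -> bool:
    """Start a new term; the candidate wins with a strict majority of the cluster."""
    term = candidate.current_term + 1
    # The candidate is part of the cluster, so it also votes for itself here.
    votes = sum(node.request_vote(term, candidate.node_id) for node in cluster)
    return votes >= len(cluster) // 2 + 1


# Five nodes, two of which have crashed; the remaining three form a majority.
cluster = [Node(f"n{i}") for i in range(1, 6)]
cluster[3].alive = False
cluster[4].alive = False
n1_is_leader = run_election(cluster[0], cluster)
```

Because every node votes only once per term, two candidates cannot both collect a majority in the same term, which is what allows the winner to act as the single leader for that term.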