A distributed system is one whose components execute concurrently to achieve a common goal. These components typically communicate through message passing, so maximizing the performance of that communication can be important for achieving the common goal efficiently.
The most common network configurations in distributed systems are master/slave and peer-to-peer configurations. In a master/slave configuration, one node acts as the master node: it sets the timing and controls communications with the slave nodes, typically through a request-and-reply model (or a multicast request followed by individual replies). However, in such a configuration the slave nodes cannot initiate communications with the master node or with each other, and a failure of the master node disrupts communication across the whole system; that is, the master node is a single point of failure.
By contrast, each node in a peer-to-peer configuration can initiate communications with other nodes whenever data needs to be exchanged, either individually through the request-and-reply model or with all peers at once through a multicast model. Although the peer-to-peer configuration has no single point of failure, the request-and-reply model can generate heavy network traffic that degrades communication performance. For example, to synchronize data among N nodes, each of the N nodes must transmit a request for information to the other nodes and must also reply to the requests from the N−1 other nodes, for a total of N requests plus N(N−1) replies, or N + N(N−1) = N² messages. Multicast can address this performance problem by sending each message to a group of recipients at once. However, traditional multicast guarantees neither the delivery of messages nor their delivery order, so data may not be successfully synchronized among the nodes of the distributed system.
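To make the arithmetic concrete, the following Python sketch counts messages for the request-and-reply model exactly as described above (N requests plus N(N−1) replies) and compares it with a multicast scheme. The multicast count is an assumption for illustration only: it supposes that each node sends one multicast transmission to the group and ignores acknowledgments and retransmissions, which traditional multicast does not provide anyway. The function names are hypothetical.

```python
def request_reply_messages(n: int) -> int:
    """Messages for N peers to synchronize via request-and-reply.

    Counted as in the text: one request per node (N in total) plus
    one reply from each node to each of its N - 1 peers.
    """
    if n < 1:
        raise ValueError("need at least one node")
    requests = n
    replies = n * (n - 1)
    return requests + replies  # algebraically equal to n ** 2


def multicast_messages(n: int) -> int:
    """Messages if each node multicasts its data once to the group.

    Assumption (hypothetical model): one transmission per node, with no
    acknowledgments, retransmissions, or ordering protocol.
    """
    if n < 1:
        raise ValueError("need at least one node")
    return n


if __name__ == "__main__":
    for n in (2, 5, 10, 100):
        print(f"N={n}: request-and-reply={request_reply_messages(n)}, "
              f"multicast={multicast_messages(n)}")
```

For N = 10, request-and-reply needs 100 messages while the assumed multicast scheme needs 10 transmissions; the gap grows quadratically with N, which is the performance issue the text describes. The lower multicast count comes at the cost noted above: without delivery or ordering guarantees, those N transmissions may not leave every node with the same data.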