Distributed systems may include multiple nodes that each store a local copy of the same data. Some distributed systems impose a strong consistency requirement, under which all nodes must operate on the same value of the data at all times. However, data updates and system failures may leave nodes with different values for the same data element, in violation of the consistency requirement. Examples of distributed systems having consistency requirements may include data serving systems in which multiple caches access the same storage. Each cache can be considered a node in the distributed system. Under a consistency requirement, either all nodes should see an update to a data element or no node should see it. Further, a data element that is updated should not be made persistent (and thus available for reading) unless all nodes have been made aware of the update.
One approach to such consistency requirements in a distributed system is the well-known Two-Phase Commit Protocol (2PC). 2PC is a blocking protocol that operates in two phases: a commit-request phase and a commit phase. Briefly, the commit-request phase allows a controller or coordinator to prepare all participating entities for commit. The coordinator sends a query-to-commit message and waits for a reply from all entities. The entities execute the transaction up to the point of commit and write to a log that allows the transaction to be undone. Each entity then replies with an agreement or abort message. In the commit phase, if all entities agree, the coordinator sends a message to cause all entities to complete the transaction. If there is not complete agreement, the coordinator sends a message to cause all entities to undo the transaction. As will be understood by those skilled in the art, 2PC suffers from many disadvantages: it is complex to implement, locks data for the entire period during which coordination messages are exchanged, consumes bandwidth for the coordination messages, is subject to error if a system failure occurs during the commit phase, and requires robust network connections to work well.
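For illustration only, the two phases described above can be sketched as a minimal in-memory simulation in Python. The names Participant, two_phase_commit, can_commit, and undo_log are hypothetical and chosen for this sketch. It assumes a single process with no real network, locking, or failure handling, so it shows only the message flow and not the disadvantages noted above.

```python
from dataclasses import dataclass, field

# Sentinel marking "key did not exist before the transaction".
_MISSING = object()


@dataclass
class Participant:
    """Hypothetical in-memory participant, e.g. one cache node."""
    name: str
    can_commit: bool = True  # assumption: lets the sketch simulate an abort vote
    data: dict = field(default_factory=dict)
    undo_log: dict = field(default_factory=dict)

    def prepare(self, key, value):
        """Commit-request phase: apply tentatively, log undo info, and vote."""
        if not self.can_commit:
            return False  # abort vote
        self.undo_log[key] = self.data.get(key, _MISSING)
        self.data[key] = value
        return True  # agreement vote

    def commit(self, key):
        """Commit phase: keep the new value and discard the undo record."""
        self.undo_log.pop(key, None)

    def rollback(self, key):
        """Commit phase on abort: restore the logged value, if any."""
        if key not in self.undo_log:
            return  # this participant never applied the update
        old = self.undo_log.pop(key)
        if old is _MISSING:
            self.data.pop(key, None)
        else:
            self.data[key] = old


def two_phase_commit(participants, key, value):
    """Coordinator: returns True if the update committed on all participants."""
    # Phase 1: query every participant and collect all replies.
    votes = [p.prepare(key, value) for p in participants]

    # Phase 2: complete only on unanimous agreement, otherwise undo everywhere.
    if all(votes):
        for p in participants:
            p.commit(key)
        return True
    for p in participants:
        p.rollback(key)
    return False
```

In this sketch, calling two_phase_commit with participants that all agree leaves every participant holding the new value. If any one participant is constructed with can_commit=False, every participant ends up with its previous value, which corresponds to the "all nodes see the update, or no node sees it" requirement.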