1. Field
This application relates to low-level data storage and data transfer systems, such as those that form the back-ends of databases and filesystems or that provide a storage platform directly to computer applications.
2. Prior Art
In order to provide a foundation for useful computer applications and business logic (hereafter generically termed “application logic”), computer systems must fulfill the requirement of storing and retrieving data. Researchers in this field have identified and named several universal requirements of data storage systems. “Durability” is the fundamental requirement that the storage system not lose data. “Availability” is the requirement that the storage system be at all times capable of retrieving data within a reasonable timeframe (“read-available”), and of accepting new data for storage (“write-available”). It is widely recognized that providing durability and availability requires storing data on multiple computer systems, since any single computer system, however well designed, is susceptible to failure of a hardware component (such as a hard disk drive).
One possible arrangement, known in the prior art as a “gossip”-based method, is to have pairs of computer systems (also known as “nodes” of the network) continuously transfer new data to one another. In such an arrangement, data newly introduced at any node “diffuses” through the system until it is stored on all nodes.
In such schemes, a node may operate as a receiver, as a sender, or in both modes simultaneously within the same interaction. The receiver is the node attempting to obtain new data, and the sender is the node supplying it.
Nodes may engage in logically independent transfers with multiple different peer nodes at the same time. Such transfers are said to be “contemporaneous”.
Such schemes continuously attempt to determine, for particular pairs of nodes, or for all pairs of nodes, which of the items of data a first node has in storage are not present on a second node and vice versa. After each node sends data that the other is missing, the nodes are in synchrony (assuming that neither node was engaged in a contemporaneous transfer with a third node).
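The pairwise exchange described above can be sketched as a minimal anti-entropy loop. This is an illustrative sketch only: `Node`, `sync_pair`, and `gossip_until_converged` are hypothetical names, each node's store is assumed to be a set of uniquely identified, immutable items, and exchanges run one at a time, so contemporaneous transfers are not modeled. Once no new data is introduced, repeated random exchanges leave every node holding the same set.

```python
"""Minimal sketch of pairwise gossip (anti-entropy) synchronization."""
import random


class Node:
    """A hypothetical node whose store is a set of unique, immutable items."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: set[str] = set()


def sync_pair(a: Node, b: Node) -> None:
    """Send each node the items it lacks; afterwards both hold the same set."""
    missing_on_b = a.items - b.items  # a acts as sender, b as receiver
    missing_on_a = b.items - a.items  # b acts as sender, a as receiver
    b.items |= missing_on_b
    a.items |= missing_on_a


def gossip_until_converged(nodes: list[Node], rng: random.Random) -> int:
    """Sync random pairs until every node holds every item; count exchanges."""
    all_items = set().union(*(n.items for n in nodes))
    exchanges = 0
    while any(n.items != all_items for n in nodes):
        a, b = rng.sample(nodes, 2)
        sync_pair(a, b)
        exchanges += 1
    return exchanges


nodes = [Node(f"n{i}") for i in range(5)]
nodes[0].items.add("x")  # new data introduced at n0
nodes[3].items.add("y")  # new data introduced at n3
count = gossip_until_converged(nodes, random.Random(42))
print(count, sorted(nodes[4].items))
```

Comparing full item sets is the simplest possible way to find what a peer is missing; practical systems typically exchange compact summaries instead, which this sketch does not attempt.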
Many such schemes permit contemporaneous transfers amid a steady influx of new data, with the result that synchrony of all nodes, or even a particular pair of communicating nodes, is rarely reached. Such schemes are said to be “eventually-consistent” because synchrony across all nodes (consistency) would eventually be reached if new data were to stop being introduced and pairwise updates were to continue for a sufficiently long time.
A significant difficulty in layering application logic on top of eventually-consistent data propagation schemes is that data items may arrive in different orders at different nodes in the network. Unless the application's state transitions are “commutative” (that is, they yield the same result regardless of the order in which they are applied), results will be order-dependent. Application logic usually cannot be written in commutative form, yet it is not acceptable for nodes to disagree on the application state in the long run; this tension is referred to as the “ordering problem”.
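As a minimal illustration of the ordering problem, the sketch below assumes a hypothetical application whose state is a single integer balance and whose data items are update functions (`deposit`, `apply_interest`, and `replay` are illustrative names). Two nodes receive the same two writes in different orders and end in different states: deposits commute with one another, but a deposit and a percentage interest update do not.

```python
"""Illustration of order-dependent (non-commutative) state transitions."""
from functools import reduce
from typing import Callable

Update = Callable[[int], int]  # hypothetical: a write maps old state to new


def deposit(amount: int) -> Update:
    return lambda balance: balance + amount


def apply_interest(percent: int) -> Update:
    return lambda balance: balance + balance * percent // 100


def replay(initial: int, updates: list[Update]) -> int:
    """Apply updates in the given order and return the final state."""
    return reduce(lambda state, update: update(state), updates, initial)


w1 = deposit(100)
w2 = apply_interest(10)

# Two nodes receive the same two writes in different orders.
node_a = replay(1000, [w1, w2])  # 1000 + 100 = 1100, then +10% = 1210
node_b = replay(1000, [w2, w1])  # 1000 + 10% = 1100, then +100 = 1200
print(node_a, node_b)  # 1210 1200
```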
Conventionally, the ordering problem is solved by having the application logic support a “merge” operation that can reconcile differing application states (such differences are sometimes known as “conflicts”). Implementing a merge operation imposes a substantial cost burden during development, and the resulting program code may harbor hard-to-find bugs, because it is difficult to devise test scenarios that exercise concurrent writes thoroughly enough to explore every inconsistency that may arise during real-world operation.
In particular, a prior-art system for distributed data management known as “Bayou” takes the approach of requiring application logic to contain code that can perform merge operations for resolving conflicting application states. Bayou networks elect a distinguished “master node” that generates and distributes to other nodes an “official history”, which is a specific ordering of all writes that become known to the master node.
Bayou uses “undo logging” to facilitate write reordering. Undo logging is a mechanism for rolling back database record updates or file write sets that cannot be allowed to complete because they have been detected to conflict with the updates and writes of concurrently executing processes.
In Bayou, each node may apply any writes “tentatively” as soon as the writes are discovered by the node, but nodes that have applied tentative writes may be required later to undo them and reapply all writes in an order consistent with the official history. The writes that may be undone represent modifications of a tuple store provided to the application by the Bayou system.
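The tentative-write and rollback behavior described above can be sketched as follows. This is a hypothetical simplification, not Bayou's actual interface: the tuple store is modeled as a Python dict, each write as a `(write_id, key, value)` tuple, and the official history as a list of write ids supplied by the master node. Before overwriting a key, the replica records the key's previous value in an undo log; to adopt the official order, it undoes all logged writes newest-first, then reapplies them in official order, followed by any writes the master node has not yet ordered.

```python
"""Sketch of Bayou-style tentative writes with an undo log (hypothetical API)."""
from dataclasses import dataclass, field
from typing import Optional

# Assumed write format: (write_id, key, value).
Write = tuple[str, str, str]


@dataclass
class Replica:
    store: dict[str, str] = field(default_factory=dict)
    # Undo log entries: (write, previous value of its key, or None if absent).
    undo_log: list[tuple[Write, Optional[str]]] = field(default_factory=list)

    def apply_tentative(self, write: Write) -> None:
        """Apply a write immediately, remembering how to undo it."""
        _, key, value = write
        self.undo_log.append((write, self.store.get(key)))
        self.store[key] = value

    def rollback(self) -> list[Write]:
        """Undo every logged write, newest first; return them in arrival order."""
        undone: list[Write] = []
        while self.undo_log:
            write, previous = self.undo_log.pop()
            key = write[1]
            if previous is None:
                self.store.pop(key, None)
            else:
                self.store[key] = previous
            undone.append(write)
        undone.reverse()
        return undone

    def reorder_to_official(self, official: list[str]) -> None:
        """Roll back, then reapply writes in the official-history order."""
        pending = {w[0]: w for w in self.rollback()}
        for write_id in official:
            if write_id in pending:
                self.apply_tentative(pending.pop(write_id))
        # Writes the master has not yet ordered are reapplied afterwards in
        # arrival order. (For simplicity, officially ordered writes also stay
        # in the undo log here.)
        for write in pending.values():
            self.apply_tentative(write)


replica = Replica()
replica.apply_tentative(("w2", "x", "b"))  # w2 happens to arrive first
replica.apply_tentative(("w1", "x", "a"))
print(replica.store)  # {'x': 'a'}
replica.reorder_to_official(["w1", "w2"])
print(replica.store)  # {'x': 'b'}
```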
Bosneag and Brockmeyer's paper “A Formal Model for Eventual Consistency Semantics” discusses techniques for efficiently rearranging other nodes' write histories in the manner of Bayou, avoiding reordering in cases where the order can be mechanically determined not to matter. These techniques depend on correct dependency information being supplied to the system.