Replication refers to distributing replicas of data (e.g., files) across distributed storage devices, so as to make the data more available than in scenarios in which the data is stored on a single storage device. The replicas of the data are continuously updated in order to ensure that up-to-date data is readily available and accessible. That is, data replication systems maintain a consistency invariant that is preserved regardless of any failures that can occur in distributed computing systems. Such failures may include message loss, hardware failures, network partitioning, message corruption, message duplication, message alteration, and the like.
Fault-tolerance refers to the concept of enabling systems to continue functioning despite failure by or of one or more of the system's components. For instance, in the event of a temporary failure by or of a system component, such as the system's storage device (or components connecting the system to the storage device), the system can continue to function using the replicas stored in the distributed storage devices. In the event of a permanent failure by or of the system's components, including the system's storage device and/or corruption of the data stored therein, the system's replacement storage device can be repopulated using the replicas stored in the distributed storage devices.
In this way, fault-tolerant file replication systems are equipped to continue functioning, even if only partially or at reduced efficiency, while avoiding a total system failure. Making the data available, via replicas, in order to prevent complete system breakdowns is beneficial in all computing scenarios, but is of particular importance in critical systems such as: fire alarms, emergency dispatching, electricity generation, robotic surgery, nuclear reaction, defibrillators, radiation therapy, infusion pumps, construction equipment, amusement rides, scuba equipment, railway signaling, airbags, braking, power steering, advanced driver assistance, battery management, electric park, air traffic control, flight planning, radio navigation, space flight, and rocket launch.
Traditionally, data replication mechanisms rely on replication logs. Replication logs store a list of all changes to a file in sequential order, assign each change a consecutive number, and append those changes to a persistent log on each replica. To update a replica, the changes in the persistent log are applied to the replica in the order specified in the persistent log, thereby ensuring that all replicas execute the same changes in the same order established in the replication log.
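The log-based scheme described above can be sketched roughly as follows. This is only an illustrative model under stated assumptions, not any particular system's implementation: the names `Change`, `Replica`, `append`, `receive`, and `apply_pending` are hypothetical, and in-memory Python objects stand in for the persistent log and the replica file.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    seq: int      # consecutive number assigned when the change is logged
    offset: int   # byte offset in the file where the change starts
    data: bytes   # bytes written at that offset


@dataclass
class Replica:
    # Hypothetical in-memory stand-ins for the persistent log and the replica file.
    log: list = field(default_factory=list)
    content: bytearray = field(default_factory=bytearray)
    applied: int = 0  # sequence number of the last change applied to the file

    def append(self, offset, data):
        """Assign the next consecutive number and append the change to the log."""
        change = Change(seq=len(self.log) + 1, offset=offset, data=data)
        self.log.append(change)
        return change

    def receive(self, change):
        """Append a change logged elsewhere, accepting only the next number."""
        if change.seq != len(self.log) + 1:
            raise ValueError(f"expected change {len(self.log) + 1}, got {change.seq}")
        self.log.append(change)

    def apply_pending(self):
        """Apply unapplied changes to the file, strictly in log order."""
        for change in self.log[self.applied:]:
            end = change.offset + len(change.data)
            if len(self.content) < end:
                # Grow the file with zero bytes so the write fits.
                self.content.extend(b"\x00" * (end - len(self.content)))
            self.content[change.offset:end] = change.data
            self.applied = change.seq


primary = Replica()
backup = Replica()
for offset, data in [(0, b"hello"), (5, b" world"), (0, b"J")]:
    backup.receive(primary.append(offset, data))

primary.apply_pending()
backup.apply_pending()
```

Because both replicas apply the same numbered changes in the same order, their file contents end up identical, even though the third change overlaps the first.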
One drawback, however, is that traditional replication mechanisms such as those employing replication logs are inefficient because they require changes to replicas to be written sequentially. In other words, a series of updates to parts of a file (e.g., disjoint parts of a file) are processed one after the other, even when those updates do not overlap. Thus, such replication mechanisms do not exploit the ability of storage devices to concurrently execute a series of operations.
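One way to see why strict sequencing is unnecessary for disjoint updates is that such updates commute: applying them in any order yields the same file. The following sketch checks this for three non-overlapping byte ranges. The helper `write_at` and the sample data are hypothetical, and the example only demonstrates order independence; it does not model a real storage device or actual concurrent I/O.

```python
from itertools import permutations


def write_at(content, offset, data):
    """Return a copy of content with data written at the given offset."""
    out = bytearray(content)
    out[offset:offset + len(data)] = data
    return out


base = bytearray(b"." * 12)
# Three updates touching disjoint byte ranges: [0, 4), [4, 8), and [8, 12).
updates = [(0, b"AAAA"), (4, b"BBBB"), (8, b"CCCC")]

results = set()
for order in permutations(updates):
    content = base
    for offset, data in order:
        content = write_at(content, offset, data)
    results.add(bytes(content))
```

All six orderings produce the same content, so `results` holds a single value. Overlapping updates would not generally have this property, which is why they still need an agreed order.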
Moreover, traditional replication mechanisms require writing each change and/or update twice: once to a persistent log corresponding to the replica, and a second time to the replica file itself. Thus, each change and/or update to a file results in two independent input/output (I/O) operations to/from the storage device on which the replica is stored. As a result, storage devices are faced with additional storage and/or processing burdens. Other traditional replication systems merely update the persistent log of the replica and, at a later time, update the replica file itself. This results in systems having to consult the persistent log each time access to the replica is attempted, to ensure that the most up-to-date data is returned.
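The two costs described above can be made concrete with a small counting model. This is a sketch under simplifying assumptions: `CountingDevice`, `eager_update`, `deferred_update`, and `deferred_read` are hypothetical names, the replica file is modeled as a dictionary keyed by file region, and each method call stands in for one I/O operation.

```python
class CountingDevice:
    """Hypothetical storage device that counts the write operations it receives."""

    def __init__(self):
        self.writes = 0
        self.log = []    # stands in for the persistent log
        self.file = {}   # stands in for the replica file, keyed by region

    def write_log(self, seq, key, value):
        self.writes += 1
        self.log.append((seq, key, value))

    def write_file(self, key, value):
        self.writes += 1
        self.file[key] = value


def eager_update(dev, seq, key, value):
    """Write the change twice: once to the log, once to the replica file."""
    dev.write_log(seq, key, value)
    dev.write_file(key, value)


def deferred_update(dev, seq, key, value):
    """Write the change to the log only; the replica file is updated later."""
    dev.write_log(seq, key, value)


def deferred_read(dev, key):
    """Consult the log, newest entry first, before falling back to the file."""
    for _, logged_key, value in reversed(dev.log):
        if logged_key == key:
            return value
    return dev.file.get(key)


changes = [(1, "block0", b"a"), (2, "block1", b"b"), (3, "block0", b"c")]

eager = CountingDevice()
deferred = CountingDevice()
for seq, key, value in changes:
    eager_update(eager, seq, key, value)
    deferred_update(deferred, seq, key, value)
```

In this model, three changes cost the eager device six writes. The deferred device performs only three, but every read must go through `deferred_read` and scan the log, because its file is still stale.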
Given the foregoing, it would be beneficial to provide fault-tolerant file replication that allows concurrent processing, consistency up to linearizability, and direct modification of replicas. It would also be beneficial to provide fault-tolerant file replication that efficiently updates replicas while minimizing the processing and storage burden on distributed storage devices.