This description relates to recovery and fault-tolerance in the presence of computational indeterminism.
Computational systems occasionally fail for a variety of reasons. When such systems fail, data can be lost. It is desirable to take measures to prevent, or at least minimize, such data loss.
Examples of such measures include ACID (Atomic, Consistent, Isolated until committed, Durable when committed) transactions in databases. These known measures are extremely robust. They can be made to meet very high standards of correctness, while also being made fault tolerant.
However, all of this robustness comes at a cost. Known methods for guarding against failure have high latency and sometimes cause extended periods during which the apparatus is unavailable. Thus, they are less than optimal for high volumes of transactions.
In addition, some known methods require deterministic computation. In deterministic computation, the order in which tasks are performed is fixed, and the result of a computation remains the same each time it is carried out. It is not clear how these known methods can be adapted to efficiently handle non-deterministic computational environments.
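By way of illustration only, the distinction can be sketched in a few lines of Python. The task names `add_three` and `double` and the helper `run` are hypothetical and are not part of any known method described here; the sketch merely assumes two tasks whose effects do not commute. When the order of the tasks is fixed, repeated runs give one result; when the order is allowed to vary, more than one result becomes possible.

```python
from itertools import permutations

def run(tasks, start=1):
    """Apply each task to the running value, in the order given."""
    value = start
    for task in tasks:
        value = task(value)
    return value

# Two hypothetical tasks whose effects do not commute.
def add_three(x):
    return x + 3

def double(x):
    return x * 2

tasks = [add_three, double]

# Deterministic case: the order is fixed, so every run yields the same result.
fixed_results = {run(tasks) for _ in range(5)}

# Non-deterministic case: any ordering may occur, so results can differ.
any_order_results = {run(order) for order in permutations(tasks)}

print(fixed_results)      # a single value
print(any_order_results)  # more than one value
```

A recovery method that assumes re-executing the tasks reproduces the earlier result is sound in the first case but not necessarily in the second.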
Additional complexity arises when a computing apparatus includes multiple processing nodes that cooperate with each other. In such an apparatus, it is possible for one node of the apparatus to fail, and others to keep working. When that failed node recovers, there is no guarantee that it has restored itself to a state that the other nodes expect it to be in.