Distributed processing takes many forms and employs many techniques, depending on the nature of the data and the objectives of a given application. Typical objectives include transaction performance, locality of data, minimization of network traffic, high availability, minimization of storage requirements, extensibility, and cost to produce. Many applications have certain objectives of overriding significance, which dictate the distributed processing technique employed.
There is one type of application for which the overriding concern is high availability (resiliency). This type includes switching systems used in a communications network, as well as control systems in the areas of avionics, industrial control, and stock trading. In these systems, it is assumed that any component may fail, and the system must continue to run in the event of such a failure. These systems are designed to be "fault-tolerant," usually by sacrificing many other objectives (e.g., cost, performance, flexibility, and storage) in order to achieve high availability.
Traditionally, fault-tolerant systems have been built as tightly coupled systems with specialized hardware and software components, all directed toward achieving high availability. It would be desirable to provide a more generally applicable technique for achieving high availability without reliance on specialized components.
Another important factor for a distributed application is the need to know the relative state of a plurality of peer processes before certain actions occur. Such coordination of effort requires that each process know not only what it thinks about the state of the other processes, but also what the other processes think about its state. This is called a "relativistic" view of process state.
For example, assume a system has been developed that uses four cooperating processes to perform a task. Also assume that all four processes must be active before the coordinated task can begin; an active process is defined as a process that has successfully contacted its peers. On system start-up, each process must contact all other processes. After contacting its peers, each process must wait until all other processes have contacted their peers. Thus, only after all processes have contacted their peers, and all the peers know of this contact status, can the task begin.
This is actually a very common situation when attempting to coordinate a task among several processes. It would be desirable to provide a mechanism for gathering a relativistic view of state among various processes.
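The four-process start-up example described above can be illustrated with a minimal sketch. The sketch is an illustration only and is not drawn from any particular implementation: the class `PeerProcess`, the helper `contact_all`, and the process names are hypothetical, and message passing is simulated by direct method calls within a single program rather than over a network. Each process records which peers it has contacted (its own view) and which peers have reported that they are active (the peers' view), and it permits the task to begin only when both views are complete.

```python
class PeerProcess:
    """Hypothetical model of one of several cooperating peer processes."""

    def __init__(self, name, all_names):
        self.name = name
        self.peers = set(all_names) - {name}
        self.contacted = set()      # peers this process has successfully contacted
        self.known_active = set()   # peers that have reported being active

    def is_active(self):
        # A process is active once it has contacted every peer.
        return self.contacted == self.peers

    def contact(self, other):
        # Simulated contact; a real system would exchange a message here.
        self.contacted.add(other.name)

    def announce(self, processes):
        # Tell every peer that this process has become active.
        if self.is_active():
            for other in processes:
                if other is not self:
                    other.known_active.add(self.name)

    def can_begin(self):
        # The task may begin only when this process is active and
        # knows that every peer is active as well.
        return self.is_active() and self.known_active == self.peers


def contact_all(process, processes):
    """Have one process contact all of its peers, then announce its status."""
    for other in processes:
        if other is not process:
            process.contact(other)
    process.announce(processes)


names = ["P1", "P2", "P3", "P4"]
processes = [PeerProcess(n, names) for n in names]

# Only three of the four processes complete their contacts.
for p in processes[:3]:
    contact_all(p, processes)
print([p.can_begin() for p in processes])  # every entry is False

# The fourth process completes its contacts and announces.
contact_all(processes[3], processes)
print([p.can_begin() for p in processes])  # every entry is True
```

In this sketch, no process may begin while the fourth process is still inactive, even though three processes have each contacted all of their peers; the task becomes possible only once every process also knows the contact status of the others.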