It is well appreciated that a computation may consist of concurrently executing tasks distributed over a network of CPUs. Indeed, Hodgkinson, U.S. Pat. No. 4,274,139, entitled "Digital Telecommunications Network Having Improved Data Processing Systems", issued on June 16, 1981, describes a system in which a local CPU executing one task ships another function to another CPU for remote execution, and accepts the output results from the remotely processed task. Likewise, Yost, in copending application Ser. No. 06/459,746, filed on Jan. 21, 1983, entitled "Controlling Multiple Distributed Computations in a Multi CPU Environment from a Single Port", discloses a method for conducting a dialog with distributed concurrently executing tasks of a computation through a single physical port. Among the facilities utilized are those permitting a pass-through, that is, allowing users at one site to log on to a CPU at another, and those permitting the transfer of files.
A computation may consist of a number of concurrently executing tasks involving accessing, modifying, and restoring information, either locally or at remotely networked CPUs. This may require coordination, simultaneity of action, or similarity of end effects or results. Examples are diverse, and include debiting or crediting distributed accounts by the same amount, or using the same starting clock values. The common attribute of interest is that multiple asynchronous processors use and rely upon an information value originated by one of their number. The problem is to determine whether the value received was the original one sent. Two classes of protocols have been devised to treat this problem. These are, respectively, multi-phase commit/abort protocols and Byzantine Agreements.
Both commit/abort protocols and Byzantine Agreements involve the exchange of messages in synchronized phases among the networked CPUs (nodes), and the evaluation of those messages at the respective nodes, for the purpose of eventually guaranteeing uniform commitment of transactions at all nodes visited by a transaction. There are differences in emphasis among protocol types. For example, Byzantine Agreement assumes that each active node knows the identity of all of the other active nodes in the network, and that there is a direct coupling therebetween. Further, Byzantine Agreement tries to converge on agreement within a fixed period of time at a high message overhead. In contrast, multiphase commit protocols, as described by Gray, "Operating Systems, An Advanced Course", Springer Verlag, 1978, focus on a hierarchy of tasks in which awareness of nodes by any individual node is limited to immediate subordinates.
Multiphase commit protocols can tolerate many lost messages. They cannot, however, tolerate even single instances of failure of the coordinating processor node. A faulty coordinating node may, for instance, selectively send a "commit" to some networked processors and an "abort" of the transaction to others. In contrast, Byzantine protocols require more messages on the average, and tolerate only a limited number of node/link failures, but the failures can be of a variety of types. Thus, when the object of the protocol is to secure guaranteed broadcast, the tradeoff is between message overhead and reliability of the guarantees.
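The coordinator weakness described above can be illustrated with a minimal sketch of a two-phase commit round. It is not taken from Gray or from any cited reference; the function names and the `faulty_overrides` parameter are hypothetical and serve only to model a coordinator that delivers different decisions to different participants.

```python
from typing import Dict, Optional


def coordinator_decision(votes: Dict[str, bool]) -> str:
    """Phase 1 result: commit only if every participant voted yes."""
    return "commit" if all(votes.values()) else "abort"


def run_commit_round(
    votes: Dict[str, bool],
    faulty_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Phase 2: deliver the coordinator's decision to each participant.

    faulty_overrides is a hypothetical knob: it lists participants to whom a
    failed coordinator sends something other than the proper decision.
    """
    decision = coordinator_decision(votes)
    overrides = faulty_overrides or {}
    return {node: overrides.get(node, decision) for node in votes}


def is_uniform(outcomes: Dict[str, str]) -> bool:
    """True when every participant ended with the same outcome."""
    return len(set(outcomes.values())) <= 1


votes = {"A": True, "B": True, "C": True}
healthy = run_commit_round(votes)
split = run_commit_round(votes, faulty_overrides={"C": "abort"})
print(is_uniform(healthy), is_uniform(split))
```

With a correct coordinator the outcome is uniform; a single override is enough to leave the participants in disagreement, which is the case the commit protocol by itself does not guard against.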
Also, Skeen, "Non-Blocking Commit Protocols", ACM SIGMOD Conference, 1981, at pp. 133-142, describes multiphase commit protocols that reduce the possibility of blocking in the event of failure. Where the computational objective is the concurrent updating of a replicated database, it can be shown that the message traffic of the two classes of protocols is of the same order of magnitude.
Pease et al, "Reaching Agreement in the Presence of Faults", 27 Journal of the ACM, pp. 228-234, April 1980, defines Byzantine Agreement as a method for achieving consistency and agreement among asynchronous processors in a network having means of exchanging information in synchronized phases, even where some of the processors may be faulty. Relatedly, Lamport et al, "The Byzantine Generals' Problem", 4 ACM Transactions on Programming Languages and Systems, pp. 382-401, July 1982, graphically describes the paradigm in military terms. However, as applied to a network of n processors capable of exchanging information over bidirectional links, the problem is for all of the processors to agree on the contents (value) of a message being sent by one of them in an environment containing potentially faulty processors or links. Under these circumstances, no assumption is made about the behavior of faulty components. Thus any method must cope with processors or links that could fail to relay a message as intended, or even sabotage its contents.
Byzantine Agreement (BA) is a result of a guaranteed broadcast to a set of participating processors, such that all correctly operating processors receive the same message or none does, provided that the number of faults in the system does not exceed a parameter t, which parameter characterizes the reliability of the guaranty. Characteristically, the network of possibly unreliable processors includes means for conducting several synchronized phases of information exchange. After the exchange, the processors must all agree on a value held originally by one of their number. BA is achieved when:
1. All correct processors agree on the same value, and
2. If the originator is correct, then all correct processors agree on its value.
Implicit in attributes (1) and (2) is that there must exist a time by which each of the processors has completed the execution of its protocol for reaching agreement, and that this time must be known by all the processors. The agreement is said to be "immediate" if all processors reach agreement in the same phase. Otherwise, the agreement is "eventual".
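The two attributes can be expressed as a small check over the final decisions of a run. The following is a sketch only; the function name, its parameters, and the representation of decisions as a dictionary are assumptions made for illustration, and the conditions are evaluated over the correctly operating processors.

```python
from typing import Dict, Hashable, Set


def byzantine_agreement_holds(
    decisions: Dict[str, Hashable],
    correct: Set[str],
    originator: str,
    originator_value: Hashable,
) -> bool:
    """Check attributes (1) and (2) over the correctly operating processors.

    decisions maps each processor to the value it decided on; correct is the
    (hypothetical, externally known) set of processors that followed the
    protocol. Faulty processors' decisions are ignored.
    """
    correct_values = {decisions[p] for p in correct}
    # Attribute (1): all correct processors hold the same value.
    agreement = len(correct_values) <= 1
    # Attribute (2): if the originator is correct, that value is its own.
    validity = originator not in correct or correct_values <= {originator_value}
    return agreement and validity


decisions = {"p0": "v", "p1": "v", "p2": "v", "p3": "x"}
print(byzantine_agreement_holds(decisions, {"p0", "p1", "p2"}, "p0", "v"))
```

Here processor `p3` is treated as faulty, so its divergent decision does not violate agreement among the remaining processors.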
A protocol uses "authentication" when it is desired to prevent any processor from introducing any new value or message into the information exchange while claiming to have received it from another. Authentication protocols require a sending processor to append a signature to the message. The "signature" contains a sample portion of the message encoded such that any receiving processor can verify that the message is authentic, and that it was created by the originating processor. Also, it is assumed that no processor can forge the signature of another, that is, no processor can change the content of a message undetectably.
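The signing and verifying steps can be sketched as follows. This example uses a keyed hash (HMAC-SHA256 from the Python standard library) purely as a stand-in for a signature; that choice is an assumption of the sketch and not the scheme contemplated above. With a keyed hash, anyone able to verify also holds the key and could therefore forge, so a true signature that no other processor can forge would require a public-key scheme. The key and message values are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Produce a tag derived from the message and the signer's secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), signature)


originator_key = b"hypothetical-secret-of-originator"
message = b"value=42"
tag = sign(originator_key, message)

print(verify(originator_key, message, tag))        # genuine message
print(verify(originator_key, b"value=43", tag))    # altered content
```

A relaying processor that alters the content cannot produce a matching tag without the originator's key, so the alteration is detected on verification.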
If a Byzantine Agreement protocol with reliability t is executed and only f actual faults occur, where f is fewer than t, how soon can the processors reach agreement? The resolution of this question involves making explicit the notion of stopping. A processor is considered to have "stopped" when it has decided upon a value for agreement, and will do no further processing or relaying of messages pertaining to this agreement. When a processor has stopped with respect to a particular agreement, it conceivably could cut all of its communication links without affecting the outcome of the agreement for other processors or itself.
In the prior art, methods for achieving "immediate Byzantine Agreement" using authentication require t+1 phases and O(nt) messages. It will be assumed that it is art-recognized that there exist protocols requiring authentications in which the number of messages in an n processor network for achieving agreement is a small polynomial function involving the factors n and t. It is also stipulated that without authentication the results are more varied.
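To make the t+1 phase structure concrete, the following is a simplified simulation of an authenticated agreement in which each processor countersigns and relays every new value it accepts. It is a sketch under several assumptions and is not the prior-art method referred to above: signatures are modeled as tuples of signer identifiers that cannot be forged by construction, only the originator (processor 0) is allowed to misbehave, and the simple relay-to-all pattern does not attempt to meet the O(nt) message bound. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Message = Tuple[Hashable, Tuple[int, ...]]  # (value, chain of signer ids)


def run_signed_relay(n: int, t: int, phase1_values: Dict[int, Hashable],
                     default: Hashable = None):
    """Simulate t+1 phases among processors 1..n-1 with originator 0.

    phase1_values maps a receiver to the value the originator sends it in
    phase 1; a faulty originator may send different values or omit receivers.
    Returns (decisions, number_of_messages_sent).
    """
    inbox: Dict[int, List[Message]] = {i: [] for i in range(1, n)}
    messages = 0
    for receiver, value in phase1_values.items():
        inbox[receiver].append((value, (0,)))
        messages += 1

    extracted: Dict[int, Set[Hashable]] = {i: set() for i in range(1, n)}
    for phase in range(1, t + 2):  # phases 1 .. t+1
        next_inbox: Dict[int, List[Message]] = {i: [] for i in range(1, n)}
        for i in range(1, n):
            for value, chain in inbox[i]:
                # Accept only chains started by the originator that carry
                # exactly `phase` distinct signatures, none of them our own.
                valid = (len(chain) == phase and chain[0] == 0
                         and len(set(chain)) == phase and i not in chain)
                if not valid or value in extracted[i]:
                    continue
                extracted[i].add(value)
                if phase <= t:  # countersign and relay to non-signers
                    new_chain = chain + (i,)
                    for j in range(1, n):
                        if j not in new_chain:
                            next_inbox[j].append((value, new_chain))
                            messages += 1
        inbox = next_inbox

    # Decide the single extracted value, or the default if none or several.
    decisions = {i: (next(iter(vals)) if len(vals) == 1 else default)
                 for i, vals in extracted.items()}
    return decisions, messages


print(run_signed_relay(4, 1, {1: "a", 2: "a", 3: "a"}))
print(run_signed_relay(4, 1, {1: "a", 2: "b", 3: "a"}, default="default"))
```

In the second call the originator sends conflicting values; after the relay phase every receiver has seen both values and falls back to the common default, so the receivers still agree with one another.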