Fault-tolerant systems use computer programs called protocols to ensure that the systems will operate properly even if there are individual processor failures. A fault-tolerant consensus protocol enables each processor or party to propose an action (via a signal) that is required to be coordinated with all other processors in the system. A fault-tolerant consensus protocol has as its purpose the reaching of a “consensus” on a common action (e.g., turning a switch off or on) to be taken by all non-faulty processors and ultimately the system. Consensus protocols are necessary because processors may send signals to only a single other processor at a time and a processor failure can cause two processors to disagree on the signal sent by a third failed processor. In spite of these difficulties, a fault-tolerant consensus protocol ensures that all non-faulty processors agree on a common action.
To reach consensus, consensus protocols first enable each processor or participating network device to propose an action (via a signal) that is later to be coordinated by all the processors or participating network devices in the system. The system then goes through the steps of the consensus protocol. After completing the consensus protocol steps, the common action of the consensus is determined. For example, in a flight-control system, there may be several processors, each equipped with its own sensor, that perform a calculation determining whether the aircraft needs to be moved up or down. In marginal situations, some processors may propose that the craft move up while others propose that it move down. It is important that all non-faulty processors reach consensus on the direction and therefore act in concert in moving the craft.
The problem of consensus in a distributed system in spite of the presence of arbitrary failures was introduced in the context of aircraft control applications in 1978. L. Lamport, M. Pease and R. Shostak later isolated the problem and introduced the name “Byzantine Agreement” within their article “The Byzantine Generals Problem”, ACM Trans. Programming Languages and Systems, vol. 4, no. 3, pp. 382-401, July 1982.
The “Byzantine Agreement”, also referred to as t-resilient binary Byzantine Agreement, where t is the number of tolerable or corrupted participants or adversaries, is specified in the following.
Let π be a protocol for n parties or participating network devices for which each party P_i has a private binary input b_i ∈ {Y, N} or {1, 0} and a transaction TID defining the content about to be decided. It is said that π is a t-resilient binary Byzantine Agreement protocol if the following holds for all t-adversaries and for all inputs:
        Validity: If no party is corrupted and all parties start transaction TID with the same input value p ∈ {Y, N}, then all parties decide p for transaction TID.
        Agreement: If one uncorrupted party outputs p for transaction TID, then no uncorrupted party decides and outputs something other than p for the same transaction.
        Termination: All honest parties eventually decide.
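The three conditions can be illustrated with a small checker over one recorded run of a single transaction. This is only a sketch: the function name `check_agreement_run`, the dictionary layout, and the use of `None` for "not yet decided" are assumptions made for illustration and are not part of any cited protocol.

```python
from typing import Dict, Optional, Set


def check_agreement_run(inputs: Dict[str, str],
                        outputs: Dict[str, Optional[str]],
                        corrupted: Set[str]) -> Dict[str, bool]:
    """Check one recorded run (hypothetical data layout) against the three properties.

    inputs    -- input value ("Y" or "N") of every party
    outputs   -- decided value of every party, or None if it has not decided
    corrupted -- names of the corrupted parties
    """
    honest = [p for p in inputs if p not in corrupted]
    decided = {outputs[p] for p in honest if outputs.get(p) is not None}

    # Validity only constrains runs without corruption and with unanimous inputs.
    # Undecided parties are left to the termination check.
    validity = True
    if not corrupted and len(set(inputs.values())) == 1:
        common = next(iter(inputs.values()))
        validity = all(v == common for v in outputs.values() if v is not None)

    # Agreement: uncorrupted parties never decide two different values.
    agreement = len(decided) <= 1

    # Termination: every honest party has decided in this run.
    termination = all(outputs.get(p) is not None for p in honest)

    return {"validity": validity, "agreement": agreement, "termination": termination}
```

For example, a run in which the honest parties P1 and P2 decide different values while P4 is corrupted would be reported as violating agreement but not termination.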
M. J. Fischer, N. A. Lynch and M. S. Paterson showed in their article “Impossibility of distributed consensus with one faulty process”, Journal of the ACM, 32(2): 374-382, April 1985, that no deterministic protocol can solve Byzantine Agreement in a fully asynchronous environment in the presence of failures.
Various types of protocols, such as synchronous, asynchronous, hybrid randomized, or deterministic protocols, have been proposed; a few of them are addressed in the following.
Several synchronous system models have been proposed. The best reaches the deterministic optimum with min {f+2, t+1} rounds, where t is the maximum number of corrupted parties the protocol tolerates and f the number of corruptions that really occur.
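As a worked illustration of the bound min {f+2, t+1}, the following sketch evaluates it for given values of f and t. The function name `optimal_rounds` is an assumption chosen for this example.

```python
def optimal_rounds(f: int, t: int) -> int:
    """Round count min{f+2, t+1} for f actual and at most t tolerated corruptions."""
    if not 0 <= f <= t:
        raise ValueError("expected 0 <= f <= t")
    return min(f + 2, t + 1)
```

With t = 3 tolerated corruptions, a run without any actual corruption (f = 0) needs 2 rounds, while a run with f = 3 needs 4 rounds.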
As synchrony is a strong assumption, several timing models have been introduced to make the synchrony assumption more realistic. Later protocols isolated the timing assumptions in ‘failure detectors’ to abstract the protocols from the network properties, but an implementation of these failure detectors still requires time-outs. Most failure-detectors work in the crash failure model only, as failure-detectors do not work well with Byzantine corruptions so far.
Concerning asynchronous protocols, the first randomized protocols to solve fully asynchronous Byzantine Agreement were designed by M. Ben-Or and independently by M. O. Rabin and disclosed in their articles “Another advantage of free choice: Completely asynchronous agreement protocol (Extended Abstract)”, in Proceedings of the Second Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, pp. 27-30, Montreal, Canada, Aug. 17-19, 1983 and “Randomized Byzantine generals”, in 24th Annual Symposium on Foundations of Computer Science, pp. 403-409, Tucson, Ariz., 7-9 Nov. 1983, IEEE.
While Ben-Or's protocol tolerates $\lceil n/5 \rceil - 1$ corrupted parties, whereby this is called $\lceil n/5 \rceil - 1$ resilient, with exponential expected running time, Rabin tolerates $\lceil n/8 \rceil - 1$ corrupted parties with constant expected running time, but requires one previously generated secret value per transaction. Therefore, this protocol needs a trusted dealer after a constant number of transactions that generates new secrets.
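The resilience figures can be made concrete with a short calculation. The sketch below reads the bracket in the two expressions as a ceiling, which is an assumption; under that reading, a resilience of "n/d, rounded up, minus 1" is the largest integer t with n > d·t. The function name `max_corruptions` is likewise only illustrative.

```python
def max_corruptions(n: int, divisor: int) -> int:
    """Largest t with n > divisor * t, i.e. ceil(n / divisor) - 1 (ceiling assumed)."""
    if n < 1 or divisor < 1:
        raise ValueError("n and divisor must be positive")
    # -(-n // divisor) is an integer-only ceiling division.
    return -(-n // divisor) - 1


# Resilience of the two protocols for a few system sizes.
for n in (6, 10, 17):
    print(n, max_corruptions(n, 5), max_corruptions(n, 8))
```

Under these assumptions, a system of 17 parties would tolerate 3 corrupted parties with the n/5 bound and 2 with the n/8 bound.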
In 1984, G. Bracha introduced a protocol for asynchronous broadcast with the article “An asynchronous [(n−1)/3]-resilient consensus protocol”, In Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, pp. 154-162, Vancouver, Canada, 27-29 Aug. 1984. This protocol has become an important primitive for later protocols. However, it requires $3n^2$ messages for one single broadcast; therefore no protocol using this primitive reaches agreement with fewer than $O(n^3)$ messages. R. Canetti and T. Rabin developed the first protocol with a resilience of $\lceil n/3 \rceil - 1$. This has been published under the title “Fast Asynchronous Byzantine agreement with optimal resilience”, In STOC93, pp. 42-51, 1993. Although the number of messages is polynomially bounded, this protocol is impractical, mainly due to the high cost for creating a common coin.
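The step from the per-broadcast message count to the cubic total can be sketched numerically. The sketch assumes that each of the n parties performs at least one broadcast per agreement, which the text above does not state explicitly; the function names and the `broadcasts_per_party` parameter are hypothetical.

```python
def broadcast_messages(n: int) -> int:
    """Messages for one single broadcast, using the count of 3 * n^2 given above."""
    return 3 * n * n


def agreement_messages(n: int, broadcasts_per_party: int = 1) -> int:
    """Total messages if every party broadcasts `broadcasts_per_party` times (assumed)."""
    return n * broadcasts_per_party * broadcast_messages(n)


# Growth of both quantities with the number of parties.
for n in (4, 10, 100):
    print(n, broadcast_messages(n), agreement_messages(n))
```

For n = 4 this gives 48 messages per broadcast and 192 messages in total, and the total grows with the cube of n.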
In view of common coins, it can be shown that a real random coin, i.e., without pre-distributed initial values, takes at least as many messages as the Byzantine Agreement itself. D. Beaver and N. So suggested in “Global, unpredictable bit generation without broadcast”, in Tor Helleseth, editor, Advances in Cryptology-EUROCRYPT 93, v. 765 of Lecture Notes in Computer Science, pp. 424-434, 23-27 May 1993, Springer Verlag, 1994, a cryptographic coin-tossing scheme to generate a bounded number of distributed pseudo-random coins. Besides only generating a bounded number of coins before a trusted dealer is needed again, their protocol requires all parties to open their coins in order, i.e., a party cannot open coin l+1 if it did not open coin l before, which might cause problems if a party temporarily crashes or performs several transactions simultaneously.
U.S. Pat. No. 4,569,015 describes a method for achieving a multiple processor agreement in a synchronized network optimized for no faults wherein an originating processor broadcasts a value in a message with its unforgeable signature to all n active processors, including itself. Receiving processors in the network pass such a message on with their own unforgeable signatures to all active processors, including themselves. If the number of signatures and phases is the same at each processor after the first two successive passings, then agreement as to the value with no fault is indicated. Otherwise, if after two passings t+1 signatures have been collected, then these are signed and sent in the third passing. In any case, each processor continues the steps of repeatedly sending messages when received, and appending its signature, until t+2 passings have occurred. At that time, a processor will agree to the value if at least t+1 signatures are appended to the message; otherwise a default value is adopted, t (< n/2) being a reliability measure.
U.S. Pat. No. 5,598,529 discloses a computer system resilient to a wide class of failures within a synchronized network. It includes a consensus protocol, a broadcast protocol and a fault tolerant computer system created by using the two protocols in combination. The protocols are subject to certain validity conditions. The system in the state of consensus is guaranteed to have all non-faulty processors in agreement as to what action the system should take. The system and protocols can tolerate up to 3t+1 total number of processor failures, but, like the aforementioned method, require timing guarantees.
P. Berman and J. A. Garay presented in their article “Randomized Distributed Agreement Revisited”, 23rd Int. Conf. On Fault-Tolerant Computing (FTCS-23), pp. 412-419, 1993, a randomized distributed agreement protocol for asynchronous networks that works for n>5t processors, where n is the size of the network. This protocol belongs to the class of protocols that continuously require a “trusted dealer”.
It is an object of the present invention to create a consensus protocol for an asynchronous network capable of tolerating a maximum of t faulty devices, processors, or parties.
It is a further object of this invention to provide a method to be operable among n processors or parties, where at most t<n/3 processors/links are faulty, and further wherein agreement can be achieved in constant expected time with the number of messages being in the order of the square of n.
It is another object of this invention to provide a cryptographic coin-tossing protocol that allows the creation of an arbitrary number of distributed unpredictable coins.