The present invention relates to a network whose processor nodes exchange information in an asynchronous fashion, and more particularly to a method for achieving agreement among the processors, even in the presence of undetected faulty processors. Thus, it is applicable in a wide range of distributed computation systems, ranging from fault-tolerant database systems to intrusion-tolerant e-commerce.
Fault-tolerant systems use computer programs called protocols to ensure that the systems will operate properly even if there are individual processor failures. A fault-tolerant consensus protocol enables each processor or party to propose an action (via a signal) that is required to be coordinated with all other processors in the system. A fault-tolerant consensus protocol has as its purpose the reaching of a “consensus” on a common action (e.g., turning a switch off or on) to be taken by all non-faulty processors and ultimately the system. Consensus protocols are necessary because processors may send signals to only a single other processor at a time and a processor failure can cause two processors to disagree on the signal sent by a third failed processor. In spite of these difficulties, a fault-tolerant consensus protocol ensures that all non-faulty processors agree on a common action and that this action is one proposed by a non-faulty processor.
To reach consensus, consensus protocols first enable each processor or participating network device to propose an action (via a signal) that is later to be coordinated by all the processors or participating network devices in the system. The system then goes through the steps of the consensus protocol. After completing the consensus protocol steps, the common action of the consensus is determined. For example, in a flight-control system, there may be several processors, each equipped with its own sensor, that perform a calculation determining whether the aircraft needs to be moved up or down. In marginal situations, some processors may propose that the craft move up while others propose that it move down. It is important that all non-faulty processors reach consensus on the direction and therefore act in concert in moving the craft.
The problem of consensus in a distributed system in spite of the presence of arbitrary failures was introduced in the context of aircraft control applications in 1978. L. Lamport, M. Pease and R. Shostak later isolated the problem and introduced the name “Byzantine Agreement” within their article “The Byzantine Generals Problem”, ACM Trans. Programming Languages and Systems, vol. 4, no. 3, pp. 382-401, July 1982.
The “Byzantine Agreement”, also referred to as t-resilient binary Byzantine Agreement where t is the number of tolerable or corrupted participants or adversaries, is specified in the following:
Let π be a protocol for n parties for which each party Pi has a private input bi ∈ {0, 1}*. It is said that π is a t-resilient Byzantine Agreement protocol if the following holds for all t-adversaries and for all inputs:
Validity: If no party is corrupted and all parties start transaction TID with the same input value ρ, then all parties decide ρ for transaction TID.
Agreement: If one uncorrupted party outputs ρ for transaction TID, then no uncorrupted party decides and outputs something other than ρ for the same transaction.
Termination: For every transaction TID that has been started by all uncorrupted parties, all uncorrupted parties eventually decide.
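The three conditions above can be read as predicates over the record of one finished transaction. The following is a minimal sketch under stated assumptions: the function names, the party labels and the use of None for "has not decided" are illustrative and not part of the specification, and termination (an "eventually" property) can only be checked once a run is considered finished.

```python
from typing import Dict, Optional, Set

def validity_holds(inputs: Dict[str, int],
                   outputs: Dict[str, Optional[int]],
                   corrupted: Set[str]) -> bool:
    """Validity as stated above: it only constrains runs in which no party
    is corrupted and all parties start with the same input value."""
    if corrupted or len(set(inputs.values())) != 1:
        return True  # precondition not met, nothing to check
    rho = next(iter(inputs.values()))
    return all(value == rho for value in outputs.values())

def agreement_holds(outputs: Dict[str, Optional[int]],
                    corrupted: Set[str]) -> bool:
    """Agreement: uncorrupted parties that decided all decided the same value."""
    decided = {value for party, value in outputs.items()
               if party not in corrupted and value is not None}
    return len(decided) <= 1

def termination_holds(outputs: Dict[str, Optional[int]],
                      corrupted: Set[str]) -> bool:
    """Termination, checked at the end of a run: every uncorrupted party decided."""
    return all(value is not None for party, value in outputs.items()
               if party not in corrupted)
```

For example, a run in which uncorrupted parties P1 and P2 output 1 and 0 respectively violates agreement, regardless of what a corrupted party P4 outputs.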
M. J. Fischer, N. A. Lynch and M. S. Paterson showed in their article “Impossibility of distributed consensus with one faulty process”, Journal of the ACM, 32(2): 374-382, April 1985, that no deterministic protocol can solve Byzantine Agreement in a fully asynchronous environment in the presence of failures.
Various types of protocols, such as synchronous, asynchronous, hybrid, randomized, or deterministic protocols, have been proposed; a few of them are addressed in the following.
Several synchronous system models have been proposed. The best reaches the deterministic optimum with min{f+2, t+1} rounds, where t is the maximum number of corrupted parties the protocol tolerates and f is the number of corruptions that actually occur.
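As a small worked example of the round bound min{f+2, t+1}, the sketch below evaluates it for given f and t. The function name is illustrative, and the check that f does not exceed t is an assumption drawn from the definitions of f and t above.

```python
def synchronous_rounds(f: int, t: int) -> int:
    """Round count min{f+2, t+1} for f actual corruptions out of at most t."""
    if not 0 <= f <= t:
        raise ValueError("expected 0 <= f <= t")
    return min(f + 2, t + 1)
```

With t = 3, a failure-free run (f = 0) needs 2 rounds, while f = 2 or f = 3 both need t + 1 = 4 rounds, so the bound rewards runs in which few corruptions actually occur.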
As synchrony is a strong assumption, several timing models have been introduced to make the synchrony assumption more realistic. Later protocols isolated the timing assumptions in ‘failure detectors’ to abstract the protocols from the network properties, but an implementation of these failure detectors still requires time-outs. Most failure detectors work in the crash failure model only, as failure detectors do not work well with Byzantine corruptions so far.
Concerning asynchronous protocols, the first randomized protocols to solve fully asynchronous Byzantine Agreement were designed by M. Ben-Or and independently by M. O. Rabin and disclosed in their articles “Another advantage of free choice: Completely asynchronous agreement protocol (Extended Abstract)”, in Proceedings of the Second Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, pp. 27-30, Montreal, Canada, 17-19 Aug. 1983 and “Randomized Byzantine generals”, in 24th Annual Symposium on Foundations of Computer Science, pp. 403-409, Tucson, Ariz., 7-9 Nov. 1983, IEEE.
While Ben-Or's protocol tolerates ⌈n/5⌉ − 1 corrupted parties, i.e., it is (⌈n/5⌉ − 1)-resilient, with exponential expected running time, Rabin's protocol tolerates ⌈n/8⌉ − 1 corrupted parties with constant expected running time, but requires one previously generated secret value per transaction. Therefore, this protocol needs a trusted dealer to generate new secrets after a constant number of transactions.
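To make the resilience figures concrete, the following sketch computes the largest number of corrupted parties t that satisfies t < n/d for a given divisor d. It assumes the bracketed expressions above denote ceilings, so that the bound reads ⌈n/d⌉ − 1; the function name is illustrative.

```python
def max_corruptions(n: int, divisor: int) -> int:
    """Largest t with t < n / divisor, i.e. ceil(n / divisor) - 1."""
    if n < 1 or divisor < 1:
        raise ValueError("n and divisor must be positive")
    return -(-n // divisor) - 1  # integer ceiling without floating point
```

For n = 16 parties this gives 3 tolerated corruptions under an n/5 bound, 1 under an n/8 bound, and 5 under an n/3 bound.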
In 1984, G. Bracha introduced a protocol for asynchronous broadcast in the article “An asynchronous [(n−1)/3]-resilient consensus protocol”, in Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, pp. 154-162, Vancouver, Canada, 27-29 Aug. 1984. This protocol has become an important primitive for later protocols. However, it requires 3n² messages for a single broadcast; therefore, no protocol using this primitive reaches agreement with fewer than O(n³) messages. R. Canetti and T. Rabin developed the first protocol with a resilience of ⌈n/3⌉ − 1.
This has been published under the title “Fast asynchronous byzantine agreement with optimal resilience”, in STOC93, pp. 42-51, 1993. Although the number of messages is polynomially bounded, this protocol is impractical, mainly due to the high cost of creating a common coin.
U.S. Pat. No. 4,569,015 describes a method for achieving a multiple processor agreement optimized for no faults wherein an originating processor broadcasts a value in a message with its unforgeable signature to all n active processors, including itself. Receiving processors in the network pass such a message on with their own unforgeable signatures to all active processors, including themselves. If the number of signatures and phases is the same at each processor after the first two successive passings, then agreement as to the value with no fault is indicated; otherwise, if after two passings t+1 signatures have been collected, then these are signed and sent in the third passing. In any case, each processor continues the steps of repeatedly sending messages when received, and appending its signature, until t+2 passings have occurred. At that time, a processor will agree to the value if at least t+1 signatures are appended to the message; otherwise a default value is adopted, t (less than n/2) being a reliability measure.
U.S. Pat. No. 5,598,529 discloses a computer system resilient to a wide class of failures. It includes a consensus protocol, a broadcast protocol and a fault tolerant computer system created by using the two protocols together in combination. The protocols are subject to certain validity conditions. The system in the state of consensus is guaranteed to have all non-faulty processors in agreement as to what action the system should take. The system and protocols can tolerate up to 3t+1 total number of processor failures.
F. Pedone and A. Schiper discuss optimistic consensus in their article “Optimistic atomic broadcast”, in Proceedings of the 12th International Symposium on Distributed Computing (DISC 98), September 1998. However, their approach can deal with crash failures only, and it requires a failure detector in the pessimistic case as well. Furthermore, the protocol requires a reliable broadcast primitive in the optimistic case, which makes it less efficient.
It is, therefore, an object of the present invention to create a consensus protocol for a potentially asynchronous network capable of tolerating a maximum of t faulty devices, processors, or parties.
It is a further object of this invention to provide a method to be operable among n processors or parties, where at most t less than n/3 processors/links are faulty, and further wherein agreement can be achieved in constant expected time with the number of messages being in the order of the square of n.
The following is an informal definition to aid in the understanding of the description.
Hybrid Failures
The method for achieving Byzantine Agreement can distinguish between several different ways in which a network device can fail. This could for example be:
Byzantine Failures BF: If a byzantine failure BF occurs, the adversary has taken full control over the corresponding machine. All secrets this machine has are handed over to the adversary, who now controls its entire behavior.
Crash Failures CF: A crash failure CF simply means that the corresponding machine stops working. This could happen at any time, i.e., even in the middle of a broadcast or while sending a message. It is assumed that there is no mechanism by which other parties can reliably detect such a crash.
Link Failures LF: A link failure LF occurs when not a party, but an interconnecting link becomes faulty. As the link has no access to authentication keys, it is easy to prevent it from modifying or inserting messages. A faulty link could however delete messages, and it might completely disconnect two parties.
Adversary structure
An adversary structure T is a set of sets (coalitions) of parties whose corruption the system should tolerate. Let M be the set of all participating network devices. An adversary structure is called
Q2, if no two coalitions N1, N2∈T satisfy N1∪N2 =M.
Q3, if no three coalitions N1, N2, N3∈T satisfy N1∪N2∪N3 =M.
Q2+3 with respect to CF and BF, if for all c1, c2∈CF and all b1, b2, b3∈BF,
M \ {b1∪b2∪b3∪c1∪c2} ⊃ Ø.
A Q2 adversary structure is sufficient to solve byzantine agreement if only crash failures CF occur. Q3 is applied in the byzantine case, where only byzantine failures BF occur, while Q2+3 is the generalization for the hybrid crash-byzantine failure case.
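The Q2, Q3 and Q2+3 conditions can be checked mechanically for small systems by enumerating coalitions. The sketch below is a brute-force illustration under stated assumptions: the function names are hypothetical, coalitions are represented as frozensets, the same coalition may be picked more than once, and the Q2+3 condition is read as "the parties outside any three Byzantine coalitions and any two crash coalitions form a non-empty set".

```python
from itertools import combinations, combinations_with_replacement
from typing import FrozenSet, Iterable, List

Coalition = FrozenSet[int]

def threshold_structure(parties: Iterable[int], t: int) -> List[Coalition]:
    """All coalitions of at most t parties (the classical threshold case)."""
    members = list(parties)
    return [frozenset(c) for size in range(t + 1)
            for c in combinations(members, size)]

def satisfies_q(k: int, structure: List[Coalition],
                all_parties: Iterable[int]) -> bool:
    """Q2 for k=2, Q3 for k=3: no k coalitions together cover all parties."""
    universe = set(all_parties)
    return all(set().union(*combo) != universe
               for combo in combinations_with_replacement(structure, k))

def satisfies_q2_plus_3(crash: List[Coalition], byzantine: List[Coalition],
                        all_parties: Iterable[int]) -> bool:
    """Q2+3: two crash coalitions plus three Byzantine ones never cover M."""
    universe = set(all_parties)
    for crash_pair in combinations_with_replacement(crash, 2):
        for byz_triple in combinations_with_replacement(byzantine, 3):
            if set().union(*crash_pair, *byz_triple) == universe:
                return False
    return True
```

In the threshold case this reproduces the familiar bounds: with coalitions of at most one party, four parties satisfy Q3 but three do not, matching the requirement that t be less than n/3.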
Threshold signature
A k out of l threshold signature scheme is a protocol that allows any subset of k players or parties out of l to generate a signature, but that disallows the creation of a valid signature if fewer than k players participate in the protocol. This non-forgeability property should hold even if some subset of fewer than k players is corrupted and works together. Furthermore, the threshold signature scheme should also be robust, meaning that corrupted players should not be able to prevent uncorrupted players from generating signatures. The threshold signature can be applied to the adversary structure model, whereby k and l are replaced by appropriate sets.
The foregoing and other objects are realized by the present invention which devises a machine-implementable method for achieving Byzantine Agreement among processors or parties connected by a partially asynchronous network. Partially asynchronous network in that sense means that the network can work either in a synchronous or an asynchronous mode, depending on the circumstances and the given assumptions. The synchronous mode where no adversaries are present is also referred to as the optimistic case whereas the asynchronous mode where adversaries are allowed is referred to as the pessimistic case. The present method for achieving Byzantine Agreement turns out to be practical and also theoretically nearly optimal in the sense that it withstands the maximum number of corrupted parties, runs in a constant number of rounds, uses a nearly optimal number of messages, and the total bit length of these messages is also nearly optimal. Moreover, in conjunction with any, e.g., less efficient, consensus protocol, the present method reaches optimal performance if the behavior is acceptable, i.e., some timing assumptions hold and all parties are honest, without adding security constraints or significant performance loss in the pessimistic case. In the optimistic case, no cryptography is required at all; therefore, the computational complexity is minimal.
The objects of the invention are achieved by the features stated in the enclosed independent claims. Further advantageous implementations and embodiments of the invention are set forth in the respective subclaims.
In general, the objects are attained by i) an optimistic pre-protocol that achieves agreement in case the network satisfies some synchrony assumptions and no party is corrupted, ii) a verification protocol that finds out if agreement has been reached, and iii) a pessimistic fallback protocol that uses standard techniques to reach agreement in case the optimistic pre-protocol failed.
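The three-part structure above can be pictured as the following control flow. This is only a sketch of how the stages hand over to one another: the function names are hypothetical, each stage is passed in as an opaque callable, and the toy stand-ins (a unanimity check and a constant fallback) are assumptions for illustration, not the protocols of the invention.

```python
from typing import Callable, List, Optional, Tuple

def agree(proposal: int,
          optimistic: Callable[[int], Optional[int]],
          verified: Callable[[int], bool],
          fallback: Callable[[int], int]) -> Tuple[int, str]:
    """Try the optimistic pre-protocol, verify it, else run the fallback."""
    tentative = optimistic(proposal)          # None means the fast path failed
    if tentative is not None and verified(tentative):
        return tentative, "optimistic"
    return fallback(proposal), "pessimistic"

def unanimous(votes: List[int]) -> Callable[[int], Optional[int]]:
    """Toy stand-in for the fast path: succeeds only if all votes are equal."""
    def optimistic(_proposal: int) -> Optional[int]:
        return votes[0] if votes and len(set(votes)) == 1 else None
    return optimistic
```

For instance, with votes [1, 1, 1, 1] and a verification step that accepts, the call returns (1, "optimistic"); with votes [1, 0, 1, 1] the fast path yields nothing and the result comes from the fallback callable.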
The pre-protocol preserves properties of the fallback protocol, as for example the resiliency. Especially, if the fallback protocol has more than two possible agreement values, i.e., Multivalued Agreement, then so does the optimistic protocol.
Deciding is atomic and final; a decision may neither be changed nor extended. It is guaranteed that if some parties decide in the optimistic pre-protocol, while others decide in the pessimistic fallback protocol, the corresponding agreement values are equal.
This method results not only in the maximal number of tolerable traitors, t less than n/3, but also in an optimal number of messages in the optimistic case. As to other methods in the prior art, none could offer a combination of a synchronous and an asynchronous protocol to combine the robustness of the asynchronous protocol with the efficiency of the synchronous one.
It is advantageous if a transaction identifier TID is used, because then each party can run several instances of the protocol simultaneously, which means that several agreements can be performed in parallel.
It is possible that a party Pi activates one instance of the protocol by receiving a message containing both the transaction-identifier TID and an initial value bi.
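One way to picture parallel instances keyed by the transaction identifier is a per-party table from TID to instance state. The sketch below is illustrative only: the class and method names are assumptions, message handling is reduced to a single activation call, and the rule that a decision is final is modeled by rejecting any attempt to overwrite it with a different value.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Instance:
    tid: str
    initial_value: int
    decision: Optional[int] = None

    def decide(self, value: int) -> None:
        """Record a decision; a different second decision is rejected."""
        if self.decision is not None and self.decision != value:
            raise RuntimeError(f"transaction {self.tid} already decided")
        self.decision = value

class Party:
    def __init__(self, name: str) -> None:
        self.name = name
        self.instances: Dict[str, Instance] = {}

    def on_start(self, tid: str, initial_value: int) -> Instance:
        """Activate the instance for tid on first receipt; later starts
        for the same tid return the already running instance unchanged."""
        return self.instances.setdefault(tid, Instance(tid, initial_value))
```

A party that receives start messages for "tx-1" and "tx-2" then holds two independent instances, and deciding in one does not affect the other.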
It is advantageous if an initial value in the fallback protocol is only accepted if the participating network device proves by including the signature that its initial value belongs to the simple majority or is obtained from a transition function, because this allows a simpler asynchronous fallback agreement protocol. Such a transition function, for example, outputs a certain result whenever that is possible.
Using a part-protocol based on leader election has the advantage that different ways of generating the initial values can be used.
When the network is a partially synchronous network, a failure detector is easier to implement.
Synchrony assumptions or timing assumptions are applicable, whereby the protocol can be fine-tuned for the network properties to increase efficiency.
It is an advantage if threshold signatures are applied, because the size of the used messages can be reduced.
A suitable threshold signature scheme has been provided by V. Shoup and published in the article “Practical threshold signatures”, in Technical Report RZ 3121, IBM Zurich Research Laboratory, April 1999. This article is incorporated herein by reference.