The present invention relates generally to fault tolerance in distributed computing systems, such as grid computing systems, and in particular to controlling the quality of results returned from parallel computational processing tasks in distributed computer networks.
In distributed computing systems, a computing task may be distributed over a network and performed by a set of hosts, so that a result is returned more quickly or more reliably. Voting or quorum-based systems are used to improve confidence that a correct result has been returned. In general, Byzantine fault-tolerance describes the ability of a system to withstand some number of Byzantine failures, in which components may behave erroneously and inconsistently, and in which any results they return may be affected by those errors and inconsistencies. Essentially, Byzantine fault-tolerance requires a system to apply statistical methods to determine how many “votes” for a particular returned result (a “quorum”) from a set of result-returning systems (a “processing set”) will provide confidence in that result, and thus how many erroneous or misleading results the system can disregard in establishing a single correct result of a computation.
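By way of illustration only, quorum-based voting over a processing set might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the present disclosure: the function name `quorum_result` and its parameters are hypothetical, returned results are assumed to be directly comparable values, and the quorum size is assumed to be supplied by the caller rather than derived statistically.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def quorum_result(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the single result that received at least `quorum` votes.

    `results` holds one returned value per host in the processing set.
    Returns None when no value reaches the quorum, or when more than one
    value does (an ambiguous outcome in this hypothetical sketch).
    """
    if quorum < 1:
        raise ValueError("quorum must be at least 1")

    # Count how many hosts "voted" for each distinct returned value.
    votes = Counter(results)

    # Keep only the values whose vote count meets the quorum.
    winners = [value for value, count in votes.items() if count >= quorum]

    # Accept a result only if exactly one value reached the quorum.
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Five hosts; two return erroneous values that are disregarded.
    print(quorum_result([42, 42, 17, 42, 99], quorum=3))  # prints 42
    # No value reaches the quorum, so no result is accepted.
    print(quorum_result([42, 17, 99], quorum=2))  # prints None
```

In this sketch, results that fail to reach the quorum are simply disregarded, and a None return would signal that the task should be redistributed or that additional hosts should be consulted.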