In distributed storage systems such as RAID systems, a file is dispersed across a plurality of servers (in a RAID system, across a plurality of hard discs). The file is dispersed in such a way that, when a hard disc fails or, more generally, becomes unavailable, enough file fragments remain on the other hard discs to reconstruct the file from the parts stored on the remaining operational discs.
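As an illustration of such dispersal, the sketch below uses single XOR parity, as in RAID level 5. This is only one possible scheme and is assumed here for simplicity. The file is split into k data fragments plus one parity fragment, each of which would be stored on a different disc. Any single missing fragment can be rebuilt by XORing all surviving fragments. The functions `disperse` and `reconstruct` are hypothetical names. Practical systems often use more general erasure codes that tolerate several simultaneous failures.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments plus one XOR parity fragment.

    The data is zero-padded to a multiple of k bytes; the caller keeps the
    original length so the padding can be stripped on reconstruction.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]  # last entry is the parity fragment


def reconstruct(stored: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, frag in enumerate(stored) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one unavailable disc")
    stored = list(stored)
    if missing:
        present = [frag for frag in stored if frag is not None]
        # XOR of all surviving fragments (data and parity) equals the lost one.
        stored[missing[0]] = reduce(xor_bytes, present)
    data_fragments = stored[:-1]  # drop the parity fragment
    return b"".join(data_fragments)[:length]


# Usage: five "discs" (four data fragments plus parity), one becomes unavailable.
message = b"distributed storage"
discs = disperse(message, k=4)
discs[2] = None
assert reconstruct(discs, len(message)) == message
```

With single parity the storage overhead is only (k + 1)/k, but at most one unavailable disc can be tolerated; tolerating more failures requires additional redundancy.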
The unavailability of entities such as servers in distributed computing systems, or hard discs in a RAID system, can be classified into two types: byzantine failures and crashes. Byzantine failures are arbitrary faults, occurring for example while the distributed system executes an algorithm. After a byzantine failure has occurred, the distributed system may respond in an unpredictable way. Byzantine failures may arise, e.g., from malware or attackers targeting storage servers, or from manufacturing faults.
The other type of failure is a crash, which leads to at least temporary unavailability. A crash may also be an intended shutdown of a server, for example for maintenance.
However, unavailability of entities in distributed systems occurs only occasionally. Such worst-case scenarios also include unpredictable message delays, for example due to a network partition or an overloaded server. In most cases the distributed system functions normally: communication is synchronous, and messages are delivered within the expected time bounds. Moreover, a distributed computing system is conventionally configured to tolerate a large number of server failures, even though actual failures occur rather rarely.
Conventional storage protocols, such as the byzantine storage protocols described in James Hendricks, Gregory R. Ganger, and Michael K. Reiter, “Low-overhead byzantine fault-tolerant storage”, in Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), 2007, are designed to handle worst-case scenarios. One disadvantage, however, is that they require a large communication overhead relative to the information exchanged, leading to a high blow-up factor. A further disadvantage is that the methods proposed therein are inflexible, since they address only byzantine failures of servers.
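The text does not define the blow-up factor precisely. Assuming it denotes the ratio of the data stored on, or sent to, the servers per write to the size of the original file, the sketch below compares two common dispersal approaches. With full replication on n servers, every server receives a complete copy. With an erasure code that splits the file into k fragments and stores n coded fragments, each fragment is 1/k of the file size. The function names and the example parameters are hypothetical.

```python
from fractions import Fraction


def replication_blowup(n: int) -> Fraction:
    """Full replication: each of n servers stores a complete copy of the file."""
    if n < 1:
        raise ValueError("need at least one server")
    return Fraction(n)


def erasure_blowup(n: int, k: int) -> Fraction:
    """(n, k) erasure code: n coded fragments, each 1/k of the file size."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k)


# Example with assumed parameters: 4 servers, file recoverable from any 2 fragments.
print(replication_blowup(4))  # 4
print(erasure_blowup(4, 2))   # 2
```

Exact fractions are used so that ratios such as 5/3 are reported without floating-point rounding.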