Peer-to-peer storage is a distributed storage technology with the potential to achieve Internet scale with only modest additional infrastructure investment. Peer-to-peer storage exploits encryption and erasure encoding to securely distribute storage items over a pool of peer storage nodes, accessed via traditional peer-to-peer directory mechanisms such as distributed hash tables (DHTs).
Distributed peer-to-peer storage has the potential to provide essentially limitless, highly reliable, always available storage to the masses of Internet users. Since each participant in the peer storage pool is typically required to contribute storage in proportion to their demand on the pool, it is a self-scaling technique, in contrast to centralized storage and centralized peer-to-peer approaches, which demand enormous capital investment and have limited scalability. Encryption is used to secure the data against peer snooping, and erasure encoding is used to store the information with sufficient redundancy for timely retrieval and to prevent ultimate information loss.
Erasure encoding transforms a storage item of n blocks into more than n blocks such that any sufficiently large subset of the encoded blocks can reconstitute the storage item. The fraction of blocks required to reconstitute is termed the rate, r. Optimal erasure codes produce n/r blocks with any n blocks sufficient to reconstitute, but these codes are computationally demanding. Near-optimal erasure codes require (1+ε)n blocks but reduce computational effort. Rateless erasure codes produce arbitrary numbers of blocks so that encoding redundancy can be adapted to the loss rate of the channel. More specifically, rateless erasure codes can transform an item into a virtually limitless number of blocks, such that some fraction of the blocks is sufficient to recreate the item. Examples of near-optimal rateless erasure codes include online codes, LT codes, and Raptor codes.
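As a minimal illustration of the "any n blocks suffice" property, the following sketch uses a single XOR parity block, which turns n blocks into n + 1 blocks (rate n/(n + 1)). This is a deliberately simple stand-in chosen for illustration, not one of the codes named above, and the function names `encode` and `decode` are hypothetical. It assumes all blocks have equal length.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks: list[bytes]) -> list[bytes]:
    """Append one XOR parity block: n blocks in, n + 1 blocks out."""
    parity = reduce(xor_blocks, blocks)
    return blocks + [parity]


def decode(received: dict[int, bytes], n: int) -> list[bytes]:
    """Rebuild the n original blocks from any n of the n + 1 encoded blocks.

    `received` maps an encoded-block index (0..n, where n is the parity
    block) to that block's contents.
    """
    if len(received) < n:
        raise ValueError("too few blocks to reconstitute the item")
    missing = [i for i in range(n) if i not in received]
    if missing:
        # At most one data block can be missing here; XOR-ing every
        # block that did arrive (parity included) restores it.
        received = dict(received)
        received[missing[0]] = reduce(xor_blocks, received.values())
    return [received[i] for i in range(n)]
```

For example, encoding three blocks yields four, and the item can be rebuilt after losing any single one of them. Tolerating more than one lost block requires a lower-rate code than this sketch provides.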
Erasure codes are typically robust in the face of incomplete retrievals resulting from discontinuous online availability of peer storage nodes. As long as a sufficiently large subset of stored blocks is retrieved, the encrypted storage item can be fully reconstituted and then be decrypted.
In distributed peer-to-peer storage, retrieval probabilities are managed to ensure that requests are honored in a timely manner and that permanent information loss is statistically highly unlikely. Timely retrieval can be frustrated by the discontinuous online availability of peer nodes, which then requires a very high degree of redundancy in the erasure encoding (i.e., use of an inefficient low-rate code) in order to avoid “information blackouts.”
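The relationship between node availability and required redundancy can be made concrete with a small calculation. The sketch below is an illustration under simplifying assumptions that are not stated in the text: each encoded block sits on a distinct node, every node is online independently with the same probability p, and an optimal code is used so that any n of the m stored blocks suffice. The function names are hypothetical.

```python
from math import comb


def retrieval_probability(n: int, m: int, p: float) -> float:
    """Probability that at least n of m stored blocks are reachable,
    when each hosting node is independently online with probability p."""
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(n, m + 1))


def blocks_needed(n: int, p: float, target: float) -> int:
    """Smallest number of stored blocks m for which the retrieval
    probability reaches `target` (the implied code rate is n / m)."""
    if not (0 < p <= 1) or not (0 < target < 1):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    m = n
    while retrieval_probability(n, m, p) < target:
        m += 1
    return m
```

Under these assumptions, `blocks_needed(n, p, target)` grows as p falls, which is the effect described above: a pool of rarely online nodes forces a much lower-rate code to reach the same retrieval probability.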
In order for a peer-to-peer storage system to be universally self-scaling, it must accommodate all significant classes of peer nodes. Some nodes might be always or nearly always online, whereas others might be intermittently online to varying degrees. Both liveness (i.e., the probability of a node being online at some time t) and bandwidth when online will vary over a substantial range when considering the entire Internet client base as a peer storage pool.
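When liveness differs from node to node, the chance of reaching enough blocks is no longer a simple binomial tail. The following sketch computes it by dynamic programming over the nodes, assuming one block per node, independent node availability, and an optimal code in which any n blocks suffice. These assumptions and the function name are illustrative only.

```python
def retrieval_probability_mixed(n: int, liveness: list[float]) -> float:
    """Probability that at least n nodes are online, given each node's
    individual liveness (probability of being online at the time of
    the request)."""
    # dist[k] = probability that exactly k of the nodes seen so far are online
    dist = [1.0]
    for p in liveness:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)   # this node is offline
            nxt[k + 1] += q * p     # this node is online
        dist = nxt
    return sum(dist[n:])
```

For instance, `retrieval_probability_mixed(1, [0.5, 0.5])` evaluates to 0.75: with one block needed and two nodes that are each online half the time, retrieval fails only when both are offline.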
What is needed are methods, computer-readable media and computer systems for ensuring that requests are honored in a timely manner in a peer-to-peer storage system that is made up of nodes with widely varying liveness.