In recent years, a number of distributed storage systems have been developed, commonly in the context of peer-to-peer (P2P) systems. First-generation P2P systems (e.g., Napster, Gnutella, etc.) were “read-only” systems suitable for file sharing. These first-generation systems typically placed less emphasis on availability and reliability of data and more emphasis on connectivity and name management (e.g., organization of files in directories, search mechanisms, etc.).
Modern P2P storage systems have evolved to provide solutions to a variety of storage problems. For example, recent approaches to P2P storage systems provide more extensive security, file sharing, and archival capabilities. Modern P2P storage systems are typically classified as archival-only systems or continuous-update systems. Archival-only systems (e.g., Venti, Freenet, etc.) assume that each object (e.g., piece of data, group of data, etc.) stored by the P2P storage system is unique and not dependent on any other object stored in the system. Archival-only systems typically provide mechanisms to reliably and securely store and retrieve objects. Further, archival-only systems are typically designed to work properly under disrupted or intermittent connectivity, as long as unique object names can be created, new objects can be created and stored, and all objects can be retrieved subject to the connectivity constraints.
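The archival-only model described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that unique object names are derived from a SHA-256 hash of the object content and that objects live in an in-memory dictionary; the class name `ArchivalStore` and its methods are hypothetical and are not taken from any of the systems named above.

```python
import hashlib


class ArchivalStore:
    """Hypothetical write-once object store: objects are never updated."""

    def __init__(self):
        # Maps object name (hex digest) -> immutable object bytes.
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Assumption: the name is the SHA-256 digest of the content, so a
        # unique name can be created locally, without network connectivity.
        name = hashlib.sha256(data).hexdigest()
        # Storing identical content twice is harmless: same name, same bytes.
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        # Raises KeyError if the object is not present in this store.
        return self._objects[name]
```

Because each object is independent of every other object, such a store needs no ordering of writes; a node could, for example, call `put` while disconnected and publish the resulting name later.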
On the other hand, continuous-update systems (e.g., Oceanstore, FarSite, etc.) provide the ability to handle shared write operations and to maintain some relation between stored objects, in addition to the functionalities provided by archival-only systems. Further, continuous-update systems typically assume that data (in the form of objects) stored in the system may be updated. Continuous-update systems typically maintain “the latest version” of an object. Because object updates may originate simultaneously from multiple sources, a continuous-update system includes a serializer function. The serializer function is responsible for enforcing strict ordering of updates and may be implemented in many ways (e.g., as a centralized serializer, a distributed serializer, etc.). Typically, continuous-update systems that support shared writes require additional mechanisms to maintain data integrity under intermittent connectivity.
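As one possible illustration of a serializer function, the following minimal sketch shows a centralized serializer that assigns a strictly increasing sequence number to each update and records the latest version of each object. The class `CentralSerializer`, its method names, and the use of a single in-process lock are assumptions made for this example only; they do not describe how any particular continuous-update system implements serialization.

```python
import itertools
import threading


class CentralSerializer:
    """Hypothetical centralized serializer enforcing a total order of updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_seq = itertools.count(1)
        self._log = []       # ordered list of (seq, object_id, payload)
        self._latest = {}    # object_id -> (seq, payload)

    def submit(self, object_id: str, payload: bytes) -> int:
        # The lock makes "assign a number and apply the update" one atomic
        # step, so concurrent writers are forced into a single strict order.
        with self._lock:
            seq = next(self._next_seq)
            self._log.append((seq, object_id, payload))
            self._latest[object_id] = (seq, payload)
            return seq

    def latest(self, object_id: str):
        # Returns the (seq, payload) pair of the most recent update.
        with self._lock:
            return self._latest[object_id]
```

In this sketch every writer must reach the single serializer, which is why a design of this kind depends on the serializer remaining available and connected.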
Both archival-only and continuous-update systems rely on a distributed object location and retrieval (DOLR) mechanism (e.g., Tapestry [see, for example, Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing (2001)], Chord, etc.). Systems that use a DOLR mechanism implement an overlay network on top of the basic Internet Protocol network and use a naming and routing scheme, typically a distributed hash table (DHT), to communicate. Systems employing the DOLR mechanism are typically very sensitive to the availability and connectivity of the underlying communication infrastructure. In some cases, the failure of critical devices (e.g., nodes, the serializer, etc.) and the occasional disconnection of physical storage from the network may prevent systems using the DOLR mechanism from operating correctly (i.e., from providing access to objects).
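The role of a DHT-based naming scheme can be sketched with a simplified hash ring, in which node names and object names are hashed into the same identifier space and each object is assigned to the first node at or after its identifier. This is a deliberately reduced illustration under stated assumptions: it is not the routing algorithm of Tapestry or Chord (there are no routing tables or multi-hop lookups), and the names `key_id` and `HashRing` are hypothetical.

```python
import bisect
import hashlib


def key_id(name: str, bits: int = 32) -> int:
    # Assumption: identifiers are the SHA-1 digest reduced to `bits` bits.
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Hypothetical hash ring mapping object names to responsible nodes."""

    def __init__(self, node_names):
        self._ring = sorted((key_id(n), n) for n in node_names)
        self._ids = [node_id for node_id, _ in self._ring]

    def lookup(self, object_name: str) -> str:
        if not self._ring:
            raise LookupError("no nodes available")
        # First node whose identifier is >= the object's identifier,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ids, key_id(object_name))
        return self._ring[idx % len(self._ring)][1]

    def remove(self, node_name: str) -> None:
        # Models a node failing or being disconnected from the network.
        self._ring = [(i, n) for i, n in self._ring if n != node_name]
        self._ids = [node_id for node_id, _ in self._ring]
```

After `remove` is called for the node responsible for an object, `lookup` resolves that object name to a different node, which does not hold the object unless it was replicated there. The sketch thus hints at why lookups of this kind are sensitive to node availability and connectivity.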