Instant access to electronic documents and data is becoming increasingly critical for day-to-day business operations. As a result, storage needs to be reliable and resilient to failures, including localized physical damage. Distributed, replicated storage over a computer network appears to be the only way to meet these requirements.
Unfortunately, today's distributed or replicated systems either require full, identical replication between the computing entities involved, which typically are at least two data centers in different locations, or, in the case of distributed storage, require a centralized controller that keeps track of the replica distribution. Anyone who needs to access more than one replica must either know the full list of replicas or have access to a directory service that returns this information, either globally (for all documents) or on a per-document basis.
Distributed storage is becoming increasingly important, as existing inexpensive machines can be used to serve content. With the advent of distributed hash table (DHT) technology, self-organizing storage networks have become feasible and have raised significant interest in the community. Sitting “on top” of the Internet, these scalable overlay networks use the transport capabilities of the underlying network, but add value. DHT technology provides a mapping from resource IDs to hosts (D-->H) that is typically preceded by a mapping from resource name to resource ID (N-->D). This is achieved using minimal routing information in each node. DHTs are generally also prepared to deal with changes in host availability and network connectivity.
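The two mappings named above can be illustrated with a minimal sketch. It assumes a SHA-1 based name-to-ID mapping (N-->D) and a ring-style "first host at or after the ID" rule for the ID-to-host mapping (D-->H); the function names, the 16-bit ID space, and the host names are hypothetical and chosen only for illustration. A real DHT resolves the second mapping with per-node routing state rather than a global host list.

```python
import hashlib
from bisect import bisect_left
from typing import Sequence

ID_BITS = 16  # assumed small ID space, for readability only


def name_to_id(name: str) -> int:
    """N-->D: hash a resource (or host) name into the ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def id_to_host(resource_id: int, sorted_host_ids: Sequence[int]) -> int:
    """D-->H: pick the first host ID at or after the resource ID, wrapping around."""
    index = bisect_left(sorted_host_ids, resource_id)
    return sorted_host_ids[index % len(sorted_host_ids)]


# Hypothetical hosts and document name.
host_ids = sorted(name_to_id(h) for h in ["host-a", "host-b", "host-c"])
document_id = name_to_id("annual-report.pdf")
responsible_host = id_to_host(document_id, host_ids)
```

Because both hosts and resources are hashed into the same ID space, adding or removing a host only changes the responsibility for the IDs adjacent to it, which is one reason such schemes cope well with changes in host availability.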
DHTs come in a variety of routing flavors, but share the property that messages are transported hop by hop among the constituent nodes of the overlay network. Each hop knows how to get closer to the destination, until the message finally reaches the node that claims the requested ID as its own and acts according to the request.
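Hop-by-hop forwarding can be sketched as a small simulation. The sketch below assumes one particular flavor, a ring with Chord-style tables in which each node knows its predecessor and the nodes responsible for IDs at power-of-two offsets; other DHTs use different tables. All names, the 6-bit ID space, and the node IDs are illustrative assumptions, and the global node list is used only to set up the simulation, not during routing.

```python
ID_BITS = 6
ID_SPACE = 1 << ID_BITS  # assumed tiny ID space: 0..63


def clockwise(a, b):
    """Clockwise distance on the ring from ID a to ID b."""
    return (b - a) % ID_SPACE


def successor_of(point, node_ids):
    """First node at or after `point`, going clockwise."""
    return min(node_ids, key=lambda n: clockwise(point, n))


def build_tables(node_ids):
    """Per-node routing state: predecessor plus a few long-range neighbors.

    Requires at least two nodes.
    """
    tables = {}
    for n in node_ids:
        others = [m for m in node_ids if m != n]
        fingers = {successor_of((n + 2 ** k) % ID_SPACE, others)
                   for k in range(ID_BITS)}
        tables[n] = {
            "pred": min(others, key=lambda m: clockwise(m, n)),
            # Sorted by clockwise distance, so fingers[0] is the successor.
            "fingers": sorted(fingers, key=lambda f: clockwise(n, f)),
        }
    return tables


def route(start, target, tables):
    """Forward a request for `target` until a node claims the ID as its own."""
    path, current = [start], start
    while True:
        entry = tables[current]
        pred, fingers = entry["pred"], entry["fingers"]
        # A node claims the IDs in the interval (pred, current].
        if 0 < clockwise(pred, target) <= clockwise(pred, current):
            return path
        succ = fingers[0]
        if clockwise(current, target) <= clockwise(current, succ):
            nxt = succ  # the successor claims the target
        else:
            # Farthest known node that still precedes the target.
            nxt = max((f for f in fingers
                       if clockwise(current, f) < clockwise(current, target)),
                      key=lambda f: clockwise(current, f))
        path.append(nxt)
        current = nxt


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
tables = build_tables(nodes)
print(route(8, 54, tables))  # prints [8, 42, 51, 56]
```

Every hop strictly reduces the clockwise distance to the target, so the loop terminates, and each node consults only its own table entry when choosing the next hop.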
Some DHTs operate on intervals in ring topologies, as described in “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications”, Ion Stoica et al., Proceedings of ACM SIGCOMM 2001, August 2001, pages 149-160; some split hyperspaces into manageable chunks, as described in “A Scalable Content-Addressable Network”, Sylvia Ratnasamy et al., Proceedings of ACM SIGCOMM, September 2001, or “Efficient Topology-Aware Overlay Network”, Marcel Waldvogel and Roberto Rinaldi, ACM Computer Communications Review, January 2003, Volume 33, Number 1, pages 101-106; and others implement a rootless tree, as described in “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems”, Anthony Rowstron and Peter Druschel, IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), November 2001, pages 329-350, or “Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing”, Ben Y. Zhao et al., University of California, Berkeley, UCB/CSD-01-1141, April 2001.
Many of these DHT systems are able to exploit the locality of the underlay network. Locality aspects are typically separated into geographic layout and proximity forwarding, categories adapted from “Exploiting Network Proximity in Distributed Hash Tables”, Miguel Castro et al., International Workshop on Future Directions in Distributed Computing (FuDiCo), edited by Ozalp Babaoglu, Ken Birman, and Keith Marzullo, June 2002, pages 52-55.
“Accessing Nearby Copies of Replicated Objects in a Distributed Environment”, C. Greg Plaxton et al., ACM Symposium on Parallel Algorithms and Architectures, 1997, pages 311-320, shows another approach to locality patterns.
An approach to linking DHTs and caching is shown in “OceanStore: An Architecture for Global-Scale Persistent Storage”, John Kubiatowicz et al., Proceedings of ACM ASPLOS, November 2000. There, queries passing along the DHT are redirected by Attenuated Bloom Filters (ABF) when there is a high probability that a document cache can be found along that route. Apart from the risk of false positives, which remains despite continuous ABF update traffic, the document originator has no way to address selected replicas when the need arises.
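The redirection idea and its false-positive risk can be made concrete with a simplified sketch. It assumes that an attenuated Bloom filter is kept per outgoing link as a list of ordinary Bloom filters, one per hop distance, and that a query follows the link whose filter reports the smallest distance. The class names, filter size, and hash construction are hypothetical and are not taken from the cited OceanStore paper.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, but false positives are possible."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class AttenuatedBloomFilter:
    """One Bloom filter per hop distance along a single outgoing link (assumed layout)."""

    def __init__(self, depth=3):
        self.levels = [BloomFilter() for _ in range(depth)]

    def advertise(self, doc_id, hops):
        """Record that a copy of doc_id is believed to be `hops` hops away."""
        self.levels[hops - 1].add(doc_id)

    def nearest_hint(self, doc_id):
        """Smallest hop distance at which a copy might exist, or None."""
        for hops, level in enumerate(self.levels, start=1):
            if level.might_contain(doc_id):
                return hops
        return None


link_filter = AttenuatedBloomFilter()
link_filter.advertise("doc-42", hops=2)
hint = link_filter.nearest_hint("doc-42")  # 2: a cached copy may be two hops away
```

A hint is only probabilistic: a different document ID may hash to the same bit positions and produce a false positive. The filter also records only that some copy may exist in a given direction, not which replica it is, which is why the originator cannot address selected replicas this way.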
“INS/Twine: A Scalable Peer-to-Peer Architecture for Intentional Resource Discovery”, Magdalena Balazinska et al., Pervasive 2002—International Conference on Pervasive Computing, August 2002, shows an example of a resource discovery/directory service on top of a DHT.
US20020114341A1 presents a peer-to-peer enterprise storage system that uses a centralized controller/coordinator.
Applicant's U.S. Pat. No. 6,223,206 discloses a method and system for load balancing by replicating a portion of a file being read by a first stream onto a second device and reading the portion with a second stream capable of accessing. This prior art deals with a completely centralized system.
US20030014523A1, US20030014433A1, and US20030014432A1 each introduce a storage network data replicator. Algorithms are disclosed for replicating from one instance to another, and it is described which existing replica to select as the source for further replication.
U.S. Pat. No. 6,467,046 and EP 807 885 B1 both show a system and a method for automatically distributing copies of a replicated database in a computer system. Hosts and disks are enumerated for determining replica placement in order to improve reliability.
U.S. Pat. No. 5,815,649 illustrates a distributed fault tolerant digital data storage subsystem for fault tolerant computer system. Multiple redundant computers are used as a front-end to multiple redundant disks, basically as a network RAID (Redundant Array of Inexpensive Disks).
According to U.S. Pat. No. 6,470,420, a method is proposed for designating one of a plurality of addressable storage devices to process a data transfer request. A client multicasts a single request to all replicas and they cooperatively select the one to reply.
WO 03/012699 A1 shows systems and methods for providing metadata for tracking of information on a distributed file system of storage devices. Metadata are used to locate the files.
In U.S. Pat. No. 6,163,856 a method and an apparatus are shown for file system disaster recovery.
Another of applicant's patents, U.S. Pat. No. 5,897,661, illustrates a logical volume manager and a corresponding method for enhanced update capability with dynamic allocation of storage and minimal storage of metadata information. Metadata replication is provided, but is limited to those storage providers that have a need to know.
In WO 02/093298 A3, a modular storage server architecture is described with dynamic data management. This document shows replication according to locality access patterns and hierarchical storage management.
US20030028695A1 shows a producer/consumer locking system for efficient replication of file data, which provides locking between concurrent operations.
According to U.S. Pat. No. 5,588,147, a replication facility is described, which uses a log file mechanism to replicate documents.
Despite the work done on replication and distributed storage, there is currently no replication mechanism that builds on completely distributed technology and does not suffer from single points of failure. As replicas not only improve availability but may also balance load, distributed mechanisms have also been proposed for the purpose of caching. Besides reliability, caching systems also pose an update problem: as it is not clear where information is cached, a cache may become stale if it does not continuously track the status of the original location. This poses a severe challenge to scalability, undoing the off-loading that caching provides.
Hence, it is desirable to provide a mechanism for managing replicas in a computer network, which mechanism is reflected in appropriate methods, computing entities and computer program elements for retrieving and/or depositing replicas in a computer network.