Replication, known as “mirroring” in Internet parlance, is a technique that is used to address a scalability problem of popular Internet sites. As a popular site experiences a high rate of requests for objects stored at the site, the site can become overburdened and slow to respond, or even crash. As used herein, the term “object” refers to a piece of information. An object is embodied as a “replica,” e.g., a file that is stored at a host, or a program (e.g., executable) and associated files that produce a piece of information. Examples of replicas include a page at a web site, a graphic file, an audio file, a cgi-bin file, etc. A request for an object is answered by sending a copy of a replica to the requester.
To solve the scalability problem, replicas of the requested objects can be stored at several locations throughout the network, thereby spreading the load of sending copies of the replicas of requested objects to requesting users.
It is important to properly decide where to store the replicas, and how to allocate requests for objects among the sites at which the replicas are stored. Often, these two problems are related in that a placement strategy will have important implications for the request allocation strategy, and vice versa.
Certain known replication (mirroring) techniques are implemented manually by system administrators, who monitor the demand for information on their sites and decide what data should be replicated and where the replicas should be stored. This task becomes daunting when the number of objects that can be requested and the number of possible storage sites for replicas of such objects become large. Such a situation can arise, for example, in networks that are used to provide hosting services. Generally, a hosting service maintains and provides access to objects belonging to third-party information providers. For example, a hosting service may provide the infrastructure for numerous web sites whose content is provided by third parties.
As the scale of a hosting system increases (i.e., as the number of objects and hosting servers on which replicas of the objects are stored becomes larger), the decision space for replica placement increases. A brute-force, worst-case design becomes prohibitively expensive, and the problem of mirroring becomes too large and complex to be effectively handled manually by system administrators. Without appropriate new technology, system administration related to replica placement may become a factor limiting the scale to which hosting platforms may efficiently increase. This new technology must be able to automatically and dynamically replicate Internet objects in response to changing demand.
Some known protocols allocate requests among hosts that store mirrored objects by collecting load reports from the hosts and weighing host loads into a network-topology-based request distribution scheme. This approach, implemented in the Local Director made by CISCO Systems of California, is not well suited for dynamic replication on a global scale. This is because the request re-direction subsystem is highly distributed, forcing each host to send its load report to a large number of redirecting servers. This disadvantageously increases network traffic and can function poorly if the load reports are delayed in reaching all of the request redirectors. Further, request distribution for a given object becomes dependent on the popularity of many other objects that are co-located at the same host. This renders request distribution effectively non-deterministic and unpredictable, greatly complicating autonomous replica placement decisions.
Other known commercial products offer transparent load balancing among multiple Internet sites. See CISCO Distributed Director White Paper, <http://www.cisco.com/warp/public/734/distdir/dd_wp.htm>; IBM Interactive Network Dispatcher, <htttp://www.ics.raleigh.ibm.com/netdispatch/>; Web Challenger White paper, WindDance Network Corporation, <http://www.winddancenet.com/newhitepaper.html>, 1997. These products differ in the network level where the redirection of requests to physical replicas occurs: CISCO's Distributed Director performs re-direction at the DNS level. A similar idea is used in E. Katz, M. Butler, and R. McGrath, A Scalable Web Server: The NCSA Prototype, Computer Networks and ISDN Systems, 27, pp. 155-164, September 1994, May 1994. The IBM Net Dispatcher and CISCO's Local Director redirect requests at the front-end router level, while Winddance's Web Challenger does so at the application level using redirection features of the HyperText Transfer Protocol (HTTP). None of these products offers dynamic replication or migration of replicas.
Existing protocols for performance-motivated dynamic replication rely on assumptions that are unrealistic in the Internet context. Wolfson et al. propose an ADR protocol that dynamically replicates objects to minimize communication costs due to reads and writes. O. Wolfson, A. Jajodia, and Y. Huang, An Adaptive Data Replication Algorithm, ACM Transactions on Database Systems (TODS), Vol. 22(4), June 1997, pp. 255-314. Most Internet objects are rarely written. Recent trace studies (e.g., S. Manly and M. Seltzer, Web Facts and Fantasy, in USENIX Symp. on Internet Technologies and Systems, pp. 125-134, 1997) consistently show that 90% of requests are to static objects, and many of the remaining objects are dynamically generated responses to read-only queries. Therefore, minimizing communication costs due to reads and writes is not a suitable cost metric for the Internet. In addition, the Wolfson protocol imposes logical tree structures on hosting servers and requires that requests travel along the edges of these trees. Because of a mismatch between the logical and physical topology of the Internet, and especially because each node on the way must interpret the request to collect statistics (which requires in practice a separate TCP connection between each pair of nodes), this would result in impractically high delays in request propagation.
Heddaya and Mirdad's WebWave dynamic replication protocol was proposed specifically for the World Wide Web on the Internet. A. Heddaya and S. Mirdad, WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents, in Proc. 17th IEEE Intl. Conf. on Distributed Computing Systems, May 1997. However, it burdens the Internet routers with the task of maintaining replica locations for Web objects and intercepting and interpreting requests for Web objects. It also assumes that each request arrives in a single packet. As the authors note, this protocol cannot be deployed in today's networks.
Algorithmically, both ADR and WebWave decide on replica placement based on the assumption that requests are always serviced by the closest replica. Therefore, neither protocol allows load sharing when a server is overloaded with requests from its local geographical area. Objects are replicated only between neighbor servers, which would result in high delays and overheads for creating distant replicas, a common case for mirroring on the Internet. Also, ADR requires replica sets to be contiguous, making it expensive to maintain replicas in distant corners of a global network even if internal replicas maintain only control information.
The works of Bestavros (A. Bestavros, Demand-based Document Dissemination to Reduce Traffic and Balance Load in Distributed Information Systems, in Proc. of the IEEE Symp. on Parallel and Distr. Processing, pp. 338-345, 1995) and Bestavros and Cunha (A. Bestavros and C. Cunha, Server-initiated Document Dissemination for the WWW, Bulletin of the Computer Society Technical Committee on Data Engineering, pp. 3-11, Vol. 19, No. 3, September 1996) appear to be the predecessors of WebWave. A. Bestavros, Demand-based Document Dissemination to Reduce Traffic and Balance Load in Distributed Information Systems, in Proc. of the IEEE Symp. on Parallel and Distr. Processing, pp. 338-345, 1995 proposes to reduce network traffic within an intranet by caching an organization's popular objects close to the intranet's entry point. In a very large scale system, there would be many such entry points. Such a system would need to address the problems of choosing entry points at which to place object replicas and allocating requests to those replicas. These questions are not considered in A. Bestavros, Demand-based Document Dissemination to Reduce Traffic and Balance Load in Distributed Information Systems, in Proc. of the IEEE Symp. on Parallel and Distr. Processing, pp. 338-345, 1995. In A. Bestavros and C. Cunha, Server-Initiated Document Dissemination for the WWW, Bulletin of the Computer Society Technical Committee on Data Engineering, pp. 3-11, Vol. 19, No. 3, September 1996, Bestavros and Cunha discuss the benefits of replicating popular objects from the host server up the request tree, but no methods for doing so are described.
Baentsch et al. (M. Baentsch, L. Baum, G. Molter, S. Rothkugel, and P. Sturm, Enhancing the Web's Infrastructure: From Caching to Replication, IEEE Internet Computing, Vol. 1, No. 2, pp. 18-27, March/April, 1997) propose an infrastructure for performing replication on the Web, without describing methods for deciding on replica sets. Also, the infrastructure assumes gradual learning of the replica set by clients, which may hurt the responsiveness of the system. Gwertzman and Seltzer (J. Gwertzman and M. Seltzer, The Case for Geographical Push-Caching, Proc. of the HotOS Workshop, 1994. Also available at <ftp://das-ftp.harvard.edu/techreports/tr-34-94.ps.gz>) motivate the need for geographical proximity-based object replication. They propose to base replication decisions on the geographical distance (in miles) between clients and servers. This measure may not correctly reflect communication costs for fetching an object, since the network topology often does not correspond to the geographical distances.
The problem of placing objects in the proximity of requesting clients has also been addressed in research on file allocation (see Ø. Kure, Optimization of File Migration in Distributed Systems, Ph.D. Dissertation, University of California (Berkeley), 1988. Also available as Technical Report UCB/CSD 88/413, Computer Science Division (EECS), University of California (Berkeley), April 1988 for an early survey; and B. Awerbuch, Y. Bartal, and A. Fiat, Competitive Distributed File Allocation, in Proc. of the 25th ACM Symposium on Theory of Computing, pp. 164-173, May, 1993; B. Awerbuch, Y. Bartal, and A. Fiat, Distributed Paging for General Networks, in Proc. of the 7th ACM-SIAM Symposium on Discrete Algorithms, pp. 574-583, January, 1996; and Y. Bartal, A. Fiat, and Y. Rabani, Competitive Algorithms for Distributed Data Management, in Proc. of the 24th ACM Symposium on Theory of Computing, pp. 39-50, 1992 for more recent work). Early work in this area assumes a central point where decisions on object placement are made by solving an integer programming optimization problem. Even when the search space is heuristically pruned, the scale of our application would make such approaches impractical. Also, this approach requires the decision-making point to have complete information on network topology, server loads, and demand patterns.
More recently, the problem of obtaining distributed solutions for file allocation has been addressed. See B. Awerbuch, Y. Bartal, and A. Fiat, Competitive Distributed File Allocation, in Proc. of the 25th ACM Symposium on Theory of Computing, pp. 164-173, May, 1993; B. Awerbuch, Y. Bartal, and A. Fiat, Distributed Paging for General Networks, in Proc. of the 7th ACM-SIAM Symposium on Discrete Algorithms, pp. 574-583, January, 1996; Y. Bartal, A. Fiat, and Y. Rabani, Competitive Algorithms for Distributed Data Management, in Proc. of the 24th ACM Symposium on Theory of Computing, pp. 39-50, 1992. In B. Awerbuch, Y. Bartal, and A. Fiat, Distributed Paging for General Networks, in Proc. of the 7th ACM-SIAM Symposium on Discrete Algorithms, pp. 574-583, January, 1996, Awerbuch, Bartal, and Fiat design a distributed file allocation protocol and use the framework of competitive analysis (see D. Sleator and R. Tarjan, Amortized Efficiency of List Update and Paging Rules, Communications of the ACM, 28(2): 202-208, 1985) to show that their protocol is nearly optimal in terms of total communication cost and storage capacity of the nodes. However, they do not address the issue of load balancing among different servers. Moreover, while their work is significant from a theoretical standpoint, several issues concerning implementation of their protocol over the Internet are not addressed.
A system in accordance with an embodiment of the present invention includes a request distributor that receives a request for an object from a requester. The request distributor is coupled through a network to hosts that store replicas of the requested object. The request distributor determines the value of a request metric for each replica of the requested object, where the request metric is a historical measure of the number of requests for the object that have been forwarded to the host that stores the replica of the requested object. The request metric is determined substantially independently from any input from any host that stores a replica of any object to which a request for an object is forwarded. The request distributor also determines the value of a distance metric for each host at which a replica of the requested object is stored. The distance metric measures the cost of communicating between the requester and the host. Based upon the values of the request metric and the distance metric, the request distributor selects a host to respond to the request for the object. The request distributor forwards the request to the selected host, which then responds directly or indirectly to the requester. In another embodiment, the request distributor sends a redirect message to the requester, which then resends a request for the object to the correct host. In either case, the request distributor is said to have “assigned” the request to the host.
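The selection step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the text does not say how the request metric and the distance metric are combined, so the additive score, the `load_weight` parameter, the plain forwarding counter used as the "historical measure," and all class and method names here are assumptions made only for illustration. The one property taken from the text is that the request metric comes solely from the distributor's own forwarding history, with no load reports from hosts.

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


class RequestDistributor:
    """Toy request distributor that relies only on its own forwarding history."""

    def __init__(self, distance: Callable[[str, str], float],
                 load_weight: float = 1.0) -> None:
        # distance(requester, host): cost of communicating between the two.
        self.distance = distance
        # Hypothetical tuning knob; the source does not define one.
        self.load_weight = load_weight
        # object -> hosts currently storing a replica of it
        self.replicas: Dict[str, Set[str]] = defaultdict(set)
        # (object, host) -> number of requests forwarded so far
        self.forwarded: Dict[Tuple[str, str], int] = defaultdict(int)

    def add_replica(self, obj: str, host: str) -> None:
        self.replicas[obj].add(host)

    def assign(self, obj: str, requester: str) -> str:
        """Pick a host for this request and record the assignment."""
        hosts = self.replicas.get(obj)
        if not hosts:
            raise LookupError(f"no replica of {obj!r} is known")

        def score(host: str) -> float:
            # Assumed combination rule: distance plus weighted request count.
            request_metric = self.forwarded[(obj, host)]
            return self.distance(requester, host) + self.load_weight * request_metric

        # Sorting makes tie-breaking deterministic (alphabetical by host name).
        chosen = min(sorted(hosts), key=score)
        self.forwarded[(obj, chosen)] += 1
        return chosen


if __name__ == "__main__":
    # Hypothetical distances from any requester to two hosts.
    costs = {"near": 1.0, "far": 3.0}
    distributor = RequestDistributor(lambda requester, host: costs[host])
    distributor.add_replica("page", "near")
    distributor.add_replica("page", "far")
    print([distributor.assign("page", "client") for _ in range(5)])
```

Under this assumed scoring rule, the nearer host receives the first requests, and the farther host starts receiving some once the nearer host's accumulated request count offsets its distance advantage. Because the count is kept per object and per host at the distributor, the outcome for one object does not depend on the popularity of other objects stored at the same host.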
In accordance with an embodiment of the present invention, each host that stores a replica substantially autonomously decides whether to delete, migrate or replicate a replica stored at that host. A first host stores a predetermined deletion threshold u and a replication threshold m such that vu is less than m, where v is a real number. The first host determines a request metric for the replica of the requested object stored at the first host. If the request metric is less than u, and if the replica is not the sole replica, then the replica is deleted from the first host. If the request metric is above u, and if it is determined that there is a second host to which it is beneficial to migrate the replica, then the replica is migrated to the second host. If the request metric is above m and no second host was identified to which it would have been beneficial to migrate the replica, then the first host determines if there is a second host to which it is beneficial to replicate the replica of the requested object stored at the first host. If there is such a second host, then the replica stored at the first host is replicated at the second host.
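The decision sequence in the preceding paragraph can be sketched in Python as follows. This is a hedged illustration, not the patented method: the text does not define what makes a second host "beneficial," so the candidate hosts are passed in as already-computed optional arguments, and the names `Action`, `decide_placement`, `migration_target`, and `replication_target` are hypothetical. The value of v in the usage example is likewise only illustrative.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    KEEP = "keep"
    DELETE = "delete"
    MIGRATE = "migrate"
    REPLICATE = "replicate"


def decide_placement(
    request_metric: float,
    u: float,                          # deletion threshold
    m: float,                          # replication threshold
    v: float,                          # real number with v * u < m
    is_sole_replica: bool,
    migration_target: Optional[str],   # host judged beneficial to migrate to, if any
    replication_target: Optional[str], # host judged beneficial to replicate to, if any
) -> Tuple[Action, Optional[str]]:
    """Return the action a host takes for one replica, plus the target host."""
    if not v * u < m:
        raise ValueError("thresholds must satisfy v * u < m")

    # Rarely requested replicas are dropped, unless this is the last copy.
    if request_metric < u and not is_sole_replica:
        return Action.DELETE, None

    # Above the deletion threshold, migration is considered first.
    if request_metric > u and migration_target is not None:
        return Action.MIGRATE, migration_target

    # Replication is considered only above m and when no migration was chosen.
    if request_metric > m and replication_target is not None:
        return Action.REPLICATE, replication_target

    return Action.KEEP, None


if __name__ == "__main__":
    # Illustrative thresholds only: u = 2, m = 10, v = 4 (4 * 2 < 10).
    print(decide_placement(12, 2, 10, 4, False, None, "host-b"))
```

Keeping the test for "beneficial" outside the function mirrors the autonomy described in the text: each host applies the thresholds locally, and whatever policy identifies candidate second hosts can change without altering the order of the delete, migrate, and replicate checks.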
The present invention advantageously selects a host to which to forward a request for an object substantially independently from input from any host to which such requests are forwarded from the request distributor. This is a considerable improvement over known techniques that rely upon such input because it reduces the network traffic that has to be generated to make a distribution decision, and reduces the complexity of such decision making. At the same time, the request distribution scheme of the present invention is very efficient. The distribution scheme also advantageously simplifies autonomous replica placement decisions.