1. Field of the Invention
The present invention relates to locating distributed objects, such as distributed data objects and distributed services, on a computer network; and, in particular, to techniques for locating distributed objects based on physical communication costs, which techniques scale up to networks with a large number of nodes including complex, heterogeneous interconnectivity characteristics.
2. Description of the Related Art
Networks of general purpose computer systems connected by external communication links are well known and widely used in commerce. The networks often include one or more network devices that facilitate the passage of information between the computer systems. A network node is a network device or computer system connected by the communication links. An “end node” is a node that is configured to originate or terminate communications over the network. An “intermediate network node” facilitates the passage of data between end nodes.
The client-server model of computer process interaction is widely known and used. According to the client-server model, a client process sends a message including a request to a server process, and the server process responds by providing a service. The server process may also return a message with a response to the client process. Often the client process and server process execute on different computer devices, called hosts, and communicate via a network using one or more protocols for network communications. Network nodes are often hosts for client and server processes. The term “server” is conventionally used to refer to the process that provides the service, or the host computer on which the process that provides the service operates. Similarly, the term “client” is conventionally used to refer to the process that makes the request, or the host computer on which the process that makes the request operates. As used herein, the terms “client” and “server” refer to the processes, rather than the host computers, unless otherwise clear from the context. In addition, the server process can be broken up to run as multiple processes on multiple hosts (sometimes called tiers) for reasons that include, but are not limited to, reliability, scalability, and redundancy.
Distributed systems make data and services available over the network by providing different data items or services, or different instances of data items or services, at different nodes of the network. The data items or services, or both, available from a distributed system are called distributed objects. Distributed systems, such as distributed databases and distributed web page servers, are widely known and used in commerce. An aspect of accessing a requested object is locating the node on which the object resides, also called performing “distributed object location.”
As distributed systems and the networks on which they reside continue to grow in size and number of nodes, it becomes more challenging to responsively locate and provide access to the distributed objects, and ever-greater network resources can be consumed doing so. In large distributed systems, with thousands of nodes and hundreds of millions of distributed objects, the resources consumed to track down an object can dwarf the resources consumed to perform the operation using the object.
Various approaches to distributed object location are not scalable to large numbers of nodes. For example, in the approach used by Object Management Group's Common Object Request Broker Architecture (CORBA) and some other distributed systems, a distributed object is bound to a handle that includes an Internet Protocol (IP) address of a server that processes requests for the object. This approach is not scalable because every node in the distributed system is required to store information about every distributed object. Thus, in large distributed systems, each of thousands of nodes stores information about hundreds of millions of data objects. Furthermore, every node that wishes to deal with the object must deal with it through its assigned IP address, making the system sensitive to hardware or connectivity failures that make it impossible to connect to that address, and possibly overwhelming the assigned node's processing capability or network connection.
In a more recent approach, distributed hash tables (DHTs) are used for distributing objects in peer-to-peer (P2P) systems. P2P systems are characterized by multiple servers of equal rank, without a centralized authority for making decisions about the distribution of objects. DHTs do not require distribution of all distributed object information. Instead, DHTs map object identifiers to node identifiers using a known mapping and hash function. Hash functions are well known in the art. A variety of DHT systems are described, for example, in Balakrishnan, H., M. Kaashoek, D. Karger, R. Morris, I. Stoica, “Looking Up Data in P2P Systems,” 5 pp, 2003, published as a document cacm03.pdf in directory /~istoica/papers/2003/ at domain cs.berkeley.edu on the World Wide Web (www), hereinafter Balakrishnan, the entire contents of which are hereby incorporated by reference as if fully set forth herein. DHTs rely on a recursive lookup process in which each node keeps information about a subset of the distributed objects that map to that node. Requests for other objects are sent to the node with the node identifier produced by the mapping. The recursive lookup process scales logarithmically or better as the number of nodes in the network increases; that is, lookup overhead grows no faster than the logarithm of the number of nodes.
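As a minimal sketch of the mapping idea, assume a ring-style identifier space, such as the one used by Chord, in which an object belongs to the first node identifier at or after the object's hashed identifier. This is one possible mapping, not necessarily the one any particular DHT uses. The 32-bit identifier width, the choice of SHA-1, and the names hash_id, successor, and owner_of are illustrative assumptions.

```python
import hashlib
from bisect import bisect_left


def hash_id(name: str) -> int:
    """Hash a name into a 32-bit identifier (width chosen for illustration)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def successor(key: int, node_ids) -> int:
    """Return the first node identifier >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key) % len(ring)]


def owner_of(object_name: str, node_ids) -> int:
    """Map an object name to the identifier of the node that owns it."""
    return successor(hash_id(object_name), node_ids)


# Hypothetical node names; any node can compute the owner from the name alone.
nodes = [hash_id(f"node-{i}") for i in range(8)]
owner = owner_of("report.pdf", nodes)
```

The sketch resolves ownership with a full view of the ring. A real DHT would reach the same owner through the recursive lookup described above, with each node holding only partial routing state.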
Improvements in the distribution of object identifiers among node identifiers, which more evenly spread the load and more easily recover from node removal and node joins, have been proposed. For example, in one approach, a continuous identifier region (Voronoi cell) is centered on discrete node identifiers (generators on a Voronoi graph). All object identifiers that map into the region around the node identifier are assigned to the node with that identifier. The continuous identifier space can be one dimensional or multi-dimensional, with the number of dimensions designated by the letter “d.” This approach is called the continuous-discrete approach and is described in Naor, M. and U. Wieder, “Novel Architectures for P2P Applications: the Continuous-Discrete Approach,” 10 pp, 2003, published as a document dh.pdf in directory /~naor/PAPERS/ at domain wisdom.weizmann.ac.il on the World Wide Web (www), hereinafter Naor I; and in Naor, M. and U. Wieder, “A Simple Fault-Tolerant Distributed Hash Table,” 6 pp, 2003, published as a document simple_fault_tolerant.pdf in directory /final-papers/2003/ at domain iptps03.cs.berkeley.edu on the Internet, hereinafter Naor II; the entire contents of both of which are hereby incorporated by reference as if fully set forth herein.
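The region-based assignment can be sketched as a nearest-generator search: an object identifier, treated as a point in the continuous space, belongs to the node whose identifier is closest to it. The sketch below assumes d = 2, plain Euclidean distance without wrap-around, and hypothetical node names and coordinates; it is not taken from Naor I or Naor II.

```python
import math


def nearest_node(point, generators):
    """Return the name of the node whose generator is closest to point.

    generators maps a node name to its coordinates in the continuous space;
    the set of points closest to one generator is that node's Voronoi cell.
    """
    return min(generators, key=lambda name: math.dist(point, generators[name]))


# Two hypothetical nodes in the unit square.
generators = {"a": (0.1, 0.1), "b": (0.9, 0.9)}
owner_before = nearest_node((0.6, 0.6), generators)

# A node joining at (0.5, 0.5) takes over only the points nearest to it.
generators["c"] = (0.5, 0.5)
owner_after = nearest_node((0.6, 0.6), generators)
unchanged = nearest_node((0.2, 0.3), generators)
```

In this sketch a join changes ownership only for points that fall inside the new node's cell, which is the property that makes recovery from joins and removals comparatively cheap.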
While suitable for many purposes, DHTs and continuous-discrete DHTs still suffer some disadvantages. In particular, these DHTs associate objects with nodes without regard to the physical cost, in terms of time and network resource utilization, of transferring data between nodes. Once the IP address of the service is retrieved, after the mapping, additional communication relies on the shortest path first (SPF) routing method implemented in an underlying network, outside the control of the DHT approach. Essentially, these approaches assume the SPF routing accomplishes all transfers with an equal average cost. Problems that arise because of ignorance about the physical structure and state of the network include excessive cross-core routing, denial of data existence, rejoin problems, flapping and stabilization complexities.
In cross-core routing, a lookup request for a certain object sometimes bounces back and forth across a potentially congested wide area network (WAN) link. If both the requester and the data are on the same local area network (LAN), this excess traffic on the congested link is a misuse of the limited resources on that link. This problem is exacerbated when objects are replicated and the replicating node is on a remote LAN while most of the using nodes are on the same LAN as the owning node.
As an example of the denial of existence problem, if a node is unavailable for communications, even for a short time, ownership of the node's objects is transferred to another node that does not actually have the objects. Thus the objects located at the original node can no longer be found by the DHT systems and are assumed to be non-existent. Clients, servers, and other computer application processes that just created the object are often unable to cope gracefully with a system that subsequently denies the existence of the object. For example, an application that just stored some data as a data object and “knows” the data exists might not be programmed to deal with a system response that denies the existence of the data object. Such a program might sit idly, doing nothing, while it waits to retrieve its data.
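The denial of existence problem can be reproduced in a few lines. The sketch below assumes a ring-style mapping in which a key is owned by the first live node identifier at or after it; the node identifiers, the key, and the helper names successor, put, and get are hypothetical.

```python
from bisect import bisect_left


def successor(key, live_nodes):
    """Return the first live node identifier >= key, wrapping around."""
    ring = sorted(live_nodes)
    return ring[bisect_left(ring, key) % len(ring)]


# Each node keeps only its own local store (node id -> {key: value}).
stores = {10: {}, 20: {}, 30: {}}


def put(key, value, live_nodes):
    stores[successor(key, live_nodes)][key] = value


def get(key, live_nodes):
    # Returns None when the current owner has no copy of the object.
    return stores[successor(key, live_nodes)].get(key)


put(17, "payload", [10, 20, 30])       # key 17 is stored on node 20
before = get(17, [10, 20, 30])         # found on node 20
during_outage = get(17, [10, 30])      # node 30 now "owns" key 17 but has no copy
after_rejoin = get(17, [10, 20, 30])   # visible again once node 20 returns
```

During the outage the lookup result is indistinguishable from that of an object that was never stored, even though the data is intact on the unreachable node.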
As an example of the rejoin problem, consider what happens in the above case when the node with the lost objects rejoins the network. Ownership of the lost objects must then eventually be transferred back to the rejoining node in a complex and resource-consuming process.
Flapping occurs when a node is repeatedly removed from and rejoined to a network. This can occur even when the node is persistently linked to the network. For example, congestion on a link, or static at a modem, causes messages to be dropped. If some of the messages are “Keep Alive” messages in which nodes of the distributed system announce their availability, one or more other nodes might infer that a node whose messages are dropped is dead. In response, the other nodes send more messages to reassign ownership of the objects originally owned by the “dead” node. This further congests the link and causes more nodes to be inferred as “dead.” When traffic subsides and “Keep Alive” messages are no longer dropped, more resources are consumed in rejoining the formerly “dead” node. The repeated dying and rejoining of actually connected nodes is called flapping. To reduce flapping, some stabilization measures can be taken, but such measures increase the complexity of the system.
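A simple timeout-based failure detector illustrates how dropped “Keep Alive” messages turn into flapping. The sketch below assumes that a monitoring node declares a peer dead whenever the gap between two received keep-alives exceeds a timeout, and treats the peer as rejoined when the next keep-alive arrives. The one-second send interval, the timeout values, and the function name dead_declarations are illustrative assumptions.

```python
def dead_declarations(arrival_times, timeout):
    """Count how often a peer is declared dead and later rejoins.

    arrival_times is the sorted list of times at which keep-alives were
    actually received; a gap longer than timeout triggers one declaration.
    """
    count = 0
    for previous, current in zip(arrival_times, arrival_times[1:]):
        if current - previous > timeout:
            count += 1
    return count


# Keep-alives are sent every second from t=0 to t=9.
sent = list(range(10))
# Congestion drops the messages sent at t=3, 4, 7 and 8.
received = [t for t in sent if t not in (3, 4, 7, 8)]

flaps_no_loss = dead_declarations(sent, timeout=2.5)
flaps_with_loss = dead_declarations(received, timeout=2.5)
flaps_long_timeout = dead_declarations(received, timeout=3.5)
```

Lengthening the timeout is one possible stabilization measure: it suppresses the false declarations in this example, at the price of slower detection of nodes that have actually failed.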
Based on the foregoing description, there is a clear need for techniques to locate distributed objects on a network, which techniques do not suffer the deficiencies of prior art approaches. In particular there is a clear need for techniques to locate distributed objects that scale to networks with a large number of nodes and/or heterogeneous connectivity between nodes. Furthermore, there is a need for techniques to distinguish between a non-existent distributed object and a distributed object that is found at a node that is temporarily unavailable.