Overlay networks have gained attention in both academia and industry in the last few years. Various overlay applications are deployed across the Internet, and they have become ubiquitous and heavily used by end users. An overlay network is capable of improving the reliability, performance and availability of an existing infrastructure network. Overlay networks provide an elegant way to solve several networking problems, especially when no changes should be made to the existing network and the network is heterogeneous.
Currently, few solutions are found in the literature for the problem of packet routing in overlay networks built over flat identifiers. A fundamental problem with routing based on flat identifiers is that the flat identifier space cannot be aggregated. Most existing routing is based on hierarchy and on the aggregation of Internet Protocol (IP) addresses into network addresses, so-called subnetting. Applying a subnet mask to an IP address makes it possible to identify the network and node parts of the address.
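By way of illustration only, the following minimal sketch shows how a subnet mask splits an IP address into its network and node parts, and how two adjacent prefixes aggregate into one. It uses the Python standard library module ipaddress; the function names split_address and aggregate, and the addresses used, are hypothetical and chosen only for this example. The sketch handles IPv4 and IPv6 alike, although only IPv4 examples are shown.

```python
import ipaddress


def split_address(cidr):
    """Return (network part, node part) for an address in CIDR notation."""
    iface = ipaddress.ip_interface(cidr)
    # Network part: the address ANDed with the subnet mask.
    network = iface.network.network_address
    # Node part: the address ANDed with the inverted mask (host mask).
    node = int(iface.ip) & int(iface.network.hostmask)
    return str(network), node


def aggregate(prefixes):
    """Collapse adjacent prefixes into the smallest covering set."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Hypothetical example values.
    print(split_address("192.168.10.37/24"))
    print(aggregate(["10.0.0.0/25", "10.0.0.128/25"]))
```

No comparable operation exists for flat identifiers: two identifiers that are numerically adjacent say nothing about where their nodes are attached, so they cannot be summarized by a common prefix.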
File sharing overlay applications are usually built on flat identifiers. This kind of overlay application relies on Distributed Hash Tables (DHTs) in order to find a given resource. DHTs are a type of decentralized distributed data structure. Each node taking part in a DHT has one unique overlay identifier, normally a flat identifier. However, the identifiers are not used for data packet routing; they are used to route lookup messages for a given resource (a file, for example). Session establishment in these overlay networks is accomplished by resolving the overlay identifier (of a resource) into the underlay address (of the resource holder), and the data packet routing is then done entirely at the underlay level. Therefore, the data packet routing relies on the routing mechanism of the underlying network (e.g., IPv4). This solution is reasonable as long as only one homogeneous underlying network is assumed. Note that this requires a single homogeneous layer-3 technology for all the nodes participating in the overlay network. However, when heterogeneous layer-3 networks (or domains) coexist, the network address of a node can be meaningless to another peer.
Several protocols that implement the concept of DHTs have been proposed in recent years, e.g., the CAN, Chord, Pastry and Tapestry protocols. Although these protocols differ in some respects, they all rely on the same principles: a key is usually produced by hashing a filename. The generated key is then used to store or look up location information for the file in an overlay network formed by the nodes that are members of the DHT.
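The principle of deriving a key from a filename and using it to store and look up location information may be sketched as follows. This is an illustration under stated assumptions, not the method of any particular protocol: SHA-256 is an arbitrary choice of hash function, the 160-bit key length is an assumed default, and a single in-memory dictionary stands in for the table that a real DHT would spread over its member nodes. The names key_for_name, store_location and lookup_location are hypothetical.

```python
import hashlib


def key_for_name(name, m=160):
    """Hash a filename into an m-bit flat key (hash choice is an assumption)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def store_location(table, name, location):
    """Record where the named file can be found, indexed by its key."""
    table[key_for_name(name)] = location


def lookup_location(table, name):
    """Return the stored location for the named file, or None if unknown."""
    return table.get(key_for_name(name))


if __name__ == "__main__":
    table = {}  # stand-in for the distributed table
    store_location(table, "song.mp3", "10.0.0.5")  # hypothetical holder address
    print(lookup_location(table, "song.mp3"))
```

The value stored under the key is the underlay address of the resource holder, which matches the resolution step described above: the key locates the entry, and the entry supplies the address used for the actual data transfer.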
The DHT internal routing algorithm, i.e., the mechanism for routing store and lookup messages, is the heart of the DHT protocol. The member nodes form an overlay network with each node having a group of other nodes as neighbors. When a lookup for a given key is carried out, the message is routed through the overlay network to the node responsible for that key. The overall scalability and performance of the system are directly tied to the efficiency of the routing algorithm.
In order to distribute the processing and storage load, each node handles a portion of the hash space and is therefore responsible for a certain key range. Given a key, all nodes can efficiently route messages to the unique node responsible for that key. DHTs provide properties such as decentralization, scalability, load balance, fault tolerance, and self-healing. Decentralization distributes the keys across the nodes, and the organization of the system occurs without any central coordination. No node is more significant than any other. Scalability allows the system to handle a large number of nodes even with high churn (nodes joining and leaving the structure frequently). In general, the cost of the lookup process grows with the logarithm of the number of nodes.
Load balance follows from the use of a consistent hashing function, which with high probability spreads the key range evenly over the nodes and thus provides an innate form of load balancing. Fault tolerance keeps the system reliable even when some nodes fail. Self-healing enables automatic reorganization of the system to reflect nodes that have newly joined, left or failed.
The DHT nodes may join or leave the network at any time. The protocols must handle this in order to keep the system consistent. Consistency is assured by properly updating the routing tables when a node joins or leaves the network. The basic structure is built around an abstract flat keyspace, which is split among the participating nodes according to the keyspace partitioning scheme. Each implementation uses some variant of consistent hashing to map objects (e.g., a filename) into a key. Consistent hashing implies that a node joining or leaving affects only the set of keys owned by the adjacent nodes, leaving all other nodes unaffected, even under high churn rates. Minimizing the movement of stored objects from one node to another reduces the reorganization time, allowing high rates of arrival and departure of nodes in the system. This contrasts with the traditional hash table, where the addition or removal of one hash bucket requires the remapping of the entire keyspace.
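The contrast between consistent hashing and a traditional hash table may be illustrated by counting which keys change owner when one node is added. The following is a minimal sketch under stated assumptions: keys are assigned to the first node identifier at or after the key on a ring (one common partitioning scheme, assumed here), the traditional table is modeled as key modulo bucket count, and all identifiers and function names are hypothetical.

```python
import bisect


def owner(sorted_node_ids, key):
    """First node identifier at or after key, wrapping around the ring."""
    i = bisect.bisect_left(sorted_node_ids, key)
    return sorted_node_ids[i % len(sorted_node_ids)]


def moved_keys_ring(node_ids, new_node, keys):
    """Keys whose owner changes when new_node joins a consistent-hashing ring."""
    before = sorted(node_ids)
    after = sorted(before + [new_node])
    return [k for k in keys if owner(before, k) != owner(after, k)]


def moved_keys_modulo(bucket_count, keys):
    """Keys whose bucket changes when a modulo hash table gains one bucket."""
    return [k for k in keys if k % bucket_count != k % (bucket_count + 1)]


if __name__ == "__main__":
    keys = list(range(0, 65536, 997))  # hypothetical sample of keys
    ring_moved = moved_keys_ring([100, 2000, 40000], 30000, keys)
    table_moved = moved_keys_modulo(3, keys)
    print(len(ring_moved), "of", len(keys), "keys move on the ring")
    print(len(table_moved), "of", len(keys), "keys move in the modulo table")
```

On the ring, every key that moves lies between the new node and its predecessor and is handed over by a single neighbor; in the modulo table the moved keys are scattered over the whole keyspace.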
The key question is how to organize the nodes so that the lookup process becomes efficient. Beyond that, the efficiency of the lookup process depends on where and how much data is replicated, on the cache mechanism used and on how the search for a key is conducted. Upon receiving a lookup query, a node checks whether the data corresponding to the searched key is stored locally. If so, the data is returned and the search ends. Otherwise, the node selects a peer closer to where the data is stored and forwards the query to that peer. The definition of a "closer" peer is protocol dependent. Nodes maintain a DHT routing table to assist the decision of lookup (or store) query forwarding. The process is similar to IP packet routing: if a router cannot deliver the packet directly, it chooses another router that is closer to the final destination. Routing in DHTs aims at sending the query toward a node where the key is stored. The routing table keeps a set of neighbor nodes. Neighborhood relationships can be based on physical proximity, proximity of node identifiers, successor and predecessor relationships, among others.
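The forwarding decision described above may be sketched as a simple greedy loop. Because the definition of "closer" is protocol dependent, the sketch assumes one possible metric, circular distance between identifiers, and a routing table that holds only each node's predecessor and successor. The identifier space size, the node identifiers, the stored value and the names Node, distance, route_lookup and build_example are all hypothetical.

```python
from dataclasses import dataclass, field

SPACE = 16  # hypothetical size of the flat identifier space


def distance(a, b):
    """Circular distance between two identifiers (an assumed metric)."""
    d = (a - b) % SPACE
    return min(d, SPACE - d)


@dataclass
class Node:
    ident: int
    neighbors: list = field(default_factory=list)  # routing table entries
    store: dict = field(default_factory=dict)      # locally stored keys


def route_lookup(nodes, start, key):
    """Forward a query greedily; return (value or None, identifiers visited)."""
    current = nodes[start]
    path = [current.ident]
    while True:
        if key in current.store:  # data is local: the search ends
            return current.store[key], path
        best = min(current.neighbors,
                   key=lambda n: distance(n, key), default=None)
        # Stop when no neighbor is strictly closer; this also prevents loops.
        if best is None or distance(best, key) >= distance(current.ident, key):
            return None, path
        current = nodes[best]
        path.append(best)


def build_example():
    """Four nodes on a ring, each knowing its predecessor and successor."""
    ring = [1, 4, 9, 13]
    nodes = {}
    for i, ident in enumerate(ring):
        nodes[ident] = Node(ident, [ring[i - 1], ring[(i + 1) % len(ring)]])
    nodes[9].store[10] = "underlay address of the holder"
    return nodes


if __name__ == "__main__":
    print(route_lookup(build_example(), 1, 10))
```

Requiring each hop to be strictly closer to the key guarantees termination, at the cost of reporting a miss when the routing table offers no better neighbor.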
FIG. 1 illustrates a ring of Chord, probably the best-known DHT protocol. The Chord protocol uses a fast consistent hashing scheme that assigns each node an m-bit identifier using a cryptographic hash function such as MD5. A node receives an identifier by hashing its own IP address. In FIG. 1, m=3, the shaded circles represent nodes and the keys 1, 2 and 6 are stored in nodes 1, 3 and 0, respectively. The identifier space can be illustrated as a circle, modulo 2^m, arranged in increasing order clockwise. The key k is assigned to the first node whose identifier is equal to or follows k in the identifier space. This node is also called the successor node of the key k.
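The successor rule may be sketched as follows. The sketch assumes that the ring of FIG. 1 contains exactly the nodes 0, 1 and 3, which is inferred from the key placements given above rather than stated explicitly; the function name successor is hypothetical.

```python
import bisect


def successor(node_ids, key, m):
    """First node identifier equal to or following key on a ring modulo 2**m."""
    size = 2 ** m
    ring = sorted(n % size for n in node_ids)
    i = bisect.bisect_left(ring, key % size)
    # Past the largest identifier, wrap around to the smallest one.
    return ring[i] if i < len(ring) else ring[0]


if __name__ == "__main__":
    nodes = [0, 1, 3]  # assumed node set for the m=3 example
    for k in (1, 2, 6):
        print("key", k, "-> node", successor(nodes, k, 3))
```

With these assumed nodes, keys 1, 2 and 6 map to nodes 1, 3 and 0, in agreement with the placement described for FIG. 1; key 6 wraps around the ring because no node identifier is equal to or greater than 6.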
It is accepted wisdom that IP addresses are currently overloaded with two functionalities: locator, as the IP address is used to route packets in the network, and identity, as the IP address is also used to identify a network interface. The overloading of both functionalities in IP addresses is one of the aspects that make node mobility a difficult task. Autonomously administered heterogeneous layer-3 networks can be connected, but the network address of the destination can be meaningless to another node.
Therefore, there is a need in the art for a scalable, reliable and resilient routing architecture for overlay networks associated with heterogeneous layer-3 networks.