1. Field of the Invention
The invention relates to data communications networks and the sharing of information objects between communicating entities. Particularly, the invention relates to a method for obtaining information objects in a communication system.
2. Description of the Related Art
In the World Wide Web (WWW), the provision of content items is centralized in a server node from which browsing clients retrieve them. The server node must have sufficient processing and storage capacity to serve content requests. The server must also have sufficient bandwidth at its disposal to be able to provide the content items to the browsing client nodes. Thus, the server may introduce a bottleneck to the system. In order to facilitate content sharing, peer-to-peer networks are emerging. In peer-to-peer networks the content providing burden is distributed to a number of file sharing nodes. A file system may be distributed among the file sharing nodes so that different files are located on different machines. It is also possible to distribute parts of a single file to different content sharing nodes in order to speed up the downloading process.
Reference is now made to FIG. 1, which is a block diagram that illustrates content sharing in a communication system in prior art. In FIG. 1 there is an IP network. The IP network comprises three domains, namely domains 160, 162 and 164. In domain 160 there is a server 150. In domain 162 there are three clients, namely client 154, client 156 and client 158. Clients 154 and 158 are illustrated as laptop computers and client 156 is illustrated as a desktop computer. In domain 164 there is a client 152, which is illustrated as a wireless computer, for example, a mobile station. It should be noted that the illustration of the clients as laptop computers, desktop computers or wireless computers is purely for illustration purposes. Server 150 is serving a WWW-site which provides a content item, in other words, a content object (not shown), which is downloaded by the clients. The content item may be a file of any possible type, for example, a picture, a video or an audio recording. The mode of operation for content distribution in the World Wide Web is illustrated in FIG. 1. In this mode of operation each of the clients downloads the content item separately. Each of the clients has a separate dialog with server 150. In FIG. 1, client 152 first issues an HTTP request message to server 150 and is provided with a 200 OK message that comprises the content item. The interaction is illustrated with arrows 101 and 102. The obtaining of the content item by clients 154, 156 and 158 is similarly illustrated with arrows 103 and 104, 105 and 106, and 107 and 108, respectively. The problem with this mode of operation is that server 150 constitutes a single point of failure in the network and the bandwidth available to server 150 is a bottleneck in the system. In order to deal with the disadvantages of a system of this kind, new peer-to-peer technologies have been developed. The hosting and the transmission of a given content object are no longer the duty of a single server node.
In peer-to-peer networks there are no longer just centralized servers which are in charge of providing and storing content items for clients.
Reference is now made to FIG. 2, which is a block diagram that illustrates content sharing in a peer-to-peer network in prior art. In FIG. 2 a peer-to-peer network operates logically on top of six IP networks, in other words, domains. The peer-to-peer network comprises domains 220, 230, 240, 250, 260 and 270. The domains are, for example, sub-networks implemented with various different technologies such as local area networks or wireless local area networks, Asynchronous Transfer Mode (ATM) networks or Point-to-Point Protocol (PPP) networks. Functional entities that participate in file sharing are referred to as peers. In FIG. 2 peers are implemented in separate network nodes. In domain 220 there are nodes 222, 224 and 226. In domain 230 there is a node 232. In domain 250 there is a node 252. In domain 260 there is a node 262. Finally, in domain 270 there is a node 272. The internal functions within node 272 are illustrated with box 273. Node 252 acts as a web server. The downloading may be performed, for example, using the BitTorrent protocol defined in “Incentives Build Robustness in BitTorrent”, B. Cohen, In First Workshop on Economics of Peer-to-Peer Systems, pages 251-260, Berkeley, Calif., USA, 2003. Node 272 is a tracker. A tracker provides information on all the nodes that participate in the downloading of a given content item. By providing the information on the nodes that participate in the downloading, it is possible to ensure that different nodes download different parts of the content item. Thereby, the burden of providing the content item is shared between several nodes. In BitTorrent the nodes first download the part of the content item that is rarest among the other nodes that are downloading the same content item.
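The rarest-first selection mentioned above may be sketched as follows. This is a minimal illustration only, not the BitTorrent implementation: the function name `rarest_first`, the peer labels and the piece numbering are hypothetical, and each peer is assumed to advertise the set of piece indices it holds.

```python
from collections import Counter

def rarest_first(needed, peer_pieces):
    """Return the needed piece index held by the fewest peers, or None."""
    counts = Counter()
    for pieces in peer_pieces.values():
        # Count only pieces that are still needed and that some peer holds.
        counts.update(pieces & needed)
    if not counts:
        return None
    # Lowest availability first; ties are broken by the lower piece index.
    return min(counts, key=lambda piece: (counts[piece], piece))

# Hypothetical swarm: piece 3 is held by one peer only.
peers = {
    "P2": {0, 1, 2, 3},
    "P4": {0, 1, 2},
    "P5": {0, 1},
}
print(rarest_first({0, 1, 2, 3}, peers))
```

In this example the downloader would request piece 3 first, since only one peer holds it.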
In FIG. 2 the starting point is that node 232 has downloaded content item C1 from node 262, as illustrated with arrow 201. Node 232 holds a complete copy C1′ of content item C1. Thereupon, a user on node 224 downloads a torrent file from node 252, as illustrated with arrow 202. The torrent file provides a reference to node 272, which acts as a tracker for the torrent. A torrent file has the file name extension “.torrent”. Node 224 obtains from node 272 information on the nodes that participate in the downloading process and are thus able to provide pieces of content item C1, as illustrated with arrow 203. At time T2, when the information request from node 224 arrives, the nodes reported to node 224 include only nodes P2 and P4, that is, nodes 262 and 232. Earlier, at time T1, the request from node 232 resulted only in information on node P2, since it was the only node holding content item C1. In response to the information on nodes P2 and P4, node 224 starts downloading different portions of content item C1 from nodes 232 and 262, as illustrated with arrows 204A and 204B. Later, node 226 also obtains (not shown) the torrent file pertaining to content item C1 from node 252 and contacts (not shown) tracker 272. At time T3, in response, node 226 obtains information on nodes P2, P4 and P5 from tracker 272. It is now possible for node 226 to obtain pieces of content item C1 from nodes 224, 262 and 232, that is, nodes P5, P2 and P4, in order to speed up the downloading process. The obtaining of pieces of content item C1 is illustrated with arrows 205A, 205B and 205C. The nodes that participate in the downloading and sharing process are also referred to as a swarm in BitTorrent.
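The growth of the peer list at times T1, T2 and T3 may be illustrated with the following sketch. The `Tracker` class and its `announce` method are hypothetical names chosen for illustration and do not reproduce the actual tracker protocol. It is assumed that node P2, which holds the original content item, has registered with the tracker before the other nodes.

```python
class Tracker:
    """Toy tracker: remembers which peers take part in each swarm."""

    def __init__(self):
        self.swarms = {}  # content identifier -> list of peer labels

    def announce(self, content_id, peer):
        """Register `peer` and return the other peers known so far."""
        swarm = self.swarms.setdefault(content_id, [])
        known = [p for p in swarm if p != peer]
        if peer not in swarm:
            swarm.append(peer)
        return known

tracker = Tracker()
tracker.announce("C1", "P2")               # original holder (node 262)
at_t1 = tracker.announce("C1", "P4")       # node 232
at_t2 = tracker.announce("C1", "P5")       # node 224
at_t3 = tracker.announce("C1", "node226")  # node 226
print(at_t1, at_t2, at_t3)
```

Each later requester receives a longer list, which is what allows node 226 to download from three nodes in parallel.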
Reference is now made to FIG. 3, which is a block diagram that illustrates a distributed hash table in prior art. The concept of a distributed hash table is defined, for example, in “Kademlia: A Peer-to-peer Information System Based on the XOR Metric”, P. Maymounkov, D. Mazieres, Electronic Proceedings for the 1st International Workshop on Peer-to-peer Systems, 7-8 Mar. 2002, MIT Faculty Club, Cambridge, Mass., USA.
Distributed Hash Table (DHT) 360 is stored on a number of nodes such as nodes 361, 362, 363, 364, 365, 367, 368 and 369. In FIG. 3 there is illustrated a circle 370, which represents the key space of distributed hash table 360. In FIG. 3 the key space size is 128 bits. In order to store a value in distributed hash table 360, a key is computed from the value. The key is a number in the 128-bit key space.
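Computing a key from a value may be sketched as follows, assuming the 128-bit key space mentioned above. The text does not name a hash function; truncating a SHA-256 digest to 128 bits is an assumption made purely for illustration, and `key_for` is a hypothetical helper name.

```python
import hashlib

KEY_BITS = 128  # key space width stated in the text

def key_for(value: bytes) -> int:
    """Map a value to an integer key in the range [0, 2**KEY_BITS)."""
    # Assumption: SHA-256 truncated to the first 16 bytes (128 bits).
    digest = hashlib.sha256(value).digest()[: KEY_BITS // 8]
    return int.from_bytes(digest, "big")

key = key_for(b"example content")
print(f"{key:032x}")
```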
Each node participating in the distributed hash table is assigned a key which is referred to as the identifier of the node. A range of keys around that identifier is assigned to the node. All values that have a hash key which falls within the key range of a given node are stored on that node. In order to provide robustness, a number of different nodes may be assigned to be in charge of the same key space range. For example, there may be two nodes which are in charge of a given key space range and are able to route a query carrying a hash key to the right node. Each node must store some information on other nodes that participate in the distributed hash table. The information on other nodes may be arranged as a cache. The density of hash keys and corresponding IP addresses in the cache of a node is inversely proportional to the distance of the keys from the key of that node. The cache is arranged to have a number of levels. The densities of the hash keys with a stored IP address are dependent on the level. The highest level represents the entire key space. The lowest level represents the immediate neighbors of the node in terms of the key space assigned to them. In FIG. 3 there are three levels. The highest level is key range 370. The middle level is represented by arc 372. The lowest level is represented by arc 374.
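The assignment of a key to the nodes in charge of it may be sketched as follows. The distance measure is an assumption: the XOR metric of the cited Kademlia paper is used here, whereas the text only requires some notion of closeness in the key space. The node names, the 8-bit identifiers and the function `responsible_nodes` are hypothetical.

```python
def responsible_nodes(key, node_ids, replicas=2):
    """Return the `replicas` node names whose identifiers are closest to key."""
    # Assumption: closeness is the XOR of the identifier and the key.
    ranked = sorted(node_ids, key=lambda name: node_ids[name] ^ key)
    return ranked[:replicas]

# Hypothetical 8-bit identifiers, kept short for readability.
node_ids = {"A": 0x10, "B": 0x80, "C": 0xF0, "D": 0xF8}
print(responsible_nodes(0xF3, node_ids))
```

With two replicas, a value whose key is 0xF3 would be stored on nodes C and D, so that either of them can answer a query if the other is unavailable.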
In FIG. 3 there is illustrated how the IP address of a node hosting a given file may be discovered using distributed hash table 360. At time T0, a first hash key is computed using the file content itself. The node closest to the first hash key shall store the IP address for the node holding the file. In this case the node to hold the key is node 367. Thereupon at time T1, a second hash key is computed using the name of the file or alternatively a keyword for the file. The second hash key is closest to hash key 366 associated with node 368. Therefore, the first hash key is stored in node 368.
At time T2 node 364 wishes to obtain the address of the node that currently stores the file mentioned above. Therefore, node 364 computes the second hash key using the file name. Using the second hash key, node 364 determines from its cache the node which is closest to the second hash key in terms of the key space and the hash key values assigned to the node. Node 364 thus determines node 361 and sends the second hash key to node 361, as illustrated with arrow 301. Node 361 again uses the second hash key to determine the node in its cache that is closest to the second hash key. Node 361 determines node 362 as the closest node and sends the query to it, as illustrated with arrow 302. Thereupon, node 362 determines node 363 and sends the query to it, as illustrated with arrow 303. Node 363 determines node 365 and sends the query to it, as illustrated with arrow 304. Finally, node 365 determines that node 368 is in its neighbor cache and is in charge of the key range comprising the second hash key. Therefore, node 365 sends the query to node 368, as illustrated with arrow 305. By knowing the IP address of node 364 from the query, node 368 is capable of sending the first hash key directly to node 364.
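The hop-by-hop forwarding described above may be sketched as follows. The 8-bit identifiers, the cache contents and the XOR distance are assumptions chosen only so that the sketch reproduces the path of arrows 301 to 305; they are not taken from FIG. 3. The function `route` is a hypothetical name.

```python
def xor_distance(a, b):
    """Assumed distance measure between two keys."""
    return a ^ b

def route(start, key, node_ids, caches):
    """Forward a query greedily until no cached node is closer to `key`."""
    path = [start]
    current = start
    while True:
        best = min(caches[current],
                   key=lambda n: xor_distance(node_ids[n], key))
        # Stop when the current node is at least as close as any cached node.
        if xor_distance(node_ids[best], key) >= xor_distance(node_ids[current], key):
            return path
        current = best
        path.append(current)

# Hypothetical identifiers and caches for the nodes of FIG. 3.
node_ids = {361: 0x80, 362: 0xC0, 363: 0xE0, 364: 0x10,
            365: 0xF8, 367: 0x40, 368: 0xF0, 369: 0x20}
caches = {364: [361, 369], 361: [362, 364], 362: [363, 361],
          363: [365, 362], 365: [368, 363], 368: [365, 367]}
SECOND_KEY = 0xF3  # hypothetical second hash key

print(route(364, SECOND_KEY, node_ids, caches))
```

Because the distance to the key strictly decreases at every hop, the loop terminates at the node in charge of the key, here node 368.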
Thus, node 364 is capable of starting the distributed hash table traversal process again to obtain the address of node 367 currently hosting the file. This time the traversal process uses the first hash key. Node 364 may also obtain the IP address of node 367 directly from node 368, if it is guaranteed that node 367 will be up and able to serve the query.
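The two-stage resolution, from file name to first hash key and from first hash key to host address, may be sketched as follows. For brevity the two mappings are held in ordinary dictionaries rather than being spread over the nodes of the distributed hash table. The hash function, the helper names and the example address are assumptions.

```python
import hashlib

def hash_key(data: bytes) -> int:
    """Assumed 128-bit key: a truncated SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

name_index = {}      # second hash key (from file name) -> first hash key
location_index = {}  # first hash key (from file content) -> host IP address

def publish(name: str, content: bytes, host_address: str) -> None:
    """Store both mappings, as at times T0 and T1."""
    first_key = hash_key(content)
    second_key = hash_key(name.encode("utf-8"))
    location_index[first_key] = host_address
    name_index[second_key] = first_key

def resolve(name: str):
    """Look up the host address in two steps, as from time T2 onwards."""
    first_key = name_index.get(hash_key(name.encode("utf-8")))
    if first_key is None:
        return None
    return location_index.get(first_key)

publish("song.mp3", b"file bytes", "192.0.2.7")
print(resolve("song.mp3"))
```

Keeping the name-to-content mapping separate from the content-to-address mapping means that several names or keywords can point to the same content key.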