In a conventional client-server network, shown in FIG. 1A, a server 100 provides data to all clients 102, 104, 106. Security protocols such as secure sockets layer (SSL) are used to provide authentication and data confidentiality between client and server. As the number of clients increases, however, the bandwidth and storage demands on server 100 increase proportionately, which can result in reduced performance, especially when the server is providing clients with large digital media files. These problems with the client-server network paradigm have motivated the development of peer-to-peer networks. In a peer-to-peer network, shown in FIG. 1B, multiple peers 108, 110, 112 share data directly with each other instead of obtaining it from a server. For example, peer 108 can provide a file to peer 110, which in turn provides the file to peer 112. Alternatively, peer 112 could have obtained the same file directly from peer 108. As the number of peers increases, the aggregate bandwidth and storage capacity of the peer-to-peer network automatically increases as well. For these reasons, peer-to-peer file sharing networks have become a popular way of distributing large media files.
Most peer-to-peer networks are in fact hybrid peer-to-peer networks, which combine features of pure peer-to-peer networks with features of conventional client-server networks. For example, as shown in FIG. 2, a peer-to-peer network may have a server 200 and clients 202, 204, 206 functioning as a conventional client-server network. At the same time, the same nodes 202, 204, 206 function as peers in a peer-to-peer network. Typically, files are shared among the peers while the server organizes and administers the peer-to-peer network, e.g., by providing a directory of available files and of the peers that can share them. In some cases, the server may also provide files, e.g., if no peer is available to provide a requested file. For example, server 200 may provide peer 202 with a file, which then may be shared with peer 204, which in turn shares it with peer 206. Alternatively, peer 204 or 206 could have obtained the file directly from server 200.
Unlike a client-server network, the data flow in a peer-to-peer network is not centralized at a trusted server that controls access to files and ensures their confidentiality and integrity. Consequently, peer-to-peer networks pose new security issues. For example, suppose peer 204 wants to obtain a file from peer 202. How does peer 202 know peer 204 is authorized to obtain it? How can data confidentiality between peers be supported? How can peer 204 be assured of the integrity of the file it obtains from peer 202? How can peer 202 prove that it delivered the file to peer 204? Such questions are important in hybrid peer-to-peer networks used to distribute software packages, sell large multimedia files, share critical information among participants, and many other applications. In addition, any practical solution to these peer-to-peer security issues should not require a large processing or bandwidth overhead.
The challenges of providing data integrity and proof-of-service in a peer-to-peer network are considerably larger than those for authentication and confidentiality. Moreover, ensuring data integrity in peer-to-peer networks is especially important since the integrity of data must be assumed for any proof-of-service to be meaningful.
The most obvious solution for providing data integrity in a peer-to-peer network is simply for the server to provide a digital signature of the entire file. Unfortunately, if the signature verification fails, the entire file must be retransmitted. When the file is very large, this consumes a large amount of bandwidth and time. To address this problem, the file object O may be divided into a sequence of N smaller data blocks {b(1), . . . , b(N)}, and the server individually signs each block. This solution, however, introduces a large computational overhead, since the client must verify the signature of every block in order to verify the integrity of the file.
Another technique can be used to reduce the computational demands of verifying individual blocks. Instead of signing every block, the server signs a single “superblock” {H(1), . . . , H(N)} that contains a strong one-way hash value H(i) for every block b(i) of the file. Before receiving any data blocks, the client obtains the superblock from the server and verifies its signature. Once it has the superblock, the client then verifies the integrity of each block it receives by computing a hash of the block and comparing it to the corresponding hash value in the superblock. Although this technique dramatically reduces the computational demands on the client, it can result in a long startup latency because the client must receive the entire superblock before receiving the first block of a file. This delay is not acceptable in applications where users expect a prompt response, such as multimedia streaming. Moreover, if the superblock itself is corrupted, retransmitting it can also be costly. Increasing the block size can reduce the size of the superblock, but it increases the retransmission cost of individual blocks.
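The superblock scheme described above can be sketched as follows, using SHA-256 as an assumed strong one-way hash; the server's signature over the superblock is omitted for brevity, and the function names are illustrative rather than part of any standard:

```python
import hashlib

def make_superblock(blocks):
    # Server side: one strong one-way hash value H(i) per data block b(i).
    # The server would then sign this list as a single unit.
    return [hashlib.sha256(b).digest() for b in blocks]

def verify_block(superblock, index, block):
    # Client side: hash the received block and compare it to the
    # corresponding entry in the already signature-verified superblock.
    return hashlib.sha256(block).digest() == superblock[index]
```

A corrupted or altered block hashes to a different value and is rejected, so only that one block needs to be retransmitted.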
One known technique to address this start-up latency problem is based on the use of a Merkle hash tree. Given a data object O divided into N=2^m blocks {b(1), . . . , b(N)}, its binary Merkle hash tree, denoted M(O), is a binary tree of 2^(m+1)−1 hash values organized into m+1 levels. FIG. 3 shows an example Merkle hash tree for a data object with eight blocks. Level j of the tree consists of 2^j hash values, denoted H(j,1), H(j,2), . . . , H(j,2^j), where H(j,i) is a hash of the consecutive pair of hash values H(j+1,2i−1), H(j+1,2i) at level j+1. Level 0 of the tree (its “root”) consists of a single hash value H(0,1). Level m of the tree (its “leaves”) simply consists of the 2^m hash values {H(1), . . . , H(N)} of the data blocks {b(1), . . . , b(N)}. Thus, the hash values H(m,1), H(m,2), . . . , H(m,2^m) at level m are simply the hash values {H(1), . . . , H(N)}.
The hash tree M(O) of an object O is typically computed recursively by first computing hashes of the data blocks {b(1), . . . , b(N)}, then computing hashes of these hashes, and so on, until the root value H(0,1) is computed. For example, FIG. 3 shows hash values 314, 316, 318, 320, 322, 324, 326, 328 at level 3 derived directly from corresponding data blocks 330, 332, 334, 336, 338, 340, 342, 344. Hash values 306, 308, 310, 312 at level 2 are then derived by calculating hashes of pairs of consecutive hashes taken from the level 3 hash values. For example, hash value 306 is a hash of hash values 314 and 316. Similarly, hash values 302 and 304 at level 1 are derived from hash values 306, 308, 310, 312 at level 2, and root hash value 300 at level 0 is derived from hash values 302 and 304 at level 1. An important property of the hash tree M(O) is that the root hash value H(0,1) depends on the data in all blocks.
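The recursive construction just described can be sketched as follows, again assuming SHA-256 as the hash function and a block count that is a power of two (N = 2^m); `merkle_tree` is an illustrative name:

```python
import hashlib

def sha(data):
    return hashlib.sha256(data).digest()

def merkle_tree(blocks):
    # Build M(O) bottom-up: first hash the data blocks, then hash pairs of
    # consecutive hashes, and so on, until the root value H(0,1) is computed.
    # Returns a list of levels: levels[0] holds the single root hash and
    # levels[m] holds the 2^m leaf hashes.
    level = [sha(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    levels.reverse()  # levels[0] = root level, levels[-1] = leaves
    return levels
```

Because every leaf hash feeds into the root through the pairwise hashing, changing any single block changes the root value H(0,1).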
In conventional methods for data integrity verification using a Merkle hash tree, the integrity of each block of a data object O is independently verified by the receiving client. Before receiving any data, the client first requests a certified value of H(0,1) and verifies its signature. Once it receives a block b, the client requests the authentication path of b, denoted A(b). The authentication path consists of a sequence of m hash values, one from each level. The hash value at a given level in the authentication path is the sibling of the hash value along the direct path from the hash of b upward toward the root value H(0,1). For example, the authentication path for block b(6) in FIG. 3 is A(b(6))=<H(3,5), H(2,4), H(1,1)>. These values are then used to calculate H(0,1) from the hash H(6) of block b(6) by moving up the levels of the tree, combining the computed hash at each level with the sibling authentication hash at the same level to obtain the computed hash at the next level up. For example, computed hash H(6)=H(3,6) is combined with authentication path hash H(5)=H(3,5) to obtain computed hash H(2,3), which in turn is combined with authentication path hash H(2,4) to obtain computed hash H(1,2), which finally is combined with authentication path hash H(1,1) to obtain root hash value H(0,1). If the block is corrupted or otherwise altered, then the calculated value of H(0,1) will not equal the certified value of H(0,1). Thus, the authentication path of a block and the certified hash value of the root allow the client to verify the integrity of any block. If the integrity is not verified, the client can then request retransmission of block b. Using this method, a client does not have to download all the hash values of the entire tree beforehand, nor does it need to perform expensive encryption or decryption operations. However, this solution can still lead to a high bandwidth overhead.
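This per-block verification can be sketched as follows. The sketch assumes the tree is represented as a list of levels with 0-based indices (the root level first, the leaves last), whereas the text above uses 1-based indices; `auth_path` and `verify` are illustrative names:

```python
import hashlib

def sha(data):
    return hashlib.sha256(data).digest()

def auth_path(levels, leaf_index):
    # Provider side: collect the sibling hash at each level along the
    # direct path from the leaf upward toward the root.
    path = []
    i = leaf_index
    for level in reversed(levels[1:]):   # from the leaves up to level 1
        path.append(level[i ^ 1])        # sibling of node i at this level
        i //= 2
    return path

def verify(root, block, leaf_index, path):
    # Client side: recompute H(0,1) from the block's hash, combining the
    # running hash with each sibling in left/right order as we move up.
    h = sha(block)
    i = leaf_index
    for sibling in path:
        h = sha(h + sibling) if i % 2 == 0 else sha(sibling + h)
        i //= 2
    return h == root
```

If the recomputed root matches the certified H(0,1), the block is intact; any corruption of the block or its path makes the comparison fail.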
For a data object with 2^m blocks, every block's authentication path will have m hash values. Assuming each hash value is 16 bytes, the overhead traffic will then be 16m*2^m bytes in total, or a fraction 16m/|b| of the data traffic, where |b| is the number of bytes per block.
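As a quick check of this figure under assumed example parameters, consider m = 20 (about one million blocks) and 16 KB blocks; `path_overhead` is an illustrative name:

```python
def path_overhead(m, block_size, hash_size=16):
    # Each block's authentication path carries m hash values.
    per_block = hash_size * m            # bytes of overhead per block
    total = per_block * (2 ** m)         # 16m * 2^m bytes over all blocks
    fraction = per_block / block_size    # 16m / |b| of the data traffic
    return per_block, total, fraction
```

With m = 20 and a 16384-byte block, each path is 16*20 = 320 bytes, a fraction of roughly 2% of the data traffic, and the total overhead grows linearly in m for a fixed block size.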
In addition to the above drawbacks with authentication, current peer-to-peer security protocols also suffer from problems with other aspects of a complete security solution. For example, providing proof-of-service is important in a peer-to-peer network so that peers can demonstrate that they provided data to another peer. Proof-of-service, however, is meaningless without a reliable data integrity scheme, since proof-of-service presupposes that the data delivered was not corrupted or otherwise altered. Thus, a practical proof-of-service scheme requires a practical data integrity scheme. In addition, proof-of-service has its own inherent challenges. For example, it is desirable in a peer-to-peer system to allow multiple peers to provide a receiver peer with different portions of a single file. A proof-of-service scheme in this case might require the receiver peer to send an acknowledgement to each provider peer for each block received. The providers, however, must then send the acknowledgement for each block to the server, resulting in a large bandwidth overhead and demand on the server in the case of large files. There are also other challenges associated with proof-of-service, such as ensuring that provider peers cannot forge a proof-of-service and that receiver peers cannot take data without sending acknowledgements of receipt.
In view of the above, it would be an advance in the art to provide a practical and reliable peer-to-peer security protocol that overcomes some of the problems with current approaches.