In general, there are at least two models for exchanging data among a set of interconnected computers. One such model is termed the “client and server” model. In this model, client machines send their requests to the server machine, which has a well-known address. The clients wait for a response from the server, and download the requested data. The server is often maintained as a publisher of data and typically has a relatively large bandwidth connection to a relevant network, as well as significant processing power and storage capacity. The server is generally responsible for indexing, locating, retrieving, storing, caching and securing the data. The server also ensures data integrity and enforces an access policy on the data.
The clients in a client and server arrangement typically contain very simple and limited logic. The primary responsibility of clients is to initiate and complete “transactions” with the server. The clients are thus often pure consumers of data “published” by the server. Users who wish to “publish” data must upload their data to a server.
An alternative arrangement is a fully distributed and decentralised model that is known as the “peer-to-peer” model. In this model, the computers/nodes that are connected together are referred to as peers. All peer machines are conceptually equal, and there are no peers with special administrative roles. The peers are generally organised in a flat (i.e. non-hierarchical) structure and connections are formed in an ad-hoc manner. Any peer can “publish” data to, and “consume” data from, any other peer. The autonomous peers in such arrangements are typically considerably more complex than the clients in a client and server system, and often come in a variety of hardware and software configurations.
A further configuration is known as the hybrid peer-to-peer model, which attempts to embody the advantages of both the above-mentioned models. The hybrid peer-to-peer model is quasi-decentralised and is often characterised by the presence of one or more of (a) some hierarchical structure, (b) special peers, or (c) servers. There are many flavours of hybrid peer-to-peer systems and they usually vary in their level of decentralisation. The hybrid peer-to-peer model requires peers to contain some intelligence, in order to coordinate the activities among the peers.
Typically, hybrid peer-to-peer systems are more scalable (i.e. able to cope with increasing workload gracefully, systematically and essentially transparently) than client and server systems. This is because administrative and coordinative responsibilities are distributed among the peers. However, hybrid peer-to-peer systems can suffer from poor quality of service, particularly in the form of frequent disruption of data availability. This is partly due to the volatile membership of hybrid peer-to-peer networks, in which peers can join and disconnect from the system as often as they wish. Furthermore, the decentralised nature of the hybrid peer-to-peer system gives rise to situations in which data can be served only by the peers that own the data or have obtained the data on an exclusive basis.
These problems are exacerbated in non-public systems, where data is designated for only a small set of users, because the potential “servers” of any data are few in number. Furthermore, because peers play a more or less equal role in hybrid peer-to-peer systems, security policy and access rights are difficult to enforce.
A common approach to improving the quality of service in peer-to-peer systems is to make use of redundancy, by caching multiple copies of the same data on multiple peers. This approach has its limitations because reliable resources in hybrid peer-to-peer systems are scarce. This approach also increases the storage requirements of the system, and thus reduces system scalability. Furthermore, this approach poses potential security problems, because some of the peers holding cached copies may be malicious.
To overcome security problems, some hybrid peer-to-peer systems segment the data physically. Such systems cache only a portion of a given file in any given peer, and distribute the data portions making up the file across a number of peers. A particular file portion stored in any one of the aforementioned peers is typically meaningless on its own. To reconstruct the file, each of the other file portions must be retrieved from the respective peer in which it has been stored, and the portions must then be recombined. This approach increases the latency of such systems due to the substantial increase in complexity of data query and extraction processes.
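The segmentation approach described above can be illustrated with a minimal sketch. The function names, the fixed portion size and the round-robin placement are assumptions made purely for illustration, and are not taken from any particular system. Note that plain byte-range splitting, as used here, does not by itself make a portion meaningless; a real system would need an additional transform (for example encryption or secret sharing) to achieve that property.

```python
from typing import Dict, List, Sequence, Tuple


def split_into_portions(data: bytes, portion_size: int) -> List[bytes]:
    """Cut the file contents into consecutive portions of at most portion_size bytes."""
    if portion_size <= 0:
        raise ValueError("portion_size must be positive")
    return [data[i:i + portion_size] for i in range(0, len(data), portion_size)]


def distribute(portions: Sequence[bytes],
               peer_ids: Sequence[str]) -> Dict[str, List[Tuple[int, bytes]]]:
    """Assign portions to peers round-robin (a hypothetical placement policy).

    Each peer stores (portion_index, portion) pairs, so that the original
    order can be recovered when the file is reassembled.
    """
    if not peer_ids:
        raise ValueError("at least one peer is required")
    placement: Dict[str, List[Tuple[int, bytes]]] = {peer: [] for peer in peer_ids}
    for index, portion in enumerate(portions):
        placement[peer_ids[index % len(peer_ids)]].append((index, portion))
    return placement


def recombine(placement: Dict[str, List[Tuple[int, bytes]]]) -> bytes:
    """Collect the portions from every peer and join them in index order."""
    indexed = [item for stored in placement.values() for item in stored]
    indexed.sort(key=lambda item: item[0])
    return b"".join(portion for _, portion in indexed)


if __name__ == "__main__":
    original = b"abcdefghij"
    placement = distribute(split_into_portions(original, 3), ["p1", "p2", "p3"])
    assert recombine(placement) == original
```

Reassembly requires a response from every peer that holds a portion, which is one source of the additional latency and query complexity noted above.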
Alternatively, the entire file can be cached only on designated peers, i.e. peers that have permission to access the data in the file. This reduces the security threat slightly, at the cost of a substantial reduction in quality of service. Furthermore, since the entire data file is cached in full on each of the designated peers, the file is still vulnerable to attacks from non-designated peers: a non-designated peer that compromises the security of any one designated peer can access the entire data file.
It has also been previously noted that redundancy by itself may not necessarily guarantee a substantial increase in a system's quality of service, due to the volatile nature of the membership and the decentralised architecture of hybrid peer-to-peer systems. A consequence of the volatile membership is that the availability of peers over time is often not uniformly distributed. Further, peers that are in the system may not always be active serving peers.
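The limited benefit of redundancy can be illustrated numerically. The sketch below assumes, purely for illustration, that each peer holding a copy is online and serving with a fixed probability, independently of the other peers; the function name is hypothetical. Under that assumption, the data is available whenever at least one copy-holding peer is serving.

```python
from typing import Sequence


def availability_with_replicas(peer_availabilities: Sequence[float]) -> float:
    """Probability that at least one copy-holding peer is serving.

    Assumes each peer is available independently with the given probability,
    which is a simplification of real, correlated peer behaviour.
    """
    all_unavailable = 1.0
    for p in peer_availabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each availability must lie in [0, 1]")
        all_unavailable *= 1.0 - p
    return 1.0 - all_unavailable


if __name__ == "__main__":
    # Three copies on peers that are each serving only 10% of the time.
    print(availability_with_replicas([0.1, 0.1, 0.1]))
```

With three copies held by peers that are each serving only 10% of the time, the result is roughly 0.27, so the data remains unavailable most of the time despite the threefold storage cost. If peer availability is correlated, for example because many peers disconnect during the same hours, the independence assumption no longer holds.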