It is commonplace for data processing systems to be formed from a number of individual data processing units that are able to communicate with one another. An example of a suitable data processing unit is a personal computer or, alternatively, a workstation. The individual personal computers may be similar to one another in processing capacity and data storage and may be physically located at a single site, for example within the offices of a company. In this example, communication between the personal computers may take place over a wired network using dedicated network communication cables. Equally, one or more of the personal computers may be dedicated data storage units arranged to provide the majority of the data storage facilities for the company.
A further example may be individual personal computers located at geographically disparate locations, for example at individual residences, and having the ability to communicate with one another via a public network, such as the Internet.
There are various ways in which the data processing units may be arranged to operate in order for the data processing system to function. For example, a single data processing unit may be arranged to centrally manage the various tasks of the entire system, with the remaining data processing units arranged to defer such system management functions to that designated data processing unit. Alternatively, the various system management functions may be distributed across the entire data processing system. In this latter case, at least some of the data processing units are capable of performing one or more system management functions, either individually or by co-operating with other data processing units. As there is little or no hierarchical structure in data processing systems of this kind, they are often referred to as “Peer-to-Peer” networks. It is a common feature of peer-to-peer networks that communication between individual data processing units (peers) is direct, meaning in this context that the communication is not directed, or brokered, via a further managing data processor. However, it will be appreciated that such direct communication may involve one or more intermediate data processing units acting purely to relay the communication where no direct physical communication link is available. It is to peer-to-peer networks that embodiments of the present invention are particularly directed.
In peer-to-peer networks and other similar distributed data storage systems, the management of data stored at various locations across the data processing system can be problematic. Particularly in the latter example of a shared public data processing system, it can be difficult to compile and maintain an accurate record of what information is stored at any given location within the system.
One known technique intended to address this disadvantage is to provide the data processing system with an index. The index is intended to maintain a directory of the information present within the data processing system. For example, the index may comprise a list of individual data items together with the identity or location of the data processing unit or units at which each data item is located. In systems utilising such an index, the users of the individual data processing units may look up a particular data item in the index to establish its location and, if required, subsequently retrieve the data item, or a copy thereof, from the location indicated by the index. The disadvantage of this known system is that it relies on the users of individual data processing units to update the index with any changes relating to data items located at their data processing unit, as no mechanism is provided for doing so automatically. For example, should the user of a data processing unit decide to delete a particular data item from that data processing unit, the system relies upon that user to update the index accordingly. Whilst this may work reasonably well when the data processing system in question is a corporate, or company-owned, system, it is less likely to be reliable when the data processing system is a publicly shared one. In the latter case, there is no corporate pressure on users to maintain the index. This leads to a strong possibility that data items are deleted, added or copied by individual users without the index being modified. Equally, data may be accidentally or deliberately replicated. The index is therefore not a reliable source of information about any one data item and, in particular, there is a significant risk that all copies of a given data item may be deleted from the data processing system before this fact, or the reduction in the number of copies, is reflected in the index.
This is clearly a significant disadvantage if the data processing system is to be used to store valuable data items.
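To make the weakness of a manually maintained index concrete, the following is a minimal Python sketch under stated assumptions, not a description of any particular system. The names `DataProcessingUnit`, `ManualIndex`, `retrieve`, and the unit and item identifiers are hypothetical. The index maps each data item to the identities of the units believed to hold it, and it changes only when a user explicitly calls `register()` or `unregister()`.

```python
from __future__ import annotations

from collections import defaultdict


class DataProcessingUnit:
    """A peer that stores data items locally (hypothetical model)."""

    def __init__(self, unit_id: str) -> None:
        self.unit_id = unit_id
        self.items: dict[str, bytes] = {}


class ManualIndex:
    """Directory mapping each data item to the units said to hold it.

    Assumption: the index is only changed by explicit user calls;
    nothing keeps it in step with the units automatically.
    """

    def __init__(self) -> None:
        self._locations: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, item_id: str, unit_id: str) -> None:
        self._locations[item_id].add(unit_id)

    def unregister(self, item_id: str, unit_id: str) -> None:
        self._locations[item_id].discard(unit_id)

    def lookup(self, item_id: str) -> set[str]:
        # .get() avoids creating an empty entry for unknown items
        return set(self._locations.get(item_id, set()))


def retrieve(index: ManualIndex, units: dict[str, DataProcessingUnit], item_id: str):
    """Look up the item, then fetch it from the first listed unit that still has it."""
    for unit_id in sorted(index.lookup(item_id)):
        data = units[unit_id].items.get(item_id)
        if data is not None:
            return data
    return None


units = {uid: DataProcessingUnit(uid) for uid in ("pc-a", "pc-b")}
index = ManualIndex()
for uid, unit in units.items():
    unit.items["report"] = b"quarterly figures"
    index.register("report", uid)

# Each user deletes their local copy, but nobody updates the index.
for unit in units.values():
    del unit.items["report"]

print(index.lookup("report"))            # the index still lists both units
print(retrieve(index, units, "report"))  # None: every copy is gone
```

Because a deletion at a unit never reaches the index, the lookup still reports both units even though no copy survives, which is the failure mode described above.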
Conversely, the distributed nature of the data on the data processing system can make it difficult to manage old or infrequently used data items. Consequently, more copies of a data item may be maintained across the data processing system than its age or frequency of use warrants, whereas it may be more efficient simply to delete data items that are older than a certain age or are infrequently accessed.
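The age and access criteria mentioned above can be illustrated with a minimal Python sketch. It assumes that each item's creation time, access count, and number of copies are already known in one place, which is exactly what is hard to guarantee in a distributed system. `ItemRecord`, `select_for_deletion`, and the thresholds used are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ItemRecord:
    """Hypothetical metadata assumed to be available for each data item."""
    item_id: str
    created: datetime
    access_count: int
    copies: int  # number of copies currently held across the system


def select_for_deletion(records, now, max_age, min_accesses):
    """Return ids of items older than max_age or accessed fewer than min_accesses times."""
    return [
        r.item_id
        for r in records
        if now - r.created > max_age or r.access_count < min_accesses
    ]


now = datetime(2024, 1, 1)  # arbitrary reference time for the example
records = [
    ItemRecord("old-log", now - timedelta(days=400), access_count=50, copies=3),
    ItemRecord("rarely-used", now - timedelta(days=10), access_count=1, copies=4),
    ItemRecord("active", now - timedelta(days=10), access_count=80, copies=2),
]

to_delete = select_for_deletion(records, now, max_age=timedelta(days=365), min_accesses=5)
freed = sum(r.copies for r in records if r.item_id in to_delete)
print(to_delete)  # ['old-log', 'rarely-used']
print(freed)      # 7
```

The simple "either criterion" rule is only one possible choice; a real policy might weigh age against access frequency, or reduce the number of copies rather than delete every copy outright.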