1. Field of the Invention
This invention relates to computer systems and, more particularly, to storage management in peer-to-peer networks.
2. Description of the Related Art
Today's enterprise environments typically comprise a wide variety of computing devices with varying processing and storage resources, ranging from powerful clusters of multiprocessor servers to desktop systems, laptops, and relatively low-power personal digital assistants, intelligent mobile phones and the like. Most or all of these devices are often linked, at least from time to time, to one or more networks such as the Internet, corporate intranets, departmental or campus local area networks (LANs), home-based LANs, etc. Furthermore, most or all of these devices often store data, at least temporarily, that if lost or corrupted may lead to considerable rework and/or to lost business opportunities. While perhaps not as important from a business perspective, the loss or corruption of personal data such as photographs, financial documents, etc., from home computers and other devices outside corporate boundaries may also have unpleasant consequences. Backing up the data locally, e.g., to devices located at the same building or site as the source data, is typically not sufficient, especially in the event of catastrophes such as hurricanes, tornadoes, floods, fires and the like.
While backup to remote sites for disaster recovery has been implemented in various forms over the years, traditional disaster recovery techniques are often centrally controlled and expensive, and are therefore typically limited to protecting the most important, mission-critical subsets of business data. In recent years, in order to take advantage of the widening availability of Internet access and the mass availability of cheap storage, several peer-to-peer (P2P) storage management techniques have been proposed. In such P2P storage management environments, for example, each participating device may be allowed to upload data objects such as files into a P2P network or “cloud” (a large distributed network, such as hundreds or thousands of hosts connected to the Internet). In the event of a failure at the source device (the device from which the data objects were uploaded), the data objects may be retrieved from the P2P cloud. In addition to disaster recovery, P2P storage may also be utilized for a number of additional purposes, including, for example, efficient file sharing. Some or all of the participating devices may also store data uploaded by other peer devices of the P2P cloud. P2P storage management software may be installed at the participating devices to enable devices to find target devices to store uploaded data, to search for previously uploaded data within the P2P cloud, to store incoming P2P data received from peer devices, and to retrieve data from other devices of the P2P cloud as needed. P2P storage management protocols are often decentralized to support scaling to larger and larger networks, so that the responsibility of implementing the protocol does not result in performance bottlenecks at a single participating device or a few participating devices. 
Often, few restrictions are placed on devices for membership in P2P networks: e.g., even a home personal computer that is only powered on for a few hours a day may be allowed to participate in a P2P network.
As a result of the relatively lax requirements for participation in P2P networks, few guarantees can usually be provided regarding the availability of any given device in the P2P network. If, in a naïve implementation of P2P storage management, an important file was uploaded to only one or two target devices of the P2P network from a source device, it is quite possible that none of the target devices that store the file may be online or available when the file has to be retrieved. Data to be uploaded is therefore typically erasure coded and/or replicated at the source device prior to uploading to several targets in the P2P cloud, so that the probability of being able to recover the source data is increased. (In general, an erasure code transforms a data object containing n blocks into a data object with m blocks, where m is larger than n, such that the original data object can be recovered from a subset of those m blocks.) This can, however, often lead to a substantial increase in the total amount of data that has to be transmitted from the source device, as well as a substantial increase in processing. For example, to store one megabyte of “real” data, the total amount of data required to be uploaded into the network may be five or more megabytes, representing an increase of several hundred percent in the bandwidth required for the upload. A corresponding increase in processor and/or memory usage may also be required to derive the expanded version of the data. However, many of the devices participating in P2P storage, such as home computers, laptops, etc., often have relatively limited processing capabilities, memory and upload bandwidth, and may not always remain connected to the P2P network for long enough periods to upload the amount of data needed in accordance with the redundancy requirements of P2P storage management.
Such resource limitations may thus become a significant hurdle preventing large-scale implementations of traditional P2P storage management techniques.
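The trade-off described above, between upload volume and the likelihood of recovering the data, may be illustrated with a short calculation. The following is a minimal sketch only, not part of any particular P2P protocol. It assumes, for illustration, that each of the m encoded blocks is stored on a different target device, that each device is independently online with the same probability p, and that any n of the m blocks suffice for recovery (a property of some, but not all, erasure codes). The function names and the example parameters are hypothetical.

```python
from math import comb


def upload_size_mb(data_mb: float, n: int, m: int) -> float:
    """Size of the encoded upload when n source blocks are expanded to m blocks."""
    if n <= 0 or m < n:
        raise ValueError("require 0 < n <= m")
    return data_mb * m / n


def recovery_probability(n: int, m: int, p: float) -> float:
    """Probability that at least n of m blocks are reachable.

    Assumption (not from the text above): one block per device, each device
    independently online with probability p, and any n blocks are sufficient.
    """
    if n <= 0 or m < n:
        raise ValueError("require 0 < n <= m")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Binomial tail: sum over every count k of online devices with k >= n.
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(n, m + 1))


if __name__ == "__main__":
    # Hypothetical parameters: 1 MB of data, 4 source blocks, 20 encoded blocks,
    # devices online 30% of the time.
    print("upload (MB):", upload_size_mb(1.0, 4, 20))
    print("single copy:", recovery_probability(1, 1, 0.3))
    print("4-of-20 code:", recovery_probability(4, 20, 0.3))
```

With these hypothetical parameters the encoded upload is five times the size of the original data, which corresponds to the five-megabyte example above, and this expansion is what buys the higher recovery probability.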