1. Field of the Invention
The present invention relates generally to data processing, and more particularly to a method and system of administrating a peer-to-peer file sharing network.
2. Description of the Related Art
In the last few years, exchanges over peer-to-peer (P2P) networks or loosely coupled networks have exploded. Most of the Internet traffic today is P2P traffic.
File sharing refers to sending and receiving digital files over a network, usually following the peer-to-peer (P2P) model, where the files are stored on and served by the users' personal computers. Most people who engage in file sharing on the Internet both send (upload) and receive (download) files. P2P file sharing is distinct from file trading in that downloading files from a P2P network does not require uploading, although some networks either provide incentives for uploading, such as credits, or force the sharing of files that are currently being downloaded.
The first generation of peer-to-peer file sharing networks relied on a centralized server. A user sends the centralized server a search for the desired content; the server then sends back a list of peers that have the data and facilitates the connection and download. The second generation introduced decentralization: a network without a central index server, with some nodes being ‘more equal than others’; searches are passed along from node to node. The third generation of peer-to-peer networks has anonymity features built in, sometimes using Tor (onion routing); proxies used for other users' IP addresses make it hard to determine who is downloading. The fourth generation deals with streams over P2P (not only files), for example video or television streams.
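As an illustration only, the first-generation arrangement described above (peers register with a central server, which answers searches with a list of peers holding the data) can be sketched as follows. The class name `CentralIndex`, its methods, and the peer address strings are hypothetical and are not taken from any actual file sharing network.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index server: maps file names to the peers holding them.

    Hypothetical sketch; real first-generation networks differ in detail.
    """

    def __init__(self) -> None:
        self._peers_by_file: dict[str, set[str]] = defaultdict(set)

    def register(self, peer: str, files: list[str]) -> None:
        """Record that `peer` offers each of the named files."""
        for name in files:
            self._peers_by_file[name].add(peer)

    def unregister(self, peer: str) -> None:
        """Forget a peer that has left the network."""
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def search(self, name: str) -> list[str]:
        """Return the peers that have the requested file, sorted by name."""
        return sorted(self._peers_by_file.get(name, set()))


index = CentralIndex()
index.register("peer-a:6881", ["song.ogg", "notes.txt"])
index.register("peer-b:6881", ["song.ogg"])
print(index.search("song.ogg"))
```

In such a sketch the server holds only the index; the file itself would still be transferred directly between peers.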
Peer-to-peer file sharing protocols are used to distribute large amounts of data. Compared to standard Internet hosting, this kind of protocol significantly reduces the original distributor's hardware and bandwidth costs. It also provides redundancy against system problems and reduces dependence on the original distributor. Such protocols have been hugely successful (this is sometimes called “super-distribution”).
Users, nodes or peers share so-called “torrents”. A “torrent” is a file which contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. The tracker maintains lists of the clients currently participating in the torrent. Alternatively, in a trackerless system (decentralized tracking), every peer acts as a tracker. The peer distributing a data file treats the file as a number of identically sized pieces, typically between 64 kB and 4 MB each. The peer creates a checksum for each piece, using the SHA-1 hashing algorithm, and records it in the torrent file. When another peer later receives a particular piece, the checksum of the piece is compared to the recorded checksum to test that the piece is error-free. The initial distributor of the complete file or collection acts as the first seed.
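The per-piece checksum scheme described above can be sketched as follows. This is a minimal illustration using Python's standard `hashlib` module; the function names `piece_hashes` and `verify_piece` and the default piece size of 256 kB are assumptions chosen for the example, and the sketch does not produce an actual torrent file.

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int = 256 * 1024) -> list[bytes]:
    """Split `data` into fixed-size pieces and return the SHA-1 digest of each.

    The final piece may be shorter than `piece_size`.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [
        hashlib.sha1(data[offset:offset + piece_size]).digest()
        for offset in range(0, len(data), piece_size)
    ]


def verify_piece(index: int, piece: bytes, recorded: list[bytes]) -> bool:
    """Check a received piece against the checksum recorded for its index."""
    return hashlib.sha1(piece).digest() == recorded[index]


# The distributing peer records one checksum per piece.
payload = bytes(range(256)) * 10          # 2560 bytes of sample data
recorded = piece_hashes(payload, piece_size=1024)

# A receiving peer accepts an intact piece and rejects a corrupted one.
print(verify_piece(0, payload[:1024], recorded))
print(verify_piece(1, b"x" * 1024, recorded))
```

Because each piece is checked independently, a receiving peer only needs to fetch a corrupted piece again rather than the whole file.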
The latest P2P software clearly shows that content management, generally speaking, is very poorly addressed. This is not surprising because publishers are not involved in organizing the offer. Content management is also poorly addressed in other contexts. For example, the shared-folder features of software like MSN Messenger do not have a content management mechanism. As a result, files frequently become unavailable on the network because no seeder of the file remains in the network.
There have been many attempts to enhance the systems currently in wide use. For example, in response to pollution attacks (e.g. inserting “bad” chunks/packets into an otherwise valid file on the network), some users have developed so-called private P2P file sharing software. There are some “P2G” (peer-to-group) solutions that optimize the streaming or sharing of content for a group of users (a community). “Trust” is the key word in those systems, but content management ultimately relies on users' behavior (as in open P2P networks). The organization remains dependent on users. “Social storage platforms” have also emerged; branded as “peer-to-peer virtual hard drives”, they consolidate shared parts of hard drives on a worldwide scale, creating a virtual hard drive in which users may save files. Users never share actual hard drives but simply a very small percentage of their available free space; they thus share disk space rather than specific files. Content management in this kind of system is a real challenge, and it is not addressed beyond classifying documents (hierarchy of folders). “Grid systems” (secured, distributed and fault-tolerant routing systems) may also be mentioned; they provide cache features whereby any server may create a local replica of any data object. These local replicas provide faster access and robustness to network partitions. They also reduce network congestion. But the use of replicas reduces the overall efficiency of storage, mainly because replication is poorly tied to actual demand. There are many other systems for exchanging data between peers. Sometimes users exchange complete hard drives, from hand to hand. But all these systems and methods fail to provide satisfactory content management mechanisms.
Existing management systems for peer-to-peer networks target quality of service (QoS) and do not effectively address problems related to content management.
For example, issued U.S. Pat. No. 7,133,368, entitled “Peer-to-peer method of quality of service (QoS) probing and analysis,” discloses a peer-to-peer (P2P) probing/network quality of service (QoS) analysis system which utilizes a UDP-based probing tool for determining latency, bandwidth, and packet loss ratio between peers in a network. The probing tool enables network QoS probing between peers that connect through a network address translator. The list of peers to probe is provided by a connection server based on prior probe results and an estimate of the network condition. The list includes those peers which are predicted to have the best QoS with the requesting peer. Once the list is obtained, the requesting peer probes the actual QoS to each peer on the list, and returns these results to the connection server. P2P probing in parallel using a modified packet-pair scheme is utilized. If anomalous results are obtained, a hop-by-hop probing scheme is utilized to determine the QoS of each link. In such a scheme, differential destination measurement is utilized. In the patent, col. 3, lines 1-67, and col. 5, lines 1-8, disclose that in the QoS analysis and prediction phase, the Connection Server (CS) utilizes the QoS information collected by and sent to it from the peers in the first phase. The CS groups related information from various peers, obtains discriminative features which can characterize the critical path QoS parameters, and extracts useful temporal information from the data. This information is then combined to obtain the statistical model of the network. This information is used once a request is received from a peer to return a list of suitable peers for connection thereto. The model described in the patent is thus limited to a designation of particular peers for sharing contents.
Similarly, patent application US20060215575, entitled “System and method for monitoring and reacting to peer-to-peer network metrics,” discloses that the overall health of a peer-to-peer network may be inferred from statistics gathered and analyzed pertaining to individual node and node-to-node performance within the peer-to-peer network. When used with simulations for development or testing, the health statistic may be used instead of or to supplement standard regression testing to determine whether or not changes made improve system performance. When used with live peer-to-peer networks, the health statistic may provide a real-time view into network performance. Such a view may be used to adjust peer-to-peer network topology or to isolate underperforming or malicious nodes. The model described in the patent application thus still does not offer a practical solution.
Patent application US20050152364, entitled “Traffic control system of P2P network,” provides a traffic control system which adapts P2P traffic to the circuit capacity and topology of a physical network. In a traffic control portion, a correspondent node identifying portion monitors the headers of the packets exchanged in each P2P traffic flow and identifies the attribute of a correspondent node. A monitoring object extracting portion extracts a P2P connection based on the identification result. A traffic amount measuring portion obtains the total amount of each P2P traffic flow which is a monitoring object. A connection selecting portion selects a connection which should be shut down based on the total traffic amount. A filter portion shuts down the connection selected for shutdown by the connection selecting portion of the traffic monitoring portion.
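The measure-then-select steps summarized above can be sketched roughly as follows. This is a hypothetical illustration, not the method of the cited application: the `Connection` type, the `capacity` threshold, and the largest-first selection policy are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    conn_id: str
    is_p2p: bool          # result of the (assumed) header-based identification
    bytes_per_sec: int    # measured traffic amount for this connection


def select_connections_to_shut_down(
    connections: list[Connection], capacity: int
) -> list[Connection]:
    """Pick P2P connections to shut down until total P2P traffic fits `capacity`.

    Assumed policy: drop the heaviest connections first.
    """
    p2p = [c for c in connections if c.is_p2p]
    total = sum(c.bytes_per_sec for c in p2p)
    selected: list[Connection] = []
    for conn in sorted(p2p, key=lambda c: c.bytes_per_sec, reverse=True):
        if total <= capacity:
            break
        selected.append(conn)
        total -= conn.bytes_per_sec
    return selected


observed = [
    Connection("a", True, 500),
    Connection("b", True, 300),
    Connection("c", False, 900),   # not P2P, so never a candidate
    Connection("d", True, 100),
]
for conn in select_connections_to_shut_down(observed, capacity=450):
    print("shut down", conn.conn_id)
```

A filter stage would then act on the returned list; other policies (for example, dropping the newest connections first) would fit the same structure.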
Issued U.S. Pat. No. 7,443,803B2, entitled “Estimating and managing network traffic,” discloses that network traffic may be estimated using samples of network activities to identify traffic parameters. A model of the network may be generated using the traffic parameters, and the model can be used to simulate the network using a modified set of parameters. Network traffic may be managed using the results of the simulation. Network traffic can also be managed by intercepting and modifying a control message on a peer-to-peer network.
Patent application US20080259793, entitled “Network traffic control in peer-to-peer environments,” discloses a method and an electronic unit for controlling traffic on a network, especially for controlling peer-to-peer related traffic. A filter unit intercepts messages related to peer-to-peer applications from a network line, irrespective of the messages' destination; control logic then manages a request represented by an intercepted message subject to its content and to peering-specific information.
None of these documents (or existing P2P software) discloses efficient content management methods or systems. As such, there is a need for a method providing content management via P2P networks.