In current network environments, scenarios often arise in which many network clients want to access particular digital content at the same time. For example, consider a server on the Internet that has content such as an exclusive news report showing digital video footage of a newly erupting volcano. Millions of clients may want to receive this content over the Internet within a short period of time, either by downloading the content or by streaming and playing the content in real time. These surges in network traffic are sometimes called “flash crowds”, and they typically crash the affected server. This is because the server has limited bandwidth that might be sufficient to serve, for example, tens or perhaps hundreds of nodes (i.e., clients), but not millions.
One solution to this problem is to form the server and clients into a peer-to-peer overlay network and to distribute the content using application layer multicast. In multicast, the server sends the content to a collection of nodes, each of which forwards the content to several other nodes, which in turn forward the content to several other nodes, and so on. A problem with peer-to-peer application layer multicast, however, is that the nodes are typically residential end-hosts. Residential end-hosts are unreliable compared to routers, and they do not have enough outgoing bandwidth to forward the content to many other nodes. In addition, individual nodes have little incentive to forward the content to many other nodes. It is reasonable to assume, however, that each node has enough bandwidth and incentive to forward the content to one other node. This reduces the multicast distribution tree to a distribution “path”, which could be an acceptable solution if the nodes were reliable. However, when there are a million nodes, for example, and the server sends content directly to only one hundred nodes (the server's children), some nodes receive the content through chains of approximately ten thousand hops (1,000,000 / 100). Therefore, even if any particular node has only a small probability of failing or leaving the system, the probability that at least one of a node's upstream nodes fails is significant, because the probability that all k upstream nodes survive shrinks geometrically as (1 - p)^k.
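To make the "small per-node probability, large path failure probability" argument concrete, the following minimal sketch computes the chance that every upstream node on a single distribution path stays up. It assumes, for illustration only, that nodes fail independently with the same probability `p_fail`; the function name `path_survival` is hypothetical.

```python
def path_survival(p_fail: float, hops: int) -> float:
    """Probability that all `hops` upstream nodes on a single path stay up,
    assuming (illustratively) independent failures with probability p_fail."""
    return (1.0 - p_fail) ** hops

# Numbers from the text: 1,000,000 nodes fed by 100 server children
# gives chains of roughly 1,000,000 / 100 = 10,000 hops.
hops = 1_000_000 // 100

for p in (1e-5, 1e-4, 1e-3):
    print(f"p_fail={p:g}: P(all upstream nodes up) = {path_survival(p, hops):.3g}")
```

Under these assumptions, a per-node failure probability of 0.0001 already leaves only about a 37% chance that a node at the end of a 10,000-hop chain keeps receiving content, and at 0.001 the chance drops below 0.01%.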
Prior work in this area suggests that a node should get data from a small number of other nodes rather than from just a single parent node, and that it should send data to an equal (or approximately equal) number of child nodes. Thus, each node has approximately equal input and output bandwidths, which allows far shorter paths from the server. The data can be encoded with erasure codes (e.g., Reed-Solomon codes) or multiple description codes so that a node does not need to receive data successfully from all of its parents.
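The benefit of erasure coding across several parents can be illustrated with a short calculation. The sketch below assumes an idealized (n, k) code in which any k of the n parent streams suffice to reconstruct the content (as with a maximum-distance-separable code such as Reed-Solomon), and independent parent failures with probability `p_fail`. It does not model multiple description codes, which degrade quality gradually rather than at a threshold. The function name `delivery_probability` is hypothetical.

```python
from math import comb

def delivery_probability(n_parents: int, k_needed: int, p_fail: float) -> float:
    """Probability that at least k_needed of n_parents deliver their stream,
    assuming independent parent failures with probability p_fail each."""
    p_up = 1.0 - p_fail
    # Sum the binomial probabilities of exactly i parents being up, for i >= k.
    return sum(
        comb(n_parents, i) * p_up**i * p_fail ** (n_parents - i)
        for i in range(k_needed, n_parents + 1)
    )

# Five parents, each failing with probability 0.1:
print(delivery_probability(5, 5, 0.1))  # no redundancy: all five are needed
print(delivery_probability(5, 3, 0.1))  # any three of five suffice
```

With these illustrative numbers, requiring all five parents succeeds only about 59% of the time, while tolerating any two failures raises the success probability to roughly 99%.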
In one solution, a node joining a network contacts a server to get the IP addresses of a set of nodes (e.g., 40 nodes) already receiving content. From this set, the joining node selects a plurality of nodes (e.g., 5 nodes) to connect to. The nodes exchange information concerning downloads so that each node can determine which packets to generate and send. This solution improves robustness over the previous solution, but reliability still degrades as the network grows if the number of connections between a node and its parent nodes stays fixed. Moreover, building and maintaining the overlay network can become complex when routing structures must also be kept up to date.
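The join step described above can be sketched as follows. This is a hypothetical illustration, not the described system's actual protocol: the `Tracker` class stands in for the server, `join` is an invented helper, and choosing parents uniformly at random is an assumption, since the text does not say how the joining node picks its 5 parents from the 40 candidates. The download-information exchange between nodes is not modeled.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

@dataclass
class Tracker:
    """Hypothetical stand-in for the server that hands out peer addresses."""
    active_peers: list[str] = field(default_factory=list)

    def sample_peers(self, count: int) -> list[str]:
        # Return up to `count` distinct addresses of nodes already receiving content.
        return random.sample(self.active_peers, min(count, len(self.active_peers)))

def join(tracker: Tracker, candidate_count: int = 40, parent_count: int = 5) -> list[str]:
    """Ask the tracker for candidate peers, then pick a fixed number of parents."""
    candidates = tracker.sample_peers(candidate_count)
    # Assumption: parents are chosen uniformly at random from the candidates.
    return random.sample(candidates, min(parent_count, len(candidates)))

tracker = Tracker([f"10.0.0.{i}" for i in range(100)])
print(join(tracker))
```

Because `parent_count` stays fixed at 5 no matter how many nodes join, the sketch also reflects the limitation noted above: the distribution depth grows with the network while per-node redundancy does not.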
Accordingly, a need exists for a scalable and robust network that maintains reliability both as the number of nodes in the network grows and as nodes leave the network.