Mass distribution of digital content over a wide area network, such as the Internet, is commonly implemented using one or more sources (e.g., servers) where such data are available, and some number of destination sites (e.g., clients) to which copies of the data are transmitted and stored. Traditional methods of distributing digital content typically fall into two regimes: serial and parallel. In the serial distribution method, digital content files are distributed from source to destination in a single call or push. The serial method is optimal when only one client is requesting an update, or when an update is called by or pushed to a single client in the network. However, in most other circumstances, parallel distribution methods may be used to reduce the time needed to distribute files to multiple destination sites in a network.
In one method of parallel distribution, a processing element of the network, referred to herein as a “node,” may be designed to selectively act as a server and/or a client. File distribution workload may be shared in such a way that digital content may be distributed among various source nodes which, in turn, may distribute copies of the digital content files in parallel to various destination nodes in the network (e.g., the server paradigm). Optimal load balancing of file distribution responsibility among many network nodes remains an area of significant research.
As a matter of definition, consider a network of n nodes, where n is at least two. Denote the single inter-nodal bandwidths by $v_{i,j}$, $i \neq j$, where i and j index the ith and jth nodes of the network, respectively. In common content distribution schemes as described above, the digital content to be distributed typically exists in one location. If not, the situation may be decomposed (e.g., by logically dividing the network into subnetworks) to create the manageable scenario where the content to be distributed exists in one location.
Also as a matter of definition, consider the problem of distributing content of generally large size among targeted nodes of a network from a single source location. In a non-trivial network distribution scheme, where the size of the content to be distributed is large when compared to the single inter-nodal bandwidth, the time $\tau_{\text{serial}}$ required to distribute the content with size $c_{\text{size}}$ from the source node $i = \alpha_0$ in the serial distribution paradigm, assuming minimal variance in inter-nodal bandwidth over time, may be modeled as follows:
\[ \tau_{\text{serial}} = c_{\text{size}} \sum_{j=1,\; j \neq \alpha_0}^{n-1} \frac{1}{v_{\alpha_0, j}} \]
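As a rough numerical illustration of the serial model above, the following Python sketch sums the per-destination transfer times. The function name `serial_time`, the dictionary layout, and the example bandwidths are assumptions made for illustration only; none of them come from the text.

```python
def serial_time(c_size, bandwidths):
    """Estimate tau_serial = c_size * sum(1 / v[alpha_0, j]) over destinations j.

    c_size     -- content size (e.g., in megabits)
    bandwidths -- hypothetical mapping {destination j: v[alpha_0, j]}, in the
                  same size unit per second; the source node is excluded
    """
    if any(v <= 0 for v in bandwidths.values()):
        raise ValueError("bandwidths must be positive")
    # Each destination is served one after another, so the times add up.
    return c_size * sum(1.0 / v for v in bandwidths.values())


# Illustrative values only: 8000 megabits sent to three destinations.
example_bandwidths = {1: 100.0, 2: 50.0, 3: 200.0}
tau_serial = serial_time(8000.0, example_bandwidths)
print(tau_serial)
```

With these assumed values the transfers take 80, 160, and 40 seconds respectively, for a total of roughly 280 seconds. The slowest link contributes the most to the sum.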
Consider next the strict parallel distribution paradigm, again assuming the network is non-trivial in the parallel distribution case (i.e., that the single inter-nodal bandwidths are affected when distributing the content in question in parallel). Note: The rationale for this assumption is that, if the network is trivial in the parallel distribution case, then the optimal distribution of content occurs when the file is distributed in parallel. In other words, the size of the digital content would not be large when compared to the single inter-nodal bandwidth, which contradicts the operating assumption above. Denote the new single inter-nodal bandwidths, when burdened by the file distribution, by $v'_{i,j}(t)$. Under the above assumptions, the time required to distribute content with size $c_{\text{size}}$ from the source $i = \alpha_0$ to the nodes in the strict parallel distribution paradigm (one source) is, at best, modeled as follows:
\[ \tau_{\text{parallel}} = c_{\text{size}} \cdot \frac{1}{\min_{j} \left( v'_{\alpha_0, j}(t) \right)} \]
where $\min_{j} \left( v'_{\alpha_0, j}(t) \right)$ is the network bottleneck. Note that $v'$ is a function of time, because it will shift as the load on the source node decreases.
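The best-case parallel model can be sketched in the same way. In the Python sketch below, the function name `parallel_time`, the dictionary layout, and the example bandwidths are hypothetical. It also assumes the burdened bandwidths are sampled at a single instant, which ignores the time dependence of v′ noted above.

```python
def parallel_time(c_size, loaded_bandwidths):
    """Best-case tau_parallel = c_size / min_j v'[alpha_0, j].

    c_size            -- content size (e.g., in megabits)
    loaded_bandwidths -- hypothetical mapping {destination j: v'[alpha_0, j]}
                         sampled at one instant; the model's v' varies with
                         time, which this sketch deliberately ignores
    """
    if not loaded_bandwidths:
        raise ValueError("at least one destination is required")
    # The slowest burdened link is the bottleneck that bounds the whole transfer.
    bottleneck = min(loaded_bandwidths.values())
    if bottleneck <= 0:
        raise ValueError("bandwidths must be positive")
    return c_size / bottleneck


# Illustrative values only: 8000 megabits sent to three destinations at once,
# with each link slowed by the concurrent transfers.
example_loaded = {1: 60.0, 2: 40.0, 3: 90.0}
tau_parallel = parallel_time(8000.0, example_loaded)
print(tau_parallel)
```

With these assumed values the bottleneck is 40 megabits per second, giving a best-case time of 200 seconds. Because v′ changes as individual transfers finish, a single-instant estimate like this one is only an approximation of the model.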
The best-case scenario above assumes that the source node has adequate computational and network resources to distribute the content to each destination node without affecting inter-nodal bandwidths. In practice, this is usually not the case. When it is not, a common method is to generate new distribution subnetworks with sources and nodes that do satisfy the relation modeled above.