The present invention relates to an infrastructure design for handling high-capacity wide-area uploads over the Internet or similar networks. More particularly, the present invention relates to a technique for allowing many clients to send data intended for a common destination server at about the same time without overloading the common destination server and its link to the Internet or similar network. Further, a particular client having a slow Internet or other network connection can get credit for data in advance of transferring the complete data to a destination server.
Tremendous increases in network traffic, often called “hot spots,” are a major obstacle to achieving scalability in the Internet and other large networks. At the application layer, hot spots are usually caused by either (a) high demand for some data or (b) high demand for a certain service. This high demand for data or services is typically the result of a real-life event involving availability of new data or approaching deadlines. Therefore, relief of these hot spots improves interaction with the network, remote servers, and, more generally, quality of life. At the application layer, hot spot problems have traditionally been dealt with using some combination of (1) increasing capacity and (2) spreading the load over time, space, or both. Some examples of these approaches are data replication (e.g., web caching, ftp mirroring), data replacement (e.g., multi-resolution images, audio, video), service replication (e.g., DNS lookup, Network Time Protocol), and server push (e.g., news download, software distribution).
The classes of solutions stated above have been studied mostly in the context of applications using the following types of communication: (a) one-to-many (data travels primarily from a server to multiple clients, e.g., web download, software distribution, and video-on-demand), (b) many-to-many (data travels between multiple clients, through either a centralized or a distributed server, e.g., chat rooms and video conferencing), and (c) one-to-one (data travels between two clients, e.g., e-mail and e-talk).
Hot spots in download applications mostly result from a demand for popular data objects. In contrast, hot spots in upload applications mainly result from a demand for a popular service, e.g., the income tax submission service, as the actual data being transferred by the various users is distinct.
There are two main characteristics which make upload applications different from download applications. First, in the case of uploads, the real-life event (e.g., deadline for filing taxes) which often causes the hot spots imposes a hard deadline on the data transfer service, whereas in the case of downloads, the real-life event (e.g., an important new Supreme Court opinion) translates into a desire for low latency (i.e., immediate or almost immediate) data access. Second, uploads are inherently data writing applications, whereas downloads are data reading applications. Traditional solutions aimed at latency reduction for data reading applications are (a) data replication (using a variety of techniques such as caching, prefetching, mirroring, etc.) and (b) data replacement (such as sending a low resolution version of the data for image, video, or audio downloads). Clearly, these techniques are not applicable to uploads.
There is an unmet need for solutions to preventing (i.e., minimizing the likelihood of) overloads (server saturation or saturation of links connecting to a server) that result from the hot spots in Internet or similar network transfers involving many-to-one communication. Scalability and efficiency have not generally been achieved in this many-to-one context. Existing solutions, such as web-based submissions, simply use many independent one-to-one transfers.
The many-to-one situation corresponds to an important class of applications, examples of which include upload applications such as submission of income tax forms, conference paper submission, proposal submission through the National Science Foundation (NSF) FastLane system, homework and project submissions in distance education, voting in digital democracy applications, voting in interactive television, and Internet-based storage, among others.
In the current state of upload applications, and with reference to prior art FIG. 1 showing a simplified network structure, a specific upload flow to a common destination server 10 from a plurality of clients 12 (for ease of illustration only some of the clients are numbered) can experience the following potential bottlenecks (or hot spots):
(1) poor connectivity of the client: some link in the Internet routes 14 between that client and the final destination is the bottleneck of the upload process (including the immediate link that connects the client to the Internet),
(2) overload on the server link: the server link (i.e., the part of route 14 closest to the destination server 10) that connects the server 10 to the Internet is overloaded due to too many simultaneous uploads to that server, and this link is the bottleneck of the upload process,
(3) overload on the server: the destination server 10 itself is overloaded due to too many simultaneous uploads to that server, and the server is the bottleneck of the upload process.
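The severity of bottleneck (2) relative to bottleneck (1) can be illustrated with simple arithmetic. The following sketch (all link capacities, file sizes, and client counts are hypothetical, chosen purely for illustration) computes how long each client's upload takes when many simultaneous uploads share a single server link fairly:

```python
# Illustrative back-of-envelope model of bottlenecks (1) and (2): each
# client is limited by its own access link, and the shared server link is
# divided among all simultaneous uploads. All numbers are hypothetical.

def upload_time_seconds(num_clients, file_mb, server_link_mbps, client_link_mbps):
    """Time for each client to finish its upload, assuming the server link
    is shared equally among the simultaneous uploads."""
    fair_share_mbps = server_link_mbps / num_clients          # bottleneck (2)
    effective_mbps = min(fair_share_mbps, client_link_mbps)   # bottleneck (1)
    return (file_mb * 8) / effective_mbps                     # MB -> megabits

# One client on an idle 100 Mb/s server link: limited only by its own 1 Mb/s link.
print(upload_time_seconds(1, 10, 100.0, 1.0))      # 80.0 seconds

# 1000 clients near a deadline: the 100 Mb/s server link becomes the bottleneck,
# leaving each client a 0.1 Mb/s share.
print(upload_time_seconds(1000, 10, 100.0, 1.0))   # 800.0 seconds
```

The point of the sketch is that, near a deadline, the per-client throughput collapses with the number of simultaneous uploads even when each client's own connectivity is unchanged.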
Given these bottlenecks, there are several traditional solutions (or combinations of these solutions) that one could consider:
(a) get a bigger server, for example, buy a bigger cluster of workstations to act as the upload server, which is intended to address problem (3) above,
(b) buy a “bigger pipe” (i.e., get a link with greater capacity), that is, improve the server's connectivity to the Internet, which is intended to address problem (2) above,
(c) co-locate the server(s) at the Internet service provider(s) (“ISPs”), i.e., make arrangements directly with the ISPs to provide upload service at their locations, which is intended to solve problems (1) and (2) above (as well as problem (3) if this service is replicated at multiple ISPs).
These solutions have a number of shortcomings, including lack of flexibility and lack of scalability. For instance, buying a bigger cluster for a “one time event” (which may not be “big enough” for the next similar event) is not the most desirable or flexible solution to the upload problem. Moreover, security concerns may limit one's ability to co-locate servers at multiple ISPs.