Digital file transmission between a server and multiple receivers over a communications channel has been the subject of much literature. In general, a design goal of a file transmission system is to allow each recipient to receive an exact copy of data transmitted over a channel by a server with some level of certainty. A file transmission system may have to serve as many different files as there are active receivers, as each receiver may demand a different file. In addition, where different receivers request the same file at different points in time, a concern is how to efficiently serve the file to each receiver. Potentially each client may require an independent stream of the file it requested, where a stream is the flow of data from the server required by that client in order to download the file.
The file transmission systems that have been proposed in the literature can be divided into two distinct classes: (1) user-centered and (2) data-centered. In user-centered strategies, the bandwidth available at the server to serve a file is allocated according to client requests, i.e., the bandwidth assigned to serve a particular file can vary over time depending on how many clients are requesting that file. In data-centered strategies, the bandwidth available at the server is allocated among the different files, i.e., the bandwidth assigned to serve a particular file is independent of whether one or a million clients are requesting that file.
For a user-centered strategy, the server bandwidth requirement for a particular file can be expected to grow with the frequency of user requests for that file. This may be acceptable for a small number of users, but may be infeasible if the number of users grows very large for very popular files. For example, a typical user-centered strategy is the Transmission Control Protocol (“TCP”). TCP is a point-to-point packet control scheme in which a file or a data stream is partitioned into input symbols and the input symbols are placed into consecutive packets. A server transmits the ordered packets across the channel, and the recipient acknowledges receipt of each packet. If a packet is lost, or no acknowledgment is received at the server, the server will resend the packet. A TCP server therefore should maintain state as to which packets have been sent and which packets have been acknowledged as received by each client.
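The per-client bookkeeping described above can be illustrated with a minimal sketch. It is not an implementation of TCP, which uses sequence numbers, windows, and timers; the class `AckTrackingSender` and its methods are hypothetical names introduced only to show that the state held by an acknowledgment-based sender grows with the number of clients.

```python
class AckTrackingSender:
    """Hypothetical acknowledgment-based sender (illustration only, not real TCP)."""

    def __init__(self, num_packets):
        self.num_packets = num_packets
        # One entry per client: the packet indices that client has not yet acknowledged.
        self.unacked = {}

    def add_client(self, client_id):
        # A new client starts with every packet outstanding.
        self.unacked[client_id] = set(range(self.num_packets))

    def acknowledge(self, client_id, packet_index):
        # Record a positive acknowledgment from one client.
        self.unacked[client_id].discard(packet_index)

    def packets_to_resend(self, client_id):
        # Packets that still have to be (re)sent to this client.
        return sorted(self.unacked[client_id])

    def state_size(self):
        # Total number of (client, packet) entries the sender must track.
        return sum(len(pending) for pending in self.unacked.values())


sender = AckTrackingSender(num_packets=4)
for client in ("a", "b", "c"):
    sender.add_client(client)
sender.acknowledge("a", 0)
sender.acknowledge("a", 1)
```

In this sketch the tracked state is proportional to the number of clients multiplied by the number of outstanding packets, which is the scaling concern raised for user-centered strategies.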
Some work proposes using broadcast or multicast mechanisms in order for a file transmission system to be scalable to a large number of clients. However, acknowledgment-based protocols like TCP do not scale well to broadcasting. For example, a sender broadcasting a file to multiple recipients requires a back channel from each recipient to the sender for acknowledgment data (either positive or negative), and the sender should be powerful enough to handle all of the acknowledgment data properly. Another drawback is that if different recipients lose different sets of packets, rebroadcast of packets missed by only a few of the recipients causes reception of useless duplicate packets by other recipients. Additionally, acknowledgment-based communication systems do not easily permit recipients to begin receiving a file asynchronously to the beginning of the broadcast, i.e., they do not easily permit a recipient to begin receiving data in the middle of a transmission session.
Data-centered strategies using broadcast or multicast mechanisms are scalable to potentially millions of users because, unlike user-centered strategies, the server bandwidth required to serve a single file is independent of the number of user requests and of the frequency of user requests. A simple data-centered strategy that is sometimes used in practice is a carousel-based protocol. A carousel protocol partitions an input file into equal length input symbols, places each input symbol into a packet, and then continually cycles through and transmits all the packets. A major drawback with a carousel-based protocol is that if a recipient misses even one packet, then the recipient must wait another entire cycle before having a chance at receiving the missed packet, i.e., a carousel-based protocol can cause a large amount of duplicate data reception.
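The cost of a single lost packet under a carousel can be illustrated with a small simulation. This is a sketch under stated assumptions: the function `carousel_receive` and the notion of numbered transmission slots are hypothetical, the recipient is assumed to start listening at the beginning of a cycle, and `lost_slots` is the set of slots in which a packet fails to arrive.

```python
def carousel_receive(num_packets, lost_slots):
    """Return (slots_listened, duplicates_seen) until every packet is held.

    Hypothetical model: in slot s the server sends packet s % num_packets,
    and the packet is lost if s is in lost_slots.
    """
    received = set()
    duplicates = 0
    slot = 0
    while len(received) < num_packets:
        index = slot % num_packets  # the carousel cycles through the packets in order
        if slot not in lost_slots:
            if index in received:
                duplicates += 1  # packet already held: useless to this recipient
            else:
                received.add(index)
        slot += 1
    return slot, duplicates


no_loss = carousel_receive(5, set())
one_loss = carousel_receive(5, {2})
```

With five packets and no loss the recipient finishes in five slots. Losing only the packet in slot 2 forces the recipient to listen for eight slots and to receive two duplicates before the missed packet comes around again.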
One approach to deal with data lost in transmission is to use erasure correcting codes such as Reed-Solomon Codes or Tornado Codes to increase reliability. One feature of several erasure correcting codes is that, when a file is partitioned into input symbols that are sent in packets to the recipient, the recipient can decode the packets to reconstruct the entire file once sufficiently many packets are received, generally regardless of which packets arrive. This property removes the need for acknowledgments at the packet level, since the file can be recovered even if packets are lost.
Erasure correcting codes, such as Reed-Solomon or Tornado codes, generate a fixed number, N, of output symbols for a fixed input file. These output symbols may comprise the K original input symbols and N-K redundant symbols. If storage permits, then the server can compute the set of output symbols for each file only once and transmit the output symbols using the carousel protocol above.
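The relationship between the K input symbols and the N output symbols can be sketched with the simplest possible erasure code, a single XOR parity symbol, so that N = K + 1. This is neither a Reed-Solomon code nor a Tornado code; it is chosen only because it fits in a few lines, and the function names `encode` and `decode` are hypothetical. Any K of the K + 1 output symbols suffice to rebuild the file.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(input_symbols):
    """Return the K input symbols followed by one redundant parity symbol."""
    parity = reduce(xor_bytes, input_symbols)
    return list(input_symbols) + [parity]


def decode(received, k):
    """Rebuild the K input symbols.

    `received` maps output-symbol index (0..k) to its value and must hold
    at least k of the k + 1 output symbols.
    """
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("too many erasures for a single-parity code")
    if missing:
        # XOR of the k - 1 surviving inputs and the parity equals the lost input.
        recovered = reduce(xor_bytes, received.values())
        received = {**received, missing[0]: recovered}
    return [received[i] for i in range(k)]


symbols = [b"ab", b"cd", b"ef"]
output = encode(symbols)
# Output symbol 1 is erased in transit; the other three arrive.
rebuilt = decode({0: output[0], 2: output[2], 3: output[3]}, k=3)
```

Because the number of output symbols is fixed in advance, the server in this sketch could compute `output` once and cycle through it, as described above.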
More recently, chain reaction coding systems have been developed for use in file transmission systems. U.S. Pat. No. 6,307,487 (U.S. patent application Ser. No. 09/246,015, filed Feb. 5, 1999 and entitled “Information Additive Code Generator And Decoder For Communication Systems”), U.S. Pat. No. 6,320,520 (U.S. patent application Ser. No. 09/399,201, filed Sep. 17, 1999 and entitled “Information Additive Group Code Generator And Decoder For Communication Systems”), U.S. Pat. No. 6,486,803 (U.S. patent application Ser. No. 09/668,452, filed Sep. 22, 2000 and entitled “On Demand Encoding With a Window”), and U.S. Pat. No. 6,411,223 (U.S. patent application Ser. No. 09/691,735, filed Oct. 18, 2000 and entitled “Generating High Weight Encoding Symbols Using a Basis”) describe various chain reaction coding systems in detail. As described therein, a chain reaction encoder generates output symbols from input symbols of the input file as needed. The server is thus continuously generating output symbols for each file being served. Therefore, what is needed is a server that does not require excessive computing power or memory at a sender to implement, and that can be used to efficiently distribute a plurality of files that are continuously being encoded.
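The idea of generating output symbols as needed can be sketched as follows. This is a simplified illustration and not the method of the cited patents: each output symbol is assumed to be the XOR of a pseudo-randomly chosen subset of input symbols determined by a key, the uniform choice of subset size is an assumption made for brevity, and the names `neighbors`, `encode_symbol`, and `decode` are hypothetical. The decoder shown is a simple peeling procedure that repeatedly resolves output symbols with exactly one unknown input symbol.

```python
import random


def neighbors(key, k):
    """Pseudo-randomly pick which of the k input symbols feed the output symbol for `key`."""
    rng = random.Random(key)
    degree = rng.randint(1, k)  # assumption: uniform degree, for illustration only
    return sorted(rng.sample(range(k), degree))


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_symbol(key, input_symbols):
    """Generate one output symbol on demand as the XOR of its neighbors."""
    value = bytes(len(input_symbols[0]))
    for i in neighbors(key, len(input_symbols)):
        value = xor_bytes(value, input_symbols[i])
    return value


def decode(received, k):
    """Peeling decoder: `received` maps key -> output symbol. Returns None if stuck."""
    pending = [(set(neighbors(key, k)), value) for key, value in received.items()]
    known = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        remaining = []
        for idxs, value in pending:
            # Subtract every already-recovered input symbol from this output symbol.
            for i in [i for i in idxs if i in known]:
                value = xor_bytes(value, known[i])
                idxs = idxs - {i}
            if len(idxs) == 1:
                (i,) = idxs
                known[i] = value  # exactly one unknown left: it is recovered
                progress = True
            elif idxs:
                remaining.append((idxs, value))
        pending = remaining
    return [known[i] for i in range(k)] if len(known) == k else None


inputs = [b"chai", b"n re", b"acti", b"on!!"]
received, result = {}, None
for key in range(1000):  # the encoder can keep producing new symbols indefinitely
    received[key] = encode_symbol(key, inputs)
    result = decode(received, len(inputs))
    if result is not None:
        break
```

Unlike the fixed set of N output symbols of a conventional erasure code, the encoder in this sketch has no predetermined last symbol: any key yields a fresh output symbol, and the recipient stops listening once decoding succeeds.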