Transmission of files and streams between a sender and a recipient over a communications channel has been the subject of much literature. Ideally, a recipient receives an exact copy of the data transmitted over a channel by a sender, with some level of certainty. Where the channel does not have perfect fidelity, which characterizes most physically realizable systems, one concern is how to deal with data that is lost or corrupted in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors): the recipient knows which data is missing, but cannot always recognize when the transmitted data has been corrupted.
Many error-correcting codes have been developed to correct erasures and/or errors. Typically, the particular code used is chosen based on some information about the infidelities of the channel through which the data is being transmitted, and the nature of the data being transmitted. For example, where the channel is known to have long periods of infidelity, a burst error code might be best suited for that application. Where only short, infrequent errors are expected, a simple parity code might be best.
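As a minimal sketch of the simple parity code mentioned above, the following Python example appends one even-parity bit to a short data word. It is illustrative only: the function names and the 4-bit data word are assumptions, not taken from any particular system. It also shows why erasures are easier to handle than errors: a single flipped bit is detected but cannot be located, whereas a single bit known to be missing can be reconstructed.

```python
from functools import reduce
from operator import xor

def add_parity(bits):
    """Append one even-parity bit so the codeword contains an even number of 1s."""
    return bits + [reduce(xor, bits, 0)]

def parity_ok(codeword):
    """Return True when the codeword has even parity (no odd number of bit errors)."""
    return reduce(xor, codeword, 0) == 0

def recover_erasure(codeword, erased_index):
    """Rebuild one bit whose position is known to be lost (marked None)."""
    known = [b for i, b in enumerate(codeword) if i != erased_index]
    repaired = list(codeword)
    repaired[erased_index] = reduce(xor, known, 0)
    return repaired

data = [1, 0, 1, 1]                  # hypothetical 4-bit data word
sent = add_parity(data)

# Error: bit 2 is flipped in transit. The parity check fails, but the
# receiver cannot tell which bit is wrong, so it can only detect the error.
corrupted = list(sent)
corrupted[2] ^= 1
error_detected = not parity_ok(corrupted)

# Erasure: bit 2 is known to be missing. The receiver can rebuild it.
with_erasure = list(sent)
with_erasure[2] = None
repaired = recover_erasure(with_erasure, 2)
```

A single parity bit cannot detect an even number of bit errors and cannot recover more than one erasure, which is why it suits only channels with short, infrequent errors.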
“Communication,” as used herein, refers to data transmission, through space and/or time, such as data transmitted from one location to another or data stored at one time and used at another. The channel is that which separates the sender and receiver. Channels in space can be wires, networks, fibers, wireless media, etc. between a sender and receiver. Channels in time can be data storage devices. In realizable channels, there is often a nonzero chance that the data sent or stored by the sender is different when it is received or read by the recipient and those differences might be due to errors introduced in the channel.
Data transmission is straightforward when a transmitter and a receiver have all of the computing power and electrical power needed for communications, and the channel between the transmitter and receiver is reliable enough to allow for relatively error-free communications. Data transmission becomes more difficult when the channel is in an adverse environment, or the transmitter and/or receiver has limited capability. In certain applications, uninterrupted error-free communication is required over long periods of time. For example, in digital television systems it is expected that transmissions will be received error-free for periods of many hours at a time. In these cases, the problem of data transmission is difficult even in conditions of relatively low levels of errors.
Another scenario in which data communication is difficult is where a single transmission is directed to multiple receivers that may experience widely different data loss conditions. Furthermore, the conditions experienced by one given receiver may vary widely or may be relatively constant over time.
One solution for dealing with data loss (errors and/or erasures) is the use of forward error correcting (FEC) techniques, wherein data is coded at the transmitter in such a way that a receiver can correct transmission erasures and errors. Where feasible, a reverse channel from the receiver to the transmitter enables the receiver to relay information about these errors to the transmitter, which can then adjust its transmission process accordingly. Often, however, a reverse channel is not available or feasible, or is available only with limited capacity. For example, in cases in which the transmitter is transmitting to a large number of receivers, the transmitter might not be able to maintain reverse channels from all the receivers. In another example, the communication channel may be a storage medium: data is transmitted chronologically forward through time, and causality precludes a reverse channel that can fix errors before they happen.
As a result, communication protocols often need to be designed without a reverse channel or with a limited capacity reverse channel and, as such, the transmitter may have to deal with widely varying channel conditions without prior knowledge of those channel conditions. One example is a broadcast or multicast channel, where reverse communication is not provided, or if provided is very limited or expensive. Another example where such a situation is relevant is a storage application, where the data is stored encoded using FEC, and then at a later point in time, the data is recovered, possibly using FEC decoding.
In the case of a packet protocol used for data transport over a channel that can lose packets, a file, stream, or other block of data to be transmitted over a packet network is partitioned into source symbols (that may all be of equal size or that may vary in size depending on the block size or on other factors). Encoding symbols are generated from the source symbols using an FEC code, and the encoding symbols are placed and sent in packets. The “size” of a symbol can be measured in bits, whether or not the symbol is actually broken into a bit stream, where a symbol has a size of M bits when the symbol is selected from an alphabet of 2^M symbols. In such a packet-based communication system, a packet-oriented erasure FEC coding scheme might be suitable.
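The packet flow described above can be sketched as follows, assuming equal-size symbols, one symbol per packet, and a deliberately simple FEC code consisting of a single XOR repair symbol. These choices, the function names, and the sample block are assumptions made for illustration. A practical erasure code would generate many repair symbols and tolerate more than one lost packet.

```python
def partition(block: bytes, symbol_size: int) -> list[bytes]:
    """Split a non-empty block into equal-size source symbols, zero-padding the last."""
    padded_len = -(-len(block) // symbol_size) * symbol_size  # round up
    padded = block.ljust(padded_len, b"\x00")
    return [padded[i:i + symbol_size] for i in range(0, padded_len, symbol_size)]

def xor_symbols(symbols: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length symbols."""
    out = bytearray(len(symbols[0]))
    for symbol in symbols:
        for i, byte in enumerate(symbol):
            out[i] ^= byte
    return bytes(out)

def encode(block: bytes, symbol_size: int) -> list[tuple[int, bytes]]:
    """Return (symbol_id, payload) packets: the source symbols plus one repair symbol."""
    source = partition(block, symbol_size)
    return list(enumerate(source + [xor_symbols(source)]))

def decode(packets, num_source: int, block_len: int) -> bytes:
    """Recover the block from the received packets, repairing at most one erasure."""
    received = dict(packets)
    missing = [i for i in range(num_source) if i not in received]
    if len(missing) > 1 or (missing and num_source not in received):
        raise ValueError("too many erasures for a single repair symbol")
    if missing:
        others = [received[i] for i in range(num_source + 1) if i != missing[0]]
        received[missing[0]] = xor_symbols(others)
    return b"".join(received[i] for i in range(num_source))[:block_len]

block = b"hello, erasure codes"          # hypothetical 20-byte block
packets = encode(block, symbol_size=8)   # 8-byte symbols: M = 64 bits each
survivors = [p for p in packets if p[0] != 1]   # the channel drops packet 1
recovered = decode(survivors, num_source=3, block_len=len(block))
```

Because each packet carries its symbol identifier, the receiver knows exactly which symbol was lost, which is what makes this an erasure channel rather than an error channel.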
A file transmission is called reliable if it enables the intended recipient to recover an exact copy of the original file despite erasures and/or other corruption of the data transmitted over a network. A stream transmission is called reliable if it enables the intended recipient to recover an exact copy of each part of the stream in a timely manner despite erasures and/or corruption within the network. Both file transmission and stream transmission can instead be not entirely reliable, but somewhat reliable, in the sense that some parts of the file or stream are not recoverable or, for streaming, some parts of the stream might be recoverable but not in a timely fashion. It is often a goal to provide as high reliability as possible depending on some constraining conditions, where examples of constraints might be timely delivery for streaming applications, or the type of network conditions over which a solution is expected to operate.
Packet loss often occurs because sporadic congestion causes the buffering mechanism in a router to reach its capacity, forcing it to drop incoming packets. Other causes of packet loss include weak signal, intermittent signal, and noise interference wherein corrupted packets are discarded. Protection against erasures during transport has been the subject of much study.
In a system in which a single transmission is directed to more than one receiver, and in which different receivers experience widely different conditions, transmissions are often configured for some set of conditions between the transmitter and any receiver, and any receivers that are in worse conditions may not receive the transmission reliably.
Erasure codes are known which provide excellent recovery of lost packets in such scenarios. For example, Reed-Solomon codes are well known and can be adapted to this purpose. However, a known disadvantage of Reed-Solomon codes is their relatively high computational complexity. Chain reaction codes, including LT™ chain reaction codes and Raptor™ multi-stage chain reaction (“MSCR”) codes, provide excellent recovery of lost packets, and are highly adaptable to varying channel conditions. For example, Shokrollahi describes aspects of multi-stage chain reaction codes. Herein, the term “chain reaction code” should be understood to include chain reaction codes or multi-stage chain reaction codes, unless otherwise indicated.
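For illustration only, the sketch below shows the general idea commonly associated with LT-style chain reaction decoding: each encoding symbol is the XOR of a set of source symbols, and the decoder repeatedly resolves encoding symbols that depend on exactly one unknown source symbol, which in turn unlocks others. The function names and the hand-picked neighbor sets are assumptions. A real code selects neighbor sets at random from a carefully designed degree distribution, and multi-stage codes add a precoding step, neither of which is shown here.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_encoding_symbol(source, neighbors):
    """XOR together the source symbols whose indices are in `neighbors`."""
    value = bytes(len(source[0]))
    for i in neighbors:
        value = xor_bytes(value, source[i])
    return (frozenset(neighbors), value)

def peel(encoding_symbols, num_source):
    """Iteratively recover source symbols; unrecovered positions are None."""
    pending = [[set(n), v] for n, v in encoding_symbols]
    recovered = {}
    progress = True
    while progress and len(recovered) < num_source:
        progress = False
        for entry in pending:
            neighbors, value = entry
            # Remove the contribution of source symbols already recovered.
            for i in [i for i in neighbors if i in recovered]:
                neighbors.discard(i)
                value = xor_bytes(value, recovered[i])
            entry[1] = value
            # Exactly one unknown neighbor left: its value is now known.
            if len(neighbors) == 1:
                (i,) = neighbors
                recovered[i] = value
                neighbors.clear()
                progress = True
    return [recovered.get(i) for i in range(num_source)]

source = [b"AB", b"CD", b"EF"]           # hypothetical source symbols
received = [
    make_encoding_symbol(source, {0, 1}),
    make_encoding_symbol(source, {1, 2}),
    make_encoding_symbol(source, {0}),   # degree-one symbol starts the chain
]
decoded = peel(received, num_source=3)

# Without a degree-one symbol the chain reaction cannot start.
stalled = peel(received[:2], num_source=3)
```

In this toy example the single degree-one symbol yields the first source symbol, which reduces the first encoding symbol to degree one, and so on until all three source symbols are recovered.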
In some cases, it may be necessary or desirable to increase the reliability of a communications system after deployment. However, while an improvement in network reliability may be needed, it is typically not feasible to replace or upgrade all receiving devices in the network at once, or at all. For example, actual network packet loss might turn out to be higher than initially planned, due to degradations in network reliability, increased traffic load, expansions and/or changes in the network, etc., or the quality of service requirements may need to increase to match competitive services. In either case, it might be impractical to get new receivers out to all nodes of the communications system at once, or to distribute them over time and leave some receiving stations out of commission until the new receivers arrive.
In order to deliver the best possible service at the lowest cost, communications systems must simultaneously balance conflicting resource constraints. Network bandwidth is a critical resource constraint. Transmitting and receiving devices need to enable efficient use of network bandwidth in supporting a reliable service. The available CPU processing on receiving devices is typically a severe limitation, meaning that any transport reliability enhancement method must require only a modest amount of computing effort. In addition, it is also often necessary, particularly with streaming media, to limit the incremental latency associated with reliable transport methods so that the end-user does not perceive a reduction in system responsiveness.