Transmission of files and streams between a sender and a recipient over a communications channel has been the subject of much literature. Ideally, a recipient wants to receive an exact copy of the data transmitted over a channel by a sender, with some level of certainty. Where the channel does not have perfect fidelity, which characterizes most physically realizable systems, one concern is how to deal with data that is lost or corrupted in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors) because the recipient cannot always recognize when the transmitted data has been corrupted.
Many error-correcting codes have been developed to correct erasures and/or errors. Typically, the particular code used is chosen based on some information about the infidelities of the channel through which the data is being transmitted, and the nature of the data being transmitted. For example, where the channel is known to have long periods of infidelity, a burst error code might be best suited for that application. Where only short, infrequent errors are expected, a simple parity code might be best.
“Communication”, as used herein, refers to data transmission through space and/or time, such as data transmitted from one location to another or data stored at one time and used at another. The channel is that which separates the sender and receiver. Channels in space can be wires, networks, fibers, wireless media, etc. between a sender and receiver. Channels in time can be data storage devices. In realizable channels, there is often a nonzero chance that the data sent or stored by the sender is different when it is received or read by the recipient, and those differences might be due to errors introduced in the channel.
Data transmission is straightforward when a transmitter and a receiver have all of the computing power and electrical power needed for communications, and the channel between the transmitter and receiver is reliable enough to allow for relatively error-free communications. Data transmission becomes more difficult when the channel is in an adverse environment, or the transmitter and/or receiver has limited capability. In certain applications, uninterrupted error-free communication is required over long periods of time. For example, in digital television systems it is expected that transmissions will be received error-free for periods of many hours at a time. In these cases, the problem of data transmission is difficult even in conditions of relatively low levels of errors.
Another scenario in which data communication is difficult is where a single transmission is directed to multiple receivers that may experience widely different data loss conditions. Furthermore, the conditions experienced by one given receiver may vary widely or may be relatively constant over time.
One solution to dealing with data loss (errors and/or erasures) is the use of forward error correcting (FEC) techniques, wherein data is coded at the transmitter in such a way that a receiver can correct transmission erasures and errors. Where feasible, a reverse channel from the receiver to the transmitter allows the receiver to relay information about these errors to the transmitter, which can then adjust its transmission process accordingly. Often, however, a reverse channel is not available or feasible, or is available only with limited capacity. For example, in cases in which the transmitter is transmitting to a large number of receivers, the transmitter might not be able to maintain reverse channels from all the receivers. In another example, the communication channel may be a storage medium.
In that case, data is transmitted chronologically forward through time, and causality precludes a reverse channel that could fix errors before they happen. As a result, communication protocols often need to be designed without a reverse channel or with a limited-capacity reverse channel and, as such, the transmitter may have to deal with widely varying channel conditions without prior knowledge of those channel conditions.
In the case of a packet protocol used for data transport over a channel that can lose packets, a file, stream, or other block of data to be transmitted over a packet network is partitioned into equally-sized source symbols. Encoding symbols the same size as the source symbols are generated from the source symbols using an FEC code, and the encoding symbols are placed and sent in packets. The “size” of a symbol can be measured in bits, whether or not the symbol is actually broken into a bit stream, where a symbol has a size of M bits when the symbol is selected from an alphabet of 2^M symbols. In such a packet-based communication system, a packet-oriented erasure FEC coding scheme might be suitable.
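The partitioning step and the relation between symbol size and alphabet size can be illustrated with a minimal sketch. It assumes that symbols are byte strings, that the final short chunk is zero-padded to full size (the recipient would then need the original length, sent separately, to strip the padding), and that the function names `partition_into_symbols` and `alphabet_size` are hypothetical. No FEC encoding is performed here; the sketch only produces the equal-sized source symbols an FEC code would consume.

```python
from typing import List


def partition_into_symbols(data: bytes, symbol_size: int) -> List[bytes]:
    """Split data into equal-sized source symbols (hypothetical helper).

    Assumption: the last, shorter chunk is zero-padded so that every
    source symbol has exactly symbol_size bytes.
    """
    if symbol_size <= 0:
        raise ValueError("symbol_size must be positive")
    symbols = []
    for start in range(0, len(data), symbol_size):
        chunk = data[start:start + symbol_size]
        # Pad the final chunk so all source symbols share one size.
        symbols.append(chunk.ljust(symbol_size, b"\x00"))
    return symbols


def alphabet_size(symbol_size_bytes: int) -> int:
    """A symbol of M = 8 * symbol_size_bytes bits is drawn from 2**M values."""
    m_bits = 8 * symbol_size_bytes
    return 2 ** m_bits
```

For example, `partition_into_symbols(b"hello world", 4)` yields three 4-byte source symbols, the last one carrying a single padding byte, and `alphabet_size(1)` gives 256 because a one-byte symbol has M = 8 bits.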
A file transmission is called reliable if it allows the intended recipient to recover an exact copy of the original file despite erasures in the network. A stream transmission is called reliable if it allows the intended recipient to recover an exact copy of each part of the stream in a timely manner despite erasures in the network. Both file transmission and stream transmission can instead be not entirely reliable, but somewhat reliable, in the sense that some parts of the file or stream are not recoverable or, for streaming, some parts of the stream might be recoverable but not in a timely fashion.
Packet loss often occurs because sporadic congestion causes the buffering mechanism in a router to reach its capacity, forcing it to drop incoming packets. Protection against erasures during transport has been the subject of much study.
In a system in which a single transmission is directed to more than one receiver, and in which different receivers experience widely different conditions, transmissions must either be configured for the worst conditions between the transmitter and any receiver, or it must be assumed that some receivers will not receive the transmission reliably.
Erasure codes are known which provide excellent recovery of lost packets in such scenarios. For example, Reed-Solomon codes are well known and can be adapted to this purpose. However, a known disadvantage of Reed-Solomon codes is their relatively high computational complexity. Chain reaction codes, including LT™ chain reaction codes and Raptor™ multi-stage chain reaction (“MSCR”) codes, provide excellent recovery of lost packets, and are highly adaptable to varying channel conditions. See, for example, Luby I, which describes aspects of chain reaction codes, and Shokrollahi I, which describes aspects of multi-stage chain reaction codes. Herein, the term “chain reaction code” should be understood to include chain reaction codes or multi-stage chain reaction codes, unless otherwise indicated.
As a general rule, erasure codes that are capable of correcting large amounts of lost data have a greater cost in terms of computational complexity, device hardware and software complexity, and/or memory requirements than those codes which are designed only for very limited levels of errors. In particular, as is well known, a simple parity code can be used to correct a single lost symbol among a group of any given size. The complexity of encoding and decoding such a code is very low. Interleaved parity codes are well known as a technique for correcting bursts of lost symbols that are shorter than or equal to the interleave depth. Such codes also have very low encoding and decoding complexity.
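The low complexity of simple and interleaved parity codes can be seen in a short sketch. It assumes equal-sized byte-string symbols, uses XOR as the parity operation, marks erased symbols as `None`, and assigns symbol i to parity group `i % depth` (one common interleaving convention, assumed here). All function names are hypothetical. A burst of at most `depth` consecutive erasures touches each group at most once, so each group's single parity symbol suffices; a longer burst puts two erasures in some group and cannot be corrected.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_symbol(group: List[bytes]) -> bytes:
    """Simple parity: the XOR of every symbol in a non-empty group."""
    return reduce(xor_bytes, group)


def recover_single_loss(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Restore at most one erased symbol (None) in a parity group."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("simple parity corrects only one erasure per group")
    out = list(received)
    if missing:
        # XOR of the parity with all surviving symbols equals the lost one.
        survivors = [s for s in received if s is not None]
        out[missing[0]] = reduce(xor_bytes, survivors, parity)
    return out


def interleaved_parities(symbols: List[bytes], depth: int) -> List[bytes]:
    """One parity symbol per group; symbol i belongs to group i % depth."""
    if len(symbols) < depth:
        raise ValueError("need at least `depth` symbols")
    return [parity_symbol(symbols[g::depth]) for g in range(depth)]


def recover_interleaved(received: List[Optional[bytes]],
                        parities: List[bytes], depth: int) -> List[bytes]:
    """Repair each interleaved group independently with its own parity."""
    out = list(received)
    for g in range(depth):
        out[g::depth] = recover_single_loss(received[g::depth], parities[g])
    return out
```

With six symbols and `depth=3`, losing any three consecutive symbols is recoverable, whereas losing four consecutive symbols places two erasures in one group and raises `ValueError`.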
A disadvantage of known techniques is that if losses for some receivers are such that a more powerful erasure correction code must be employed, then all receivers need to provide support for this powerful code, implying costs in terms of complexity, memory, etc. for all receivers. As a result, initial deployments of systems may be based on less complex and less powerful codes, with an objective of upgrading the system to more powerful codes as required.
The “flag day” problem is a well-known problem that is difficult to solve in practice. The flag day problem occurs in large communication systems involving many devices (senders and receivers) when it becomes necessary to upgrade all deployed devices (e.g., receivers) simultaneously in order to deploy an upgraded service. The flag day problem has delayed or prevented the implementation of a number of desirable upgrades to systems in cases in which the upgrade is characterized by the precondition that all devices need to be updated before the system will work correctly, i.e., the system will not even perform at pre-upgrade levels if some devices have been upgraded and other devices have not. It is therefore desirable that an upgrade path be designed to be more or less seamless, in that the system can function at least as well during the upgrade period as it did before the upgrade began, even when some of the devices have been upgraded and others have not. This can be a large problem if a full upgrade would take a prolonged period of time. In such cases, a seamless upgrade path is highly desirable but is often difficult to accomplish.