Transmission of files and streams between a sender and a recipient over a communications channel has been the subject of much literature. Preferably, a recipient desires to receive an exact copy of data transmitted over a channel by a sender with some level of certainty. Where the channel does not have perfect fidelity (which covers almost all physically realizable systems), one concern is how to deal with data lost or garbled in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors) because the recipient cannot always tell when received data has been corrupted. Many error-correcting codes have been developed to correct for erasures and/or for errors. Typically, the particular code used is chosen based on some information about the infidelities of the channel through which the data is being transmitted and the nature of the data being transmitted. For example, where the channel is known to have long periods of infidelity, a burst error code might be best suited for that application. Where only short, infrequent errors are expected, a simple parity code might be best.
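The difference between erasures and errors can be seen with the simplest parity code: one extra symbol equal to the XOR of the data symbols. A minimal sketch, assuming equal-length byte-string symbols and an erasure marked as `None`; the function names are hypothetical. Because the position of an erasure is known, the lost symbol is recovered outright, whereas a corrupted symbol can only be detected, not located.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(data: List[bytes]) -> List[bytes]:
    """Append one parity symbol: the XOR of all data symbols."""
    return data + [reduce(xor_bytes, data)]

def recover_erasure(received: List[Optional[bytes]]) -> List[bytes]:
    """Repair at most one erasure (marked None). Because the erased position is
    known, the missing symbol is simply the XOR of all the others."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("one parity symbol can repair at most one erasure")
    repaired = list(received)
    if missing:
        repaired[missing[0]] = reduce(xor_bytes, [s for s in received if s is not None])
    return repaired

def parity_ok(received: List[bytes]) -> bool:
    """Error detection only: an intact codeword XORs to all zeros, but a nonzero
    result does not say which symbol was corrupted."""
    return not any(reduce(xor_bytes, received))
```

The same single parity symbol that fully repairs one erasure gives only a yes/no answer when a symbol is silently corrupted, which is why erasures are the easier case.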
Data transmission is straightforward when a transmitter and a receiver have all of the computing power and electrical power needed for communications and the channel between the transmitter and receiver is clean enough to allow for relatively error-free communications. The problem of data transmission becomes more difficult when the channel is in an adverse environment or the transmitter and/or receiver has limited capability.
One solution is the use of forward error correcting (FEC) techniques, wherein data is coded at the transmitter such that a receiver can recover from transmission erasures and errors. Where feasible, a reverse channel from the receiver to the transmitter allows for the receiver to communicate about errors to the transmitter, which can then adjust its transmission process accordingly. Often, however, a reverse channel is unavailable or infeasible, or is available only with limited capacity. For example, where the transmitter is transmitting to a large number of receivers, the transmitter might not be able to handle reverse channels from all those receivers. As another example, the communication channel may be a storage medium and thus the transmission of the data is forward through time and, unless someone invents a time travel machine that can go back in time, a reverse channel for this channel is infeasible. As a result, communication protocols often need to be designed without a reverse channel or with a limited capacity reverse channel and, as such, the transmitter may have to deal with widely varying channel conditions without a full view of those channel conditions.
The problem of data transmission between transmitters and receivers is made more difficult when the receivers need to be low-power, small devices that might be portable or mobile and need to receive data at high bandwidths. For example, a wireless network might be set up to deliver files or streams from a stationary transmitter to a large or indeterminate number of portable or mobile receivers either as a broadcast or multicast where the receivers are constrained in their computing power, memory size, available electrical power, antenna size, device size and other design constraints. Another example is in storage applications where the receiver retrieves data from a storage medium which exhibits infidelities in reproduction of the original data. Such receivers are often embedded with the storage medium itself in devices, for example disk drives, which are highly constrained in terms of computing power and electrical power.
In such a system, considerations to be addressed include having little or no reverse channel, limited memory, limited computing cycles, power, mobility and timing.
In the case of a packet protocol used for data transport over a channel that can lose packets, a file, stream or other block of data to be transmitted over a packet network is partitioned into equal size input symbols, encoding symbols the same size as the input symbols are generated from the input symbols using an FEC code, and the encoding symbols are placed and sent in packets. The “size” of a symbol can be measured in bits, whether or not the symbol is actually broken into a bit stream, where a symbol has a size of M bits when the symbol is selected from an alphabet of 2^M symbols. In such a packet-based communication system, a packet oriented erasure FEC coding scheme might be suitable. A file transmission is called reliable if it allows the intended recipient to recover an exact copy of the original file even in the face of erasures in the network. A stream transmission is called reliable if it allows the intended recipient to recover an exact copy of each part of the stream in a timely manner even in the face of erasures in the network. Both file transmission and stream transmission can also be somewhat reliable, in the sense that some parts of the file or stream are not recoverable or, for streaming, some parts of the stream are not recoverable in a timely fashion. Packet loss often occurs because sporadic congestion causes the buffering mechanism in a router to reach its capacity, forcing it to drop incoming packets. Protection against erasures during transport has been the subject of much study.
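The partition-and-packetize steps above might be sketched as follows. This is a minimal illustration, assuming zero padding of the last symbol and a packet header holding the index of the first symbol carried; both choices and all function names are hypothetical. The FEC step between `partition` and `packetize` is left out, so the symbols passed to `packetize` here stand in for encoding symbols.

```python
from typing import List, Tuple

def partition(data: bytes, symbol_size: int) -> List[bytes]:
    """Split data into equal-size input symbols, zero-padding the last one.
    (The receiver must learn the original length separately to strip the padding.)"""
    if symbol_size <= 0:
        raise ValueError("symbol_size must be positive")
    padded_len = -(-len(data) // symbol_size) * symbol_size  # round up to a multiple
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + symbol_size] for i in range(0, padded_len, symbol_size)]

def alphabet_size(symbol_size_bytes: int) -> int:
    """A symbol of M = 8 * symbol_size_bytes bits is drawn from 2**M possible values."""
    return 2 ** (8 * symbol_size_bytes)

def packetize(symbols: List[bytes], per_packet: int) -> List[Tuple[int, List[bytes]]]:
    """Place encoding symbols into packets tagged with the index of their first
    symbol, so a receiver can tell which symbols a surviving packet carries."""
    return [(i, symbols[i:i + per_packet]) for i in range(0, len(symbols), per_packet)]
```

For example, `partition(b"hello world", 4)` yields three 4-byte symbols, the last padded with one zero byte, and each such 32-bit symbol comes from an alphabet of 2^32 values.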
In the case of a protocol used for data transmission over a noisy channel that can corrupt bits, a block of data to be transmitted over a data transmission channel is partitioned into equal size input symbols, encoding symbols of the same size are generated from the input symbols and the encoding symbols are sent over the channel. For such a noisy channel the size of a symbol is typically one bit or a few bits, whether or not a symbol is actually broken into a bit stream. In such a communication system, a bit-stream oriented error-correction FEC coding scheme might be suitable. A data transmission is called reliable if it allows the intended recipient to recover an exact copy of the original block even in the face of errors (symbol corruption, either detected or undetected in the channel). The transmission can also be somewhat reliable, in the sense that some parts of the block may remain corrupted after recovery. Symbols are often corrupted by sporadic noise, periodic noise, interference, weak signal, blockages in the channel, and a variety of other causes.
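As one concrete instance of a bit-oriented error-correcting FEC code, the classic Hamming (7,4) code is sketched below. It is chosen here only for illustration and is not a code the text prescribes. Each 4-bit block gets three parity bits (at positions 1, 2 and 4), and the syndrome computed at the receiver spells out the position of any single flipped bit. Two or more bit errors in one codeword are silently miscorrected, an example of undetected corruption.

```python
from typing import List, Tuple

def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c: List[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit. Returns (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # syndrome = position of the flipped bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos
```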
Chain reaction codes are FEC codes that allow for generation of an arbitrary number of output symbols from the fixed input symbols of a file or stream. Sometimes, they are referred to as fountain or rateless FEC codes, since the code does not have an a priori fixed transmission rate and the number of possible output symbols can be independent of the number of input symbols. Novel techniques for generating, using and operating chain reaction codes are shown, for example, in Luby and Shokrollahi.
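The rateless property can be illustrated with a toy LT-style (Luby Transform) sketch. It is not the construction of the cited references. Each output symbol is the XOR of a subset of input symbols chosen by a pseudo-random generator seeded with that symbol's identifier, so the sender can mint as many output symbols as it likes and the receiver can rebuild each subset from the seed alone. The uniform degree choice of 1 to 3 is a placeholder assumption; practical LT codes use a carefully designed (soliton-type) degree distribution. All names are hypothetical.

```python
import random
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def neighbors(seed, k, max_degree=3):
    """Input-symbol indices for one output symbol, derived only from its seed,
    so sender and receiver compute the same set. (Toy degree distribution.)"""
    rng = random.Random(seed)
    degree = rng.randint(1, min(max_degree, k))
    return set(rng.sample(range(k), degree))

def encode(source, seed):
    """One output symbol: XOR of the seed-selected input symbols."""
    return reduce(xor_bytes, (source[i] for i in neighbors(seed, len(source))))

def peel_decode(received, k):
    """Peeling decoder. received: list of (seed, value) pairs. Returns k symbols,
    with None for any input symbol that could not be recovered."""
    pending = [(neighbors(seed, k), value) for seed, value in received]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        still_pending = []
        for idx, value in pending:
            # Strip out neighbors that are already known.
            for i in [j for j in idx if recovered[j] is not None]:
                value = xor_bytes(value, recovered[i])
                idx = idx - {i}
            if len(idx) == 1:
                # Degree one: this output symbol now equals a single input symbol.
                recovered[idx.pop()] = value
                progress = True
            elif idx:
                still_pending.append((idx, value))
        pending = still_pending
    return recovered
```

Because output symbols are identified by seed, the sender can keep calling `encode(source, seed)` with new seeds indefinitely; the receiver simply collects whichever ones arrive until peeling succeeds.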
It is also known to use multi-stage chain reaction (“MSCR”) codes, such as those described in Shokrollahi and developed by Digital Fountain, Inc. under the trade name “Raptor” codes. Multi-stage chain reaction codes are used, for example, in an encoder that receives input symbols from a source file or source stream, generates intermediate symbols from the input symbols, and uses the intermediate symbols as the source symbols for a chain reaction encoder.
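The two-stage structure can be sketched as follows: a precode turns input symbols into intermediate symbols, and those intermediate symbols become the source symbols of a chain reaction stage. The precode here is a hypothetical interleaved XOR parity chosen for brevity; it is not the precode used by Raptor codes. Its role is the same, though: cleaning up the few input symbols the second stage leaves unrecovered. All names and parameters are illustrative.

```python
import random
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def precode(input_symbols, r):
    """Stage 1 (hypothetical precode): append r check symbols, where check j is
    the XOR of every input symbol whose index i satisfies i % r == j."""
    if not 0 < r <= len(input_symbols):
        raise ValueError("need 0 < r <= number of input symbols")
    checks = [reduce(xor_bytes, input_symbols[j::r]) for j in range(r)]
    return input_symbols + checks  # intermediate symbols

def chain_reaction_symbol(source_symbols, seed, max_degree=3):
    """Stage 2: one output symbol as the XOR of a seed-chosen subset of the
    source symbols (here, the intermediate symbols)."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(max_degree, len(source_symbols)))
    idx = rng.sample(range(len(source_symbols)), degree)
    return reduce(xor_bytes, (source_symbols[i] for i in idx))

def repair_with_checks(intermediate, k, r):
    """Fill in input symbols the second stage failed to recover (None), when at
    most one member of a check group is missing and its check symbol is known."""
    out = list(intermediate[:k])
    for j in range(r):
        members = list(range(j, k, r))
        missing = [i for i in members if out[i] is None]
        check = intermediate[k + j]
        if len(missing) == 1 and check is not None:
            known = [out[i] for i in members if out[i] is not None]
            out[missing[0]] = reduce(xor_bytes, known, check)
    return out
```

In the terminology of the text, the list passed to `precode` holds the input symbols, its result holds the source symbols for the chain reaction stage, and the values returned by `chain_reaction_symbol` are the output symbols.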
For some applications, other variations of codes might be more suitable or otherwise preferred. As used herein, input symbols refer to the data received from a file or stream and source symbols refer to the symbols that are used to generate output symbols. In some cases, the source symbols include the input symbols and, in some cases, the source symbols are the input symbols. However, there are cases where the input symbols are encoded and/or transformed into an intermediate set of symbols and that intermediate set is used to generate the output symbols without direct reference to the input symbols. Thus, input symbols comprise information known to the sender which is to be communicated to the receiver, source symbols are the symbols used by at least one stage of an encoder and are derived from the input symbols, and output symbols comprise symbols that are transmitted by the sender to the receiver.
In some applications, the receiver may begin to use the data before the transmission is complete. For example, with a video-on-demand system, the receiver might start playing out a video after only a small portion of the video data is received and assume that the rest of the video data will be received before it is needed. In such systems, encoding should not be done over the entire transmission, because then some output symbols at the end of the transmission might encode for input symbols needed at the beginning of the video, in which case those output symbols are wasteful since their information is needed when it is not available and is not needed when it is available. To avoid this, the data stream is typically divided into blocks: the input data of each block is encoded and sent before the next block is prepared, and a block's encoding normally does not depend on input symbols outside that block.
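Block-wise encoding of a stream might look like the sketch below, assuming a fixed number of input symbols per block and, as a placeholder FEC, a single XOR parity symbol per block. Both the parameters and the names are hypothetical. The point is that each block's repair data depends only on that block's symbols, and blocks are emitted in order, so playback of block 0 never waits on data from later blocks.

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def source_blocks(input_symbols, block_len):
    """Divide a stream's input symbols into consecutive source blocks."""
    return [input_symbols[i:i + block_len] for i in range(0, len(input_symbols), block_len)]

def encode_stream(input_symbols, block_len):
    """Encode each block independently and yield it before moving to the next.
    Every repair symbol depends only on symbols of its own block."""
    for block_number, block in enumerate(source_blocks(input_symbols, block_len)):
        repair = reduce(xor_bytes, block)  # placeholder FEC: one parity symbol per block
        yield block_number, block, repair
```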
There are tradeoffs with the use of blocks: too small a block size and not enough error protection is provided, whereas too large a block size and too much delay is seen at the receiver as it waits for blocks to be completely recovered.
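The tradeoff can be made concrete with a small calculation. This sketch assumes an idealized code in which any k of the n transmitted symbols recover a block of k source symbols, independent symbol losses at a fixed rate (real channels are often bursty), and a fixed repair budget per block; the function names and the parameter values in the example are illustrative only.

```python
import math

def block_failure_probability(k, repair, loss_rate):
    """Probability that more than `repair` of the n = k + repair symbols of a block
    are lost, i.e. the block is unrecoverable under the ideal-code assumption."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    n = k + repair
    fail = 0.0
    for lost in range(repair + 1, n + 1):
        # Binomial term computed in log space to avoid overflow for large n.
        log_term = (math.lgamma(n + 1) - math.lgamma(lost + 1) - math.lgamma(n - lost + 1)
                    + lost * math.log(loss_rate) + (n - lost) * math.log(1 - loss_rate))
        fail += math.exp(log_term)
    return fail

def block_delay_seconds(k, symbol_bytes, rate_bps):
    """Time to receive one full block of k symbols at the given channel rate."""
    return k * symbol_bytes * 8 / rate_bps
```

With 5% loss and the same 10% repair overhead, a 10-symbol block (1 repair symbol) fails roughly 10% of the time, while a 1000-symbol block (100 repair symbols) fails with probability below one in a million; the larger block, however, takes 100 times longer to arrive before it can be decoded.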