Transmission of files and streams between a sender and a recipient over a communications channel has been the subject of much literature. Preferably, a recipient desires to receive an exact copy of data transmitted over a channel by a sender with some level of certainty. Where the channel does not have perfect fidelity (which covers almost all physically realizable systems), one concern is how to deal with data lost or garbled in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors) because the recipient cannot always tell when received data has been corrupted. Many error-correcting codes have been developed to correct for erasures and/or for errors.
Data transmission is straightforward when a transmitter and a receiver have all of the computing power and electrical power needed for communications and the channel between the transmitter and receiver is clean enough to allow for relatively error-free communications. The problem of data transmission becomes more difficult when the channel is in an adverse environment or the transmitter and/or receiver has limited capability.
One solution is the use of forward error correcting (FEC) techniques, wherein data is coded at the transmitter such that a receiver can recover from transmission erasures and errors. Where feasible, a reverse channel from the receiver to the transmitter allows for the receiver to communicate about errors to the transmitter, which can then adjust its transmission process accordingly. Often, however, a reverse channel is not available or feasible or is available only with limited capacity. For example, where the transmitter is transmitting to a large number of receivers, the transmitter might not be able to handle reverse channels from all those receivers. As a result, communication protocols often need to be designed without a reverse channel or with a limited capacity reverse channel and, as such, the transmitter may have to deal with widely varying channel conditions without a full view of those channel conditions.
In the case of a packet protocol used for data transport over a channel that can lose packets, a file, stream or other block of data to be transmitted over a packet network is partitioned into equal size input symbols, encoding symbols the same size as the input symbols are generated from the input symbols using an FEC code, and the encoding symbols are placed and sent in packets. The “size” of a symbol can be measured in bits, whether or not the symbol is actually broken into a bit stream, where a symbol has a size of M bits when the symbol is selected from an alphabet of 2^M symbols. In such a packet-based communication system, a packet oriented erasure FEC coding scheme might be suitable. A file transmission is called reliable if it allows the intended recipient to recover an exact copy of the original file even in the face of erasures in the network. A stream transmission is called reliable if it allows the intended recipient to recover an exact copy of each part of the stream in a timely manner even in the face of erasures in the network. Both file transmission and stream transmission can also be somewhat reliable, in the sense that some parts of the file or stream are not recoverable or, for streaming, some parts of the stream are not recoverable in a timely fashion. Packet loss often occurs because sporadic congestion causes the buffering mechanism in a router to reach its capacity, forcing it to drop incoming packets. Protection against erasures during transport has been the subject of much study.
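As an informal illustration of the partitioning step described above, the following Python sketch splits a block of data into equal size input symbols. The function names, the choice to measure symbol size in bytes, and the zero padding of the final symbol are assumptions made for this example only; the text above does not prescribe how a short final symbol is handled.

```python
def partition_into_symbols(data, symbol_size, pad=b"\x00"):
    """Split data into symbols of exactly symbol_size bytes each.

    Assumption: a short final chunk is padded with pad bytes so that
    all symbols have equal size.
    """
    if symbol_size <= 0:
        raise ValueError("symbol_size must be positive")
    symbols = []
    for offset in range(0, len(data), symbol_size):
        chunk = data[offset:offset + symbol_size]
        if len(chunk) < symbol_size:
            chunk = chunk + pad * (symbol_size - len(chunk))
        symbols.append(chunk)
    return symbols


def alphabet_size(symbol_bits):
    """A symbol of M bits is drawn from an alphabet of 2^M symbols."""
    return 2 ** symbol_bits


symbols = partition_into_symbols(b"abcdefg", 3)
print(symbols)           # three symbols of 3 bytes each, the last one padded
print(alphabet_size(8))  # number of distinct 8-bit symbols
```

In a packet-based system, each resulting symbol (or a small group of symbols) would then be placed in a packet, so that the loss of a packet corresponds to the erasure of whole symbols.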
In the case of a protocol used for data transmission over a noisy channel that can corrupt bits, a block of data to be transmitted over a data transmission channel is partitioned into equal size input symbols, encoding symbols of the same size are generated from the input symbols and the encoding symbols are sent over the channel. For such a noisy channel the size of a symbol is typically one bit or a few bits, whether or not a symbol is actually broken into a bit stream. In such a communication system, a bit-stream oriented error-correction FEC coding scheme might be suitable. A data transmission is called reliable if it allows the intended recipient to recover an exact copy of the original block even in the face of errors (symbol corruption, either detected or undetected in the channel). The transmission can also be somewhat reliable, in the sense that some parts of the block may remain corrupted after recovery. Symbols are often corrupted by sporadic noise, periodic noise, interference, weak signal, blockages in the channel, and a variety of other causes.
One problem with some FEC codes is that they require excessive computing power or memory to operate. Another problem with some FEC codes is that the number of output symbols must be determined in advance of the coding process. This can lead to inefficiencies if the loss rate of packets is overestimated, and can lead to failure if the loss rate of packets is underestimated.
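The cost of fixing the number of output symbols in advance can be seen with a small calculation. The Python sketch below assumes an idealized erasure code in which any K received output symbols suffice to recover K input symbols, and it compares the expected number of received symbols against K. The function name and the use of whole-number loss percentages are assumptions made for this example.

```python
def fixed_rate_outcome(k, assumed_loss_pct, actual_loss_pct):
    """Return (n, expected_received, enough) for a fixed-rate code.

    Assumption: an ideal erasure code, so recovery needs any k of the
    n output symbols. n is chosen in advance from the assumed loss rate.
    """
    # Smallest n such that n * (1 - assumed loss) >= k (ceiling division).
    n = -(-k * 100 // (100 - assumed_loss_pct))
    expected_received = n * (100 - actual_loss_pct) / 100
    return n, expected_received, expected_received >= k


# Loss overestimated (20% assumed, 5% actual): many symbols are surplus.
print(fixed_rate_outcome(1000, 20, 5))
# Loss underestimated (20% assumed, 30% actual): too few symbols arrive.
print(fixed_rate_outcome(1000, 20, 30))
```

With 1000 input symbols and an assumed loss rate of 20%, 1250 output symbols are generated. At an actual loss rate of 5%, about 1187.5 symbols are expected to arrive, so roughly 187 are surplus; at an actual loss rate of 30%, only about 875 arrive, which is fewer than the 1000 needed.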
Chain reaction codes are FEC codes that allow for generation of an arbitrary number of output symbols from the fixed input symbols of a file or stream. Sometimes, they are referred to as fountain or rateless FEC codes, since the code does not have an a-priori fixed transmission rate and the number of possible output symbols can be independent of the number of input symbols. Novel techniques for generating, using and operating chain reaction codes are shown, for example, in Luby and Shokrollahi.
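As a rough illustration of the rateless property described above, the following Python sketch generates an output symbol for any integer key by XORing a pseudorandomly chosen subset of the source symbols, and recovers the source symbols with a simple peeling decoder. It is a toy, not the construction of Luby or Shokrollahi: the uniform choice of 1 to 3 neighbors, the seeding of a pseudorandom generator with the key, and the names neighbors, encode_symbol and decode are all assumptions made for this example.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def neighbors(key, k):
    """Indices of the source symbols combined into output symbol `key`.

    Assumption: a toy degree distribution (uniform over 1..3), not the
    distribution used by any real chain reaction code.
    """
    rng = random.Random(key)  # sender and receiver derive the same set
    degree = rng.randint(1, min(3, k))
    return rng.sample(range(k), degree)


def encode_symbol(key, source):
    """Output symbol for `key`; any number of distinct keys may be used."""
    idx = neighbors(key, len(source))
    value = source[idx[0]]
    for i in idx[1:]:
        value = xor_bytes(value, source[i])
    return value


def decode(received, k):
    """Peeling decoder; entries stay None if they cannot be recovered."""
    pending = [[set(neighbors(key, k)), value] for key, value in received.items()]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        for item in pending:
            nbrs, value = item
            # Remove the contribution of already recovered source symbols.
            for i in [i for i in nbrs if recovered[i] is not None]:
                value = xor_bytes(value, recovered[i])
                nbrs.discard(i)
            item[1] = value
            if len(nbrs) == 1:  # exactly one unknown left: it is now known
                recovered[nbrs.pop()] = value
                progress = True
    return recovered


source = [b"ab", b"cd", b"ef", b"gh"]
# Generate output symbols for keys 0..199 and erase every third one.
received = {key: encode_symbol(key, source) for key in range(200) if key % 3 != 0}
print(decode(received, len(source)) == source)
```

The sender never commits to a total number of output symbols: it can keep producing symbols for new keys until the receiver has collected enough of them, whichever ones happen to arrive.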
It is also known to use multi-stage chain reaction (“MSCR”) codes, such as those described in Shokrollahi and developed by Digital Fountain, Inc. under the trade name “Raptor” codes. Multi-stage chain reaction codes are used, for example, in an encoder that receives input symbols from a source file or source stream and generates intermediate symbols from the input symbols, where the intermediate symbols serve as the source symbols for a chain reaction encoder.
For some applications, other variations of codes might be more suitable or otherwise preferred. As used herein, input symbols refer to the data received from a file or stream and source symbols refer to the symbols that are used to generate output symbols. In some cases, the source symbols include the input symbols and in some cases, the source symbols are the input symbols. However, there are cases where the input symbols are encoded and/or transformed into an intermediate set of symbols and that intermediate set is used to generate the output symbols without reference to the input symbols (directly). Thus, input symbols comprise information known to the sender which is to be communicated to the receiver, source symbols are the symbols used by at least one stage of an encoder and are derived from the input symbols, and output symbols comprise symbols that are transmitted by the sender to the receiver.
In some applications, the receiver may begin to use the data before the transmission is complete. For example, with a video-on-demand system, the receiver might start playing out a video after only a small portion of the video data is received and assume that the rest of the video data will be received before it is needed. In such systems, encoding should not be done over the entire transmission, because then some output symbols at the end of the transmission might encode for input symbols needed at the beginning of the video, in which case those output symbols are wasteful since their information is needed when it is not available and is not needed when it is available. To avoid this, the data stream is typically divided into blocks wherein the input data of the block is encoded and sent before the next block is prepared and blocks normally do not depend on input symbols outside those blocks.
For such applications, there is often a trade-off between reliability and lag time between when the transmission starts and when the data can start to be used. For example, if an entire feature length movie were encoded such that errors at the start of the transmission can be corrected using data at the end of the transmission, the receiver might wait until it receives all of the movie data before indicating to the application (or the user of the application) that the movie is available for playback. However, where the total transmission time is long, that can be an unacceptable lag time.
One solution is to encode a stream of data such that the receiver has enough information to begin playback of the movie after some smaller lag time and the receiver can expect to receive further information in time to continue the playback. Naturally, if the data near the end of the transmission provides redundancy for the data at the start of the transmission, that capability is wasted since the first part of the movie will have played back long before that later information is received. Thus, it is efficient to have the redundancy available when it is needed, typically close in time with the decoding of the data. However, if the constraints are too strict, playback might have to begin too early, raising the probability that the receiver reaches a playback point in the movie where it does not yet have enough data to decode, which would cause a skip or pause.
There are tradeoffs with the use of blocks: too small a block size and not enough error protection is provided, whereas too large a block size and too much delay is seen at the receiver as it waits for blocks to be completely recovered.
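The block size tradeoff can be made concrete with a simplified model. The Python sketch below assumes an ideal erasure code per block, a constant sending rate, and a single burst outage that falls entirely within one block (the worst case); a block is then recoverable only if the fraction of it erased by the burst does not exceed the fraction of it that is repair data. The function name, parameters and numbers are assumptions chosen for illustration.

```python
def block_survives_burst(block_ms, repair_pct, burst_ms):
    """True if a block can be recovered despite a burst outage.

    Assumptions: ideal erasure code, constant sending rate, and the
    whole burst landing inside a single block of duration block_ms.
    """
    lost_ms = min(burst_ms, block_ms)
    # Recoverable when lost fraction <= repair fraction (integer form).
    return lost_ms * 100 <= repair_pct * block_ms


# Same 20% repair overhead and same 200 ms outage, two block durations.
for block_ms in (500, 2000):
    ok = block_survives_burst(block_ms, repair_pct=20, burst_ms=200)
    # The receiver must wait roughly one block duration before decoding.
    print(f"block={block_ms} ms  delay~{block_ms} ms  recoverable={ok}")
```

Under these assumptions, the 500 ms block loses 40% of its symbols to the outage and cannot be recovered with 20% repair data, while the 2000 ms block loses only 10% and can be recovered, at the cost of the receiver waiting about four times as long for each block.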