Techniques for transmission of files between a sender and a recipient over a communications channel are the subject of much literature. Preferably, a recipient desires to receive an exact copy of data transmitted over a channel by a sender with some level of certainty. Where the channel does not have perfect fidelity (which covers almost all physically realizable systems), one concern is how to deal with data lost or garbled in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors) because the recipient cannot always tell when received data has been corrupted. Many error-correcting codes have been developed to correct for erasures and/or for errors. Typically, the particular code used is chosen based on some information about the infidelities of the channel through which the data is being transmitted and the nature of the data being transmitted. For example, where the channel is known to have long periods of infidelity, a burst error code might be best suited for that application. Where only short, infrequent errors are expected, a simple parity code might be best.
As used herein, “source data” refers to data that is available at one or more senders and that a receiver is used to obtain, by recovery from a transmitted sequence with or without errors and/or erasures, etc. As used herein, “encoded data” refers to data that is conveyed and can be used to recover or obtain the source data. In a simple case, the encoded data is a copy of the source data, but if the received encoded data differs (due to errors and/or erasures) from the transmitted encoded data, in this simple case the source data might not be entirely recoverable absent additional data about the source data. Transmission can be through space or time. In a more complex case, the encoded data is generated based on source data in a transformation and is transmitted from one or more senders to receivers. The encoding is said to be “systematic” if the source data is found to be part of the encoded data. In a simple example of systematic encoding, redundant information about the source data is appended to the end of the source data to form the encoded data.
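As a minimal sketch of the simple systematic case just described, the following Python example appends one redundant symbol to the source data and uses it to repair a single erasure. The function names, the choice of a single XOR parity symbol, and the use of `None` to mark an erased position are assumptions made for illustration only; they are not taken from any of the cited references.

```python
from functools import reduce
from operator import xor


def systematic_encode(source):
    """Return the source symbols followed by one XOR parity symbol.

    The source data appears unchanged at the start of the encoded data,
    which is what makes this (hypothetical) encoding systematic.
    """
    parity = reduce(xor, source, 0)
    return list(source) + [parity]


def recover_single_erasure(received):
    """Recover the source symbols when at most one position is erased (None)."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity symbol can repair only one erasure")
    repaired = list(received)
    if missing:
        # The XOR of all encoded symbols is zero, so the missing symbol
        # equals the XOR of every symbol that did arrive.
        repaired[missing[0]] = reduce(
            xor, (s for s in received if s is not None), 0
        )
    return repaired[:-1]  # drop the parity symbol, keep the source data


source = [0x48, 0x69, 0x21]
encoded = systematic_encode(source)
damaged = list(encoded)
damaged[1] = None  # simulate an erasure on the channel
recovered = recover_single_erasure(damaged)
```

Because the position of an erasure is known, one redundant symbol suffices here; a corrupted symbol at an unknown position could at best be detected by this code, not corrected.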
Also as used herein, “input data” refers to data that is present at an input of an FEC (forward-error correcting) encoder apparatus or an FEC encoder module, component, step, etc., (“FEC encoder”) and “output data” refers to data that is present at an output of an FEC encoder. Correspondingly, output data would be expected to be present at an input of an FEC decoder and the FEC decoder would be expected to output the input data, or a correspondence thereof, based on the output data it processed. In some cases, the input data is, or includes, the source data, and in some cases, the output data is, or includes, the encoded data. In other cases, a sender device or sender program code may comprise more than one FEC encoder, i.e., source data is transformed into encoded data in a series of a plurality of FEC encoders. Similarly at the receiver, there may be more than one FEC decoder applied to generate source data from received encoded data.
Data can be thought of as partitioned into symbols. An encoder is a computer system, device, electronic circuit, or the like, that generates encoded symbols or output symbols from a sequence of source symbols or input symbols and a decoder is the counterpart that recovers a sequence of source symbols or input symbols from received or recovered encoded symbols or output symbols. The encoder and decoder are separated in time and/or space by the channel and any received encoded symbols might not be exactly the same as corresponding transmitted encoded symbols and they might not be received in exactly the same sequence as they were transmitted. The “size” of a symbol can be measured in bits, whether or not the symbol is actually broken into a bit stream, where a symbol has a size of M bits when the symbol is selected from an alphabet of 2^M symbols. In many of the examples herein, symbols are measured in bytes and codes might be over a field of 256 possibilities (there are 256 possible 8-bit patterns), but it should be understood that different units of data measurement can be used and it is well-known to measure data in various ways.
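The relationship between symbol size and alphabet size can be illustrated with a short Python sketch. The helper names and the zero-padding of the final symbol are assumptions chosen for this example; the text above does not prescribe how a trailing partial symbol is handled.

```python
def alphabet_size(m_bits):
    """Number of distinct symbols of size m_bits, i.e. 2^M."""
    return 2 ** m_bits


def partition_into_symbols(data, symbol_bytes):
    """Split a byte string into symbols of symbol_bytes bytes each.

    The last symbol is padded with zero bytes (an assumption of this sketch).
    """
    if symbol_bytes <= 0:
        raise ValueError("symbol size must be positive")
    count = -(-len(data) // symbol_bytes)  # ceiling division
    padded = data.ljust(count * symbol_bytes, b"\x00")
    return [
        padded[i:i + symbol_bytes]
        for i in range(0, len(padded), symbol_bytes)
    ]


symbols = partition_into_symbols(b"abcdefg", 3)
```

With one-byte symbols, `alphabet_size(8)` gives the 256 possibilities mentioned above.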
Luby I describes the use of codes, such as chain reaction codes, to address error correction in a compute-efficient, memory-efficient and bandwidth-efficient manner. One property of the encoded symbols produced by a chain reaction encoder is that a receiver is able to recover the original file as soon as enough encoded symbols have been received. Specifically, to recover the original K source symbols with a high probability, the receiver needs approximately K+A encoded symbols.
The “absolute reception overhead” for a given situation is represented by the value A, while a “relative reception overhead” can be calculated as the ratio A/K. The absolute reception overhead is a measure of how much extra data needs to be received beyond the information theoretic minimal amount of data, and it may depend on the reliability of the decoder and may vary as a function of the number, K, of source symbols. Similarly, the relative reception overhead, A/K, is a measure of how much extra data needs to be received beyond the information theoretic minimal amount of data relative to the size of the source data being recovered, and also may depend on the reliability of the decoder and may vary as a function of the number K of source symbols.
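As a small worked example of these two quantities, the following Python sketch computes A and A/K from the number of source symbols and the number of encoded symbols a decoder needed. The function name and the sample figures (K = 1000, 1020 symbols needed) are hypothetical and serve only to illustrate the arithmetic.

```python
def reception_overhead(k, symbols_needed):
    """Return (absolute overhead A, relative overhead A/K).

    k is the number of source symbols; symbols_needed is the number of
    encoded symbols the decoder had to receive before it could recover them.
    """
    if k <= 0:
        raise ValueError("K must be positive")
    a = symbols_needed - k
    return a, a / k


absolute, relative = reception_overhead(1000, 1020)
```

In this hypothetical case the absolute reception overhead is 20 symbols and the relative reception overhead is 2%.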
Chain reaction codes are extremely useful for communication over a packet based network. However, they can be fairly computationally intensive at times. A decoder might be able to decode more often, or more easily, if the source symbols are encoded using a static encoder prior to a dynamic encoder that encodes using a chain reaction or another rateless code. Such decoders are shown in Shokrollahi I, for example. In examples shown there, source symbols are input symbols to a static encoder that produces output symbols that are input symbols to a dynamic encoder that produces output symbols that are the encoded symbols, wherein the dynamic encoder is a rateless encoder that can generate a number of output symbols in a quantity that is not a fixed rate relative to the number of input symbols. The static encoder might include more than one fixed rate encoder. For example, a static encoder might include a Hamming encoder, a low-density parity-check (“LDPC”) encoder, a high-density parity-check (“HDPC”) encoder, and/or the like.
Chain reaction codes have a property that as some symbols are recovered at the decoder from the received symbols, those symbols might be able to be used to recover additional symbols, which in turn might be used to recover yet more symbols. Preferably, the chain reaction of symbol solving at the decoder can continue such that all of the desired symbols are recovered before the pool of received symbols is used up. Preferably, the computational complexity of performing chain reaction encoding and decoding processes is low.
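The chain reaction property can be sketched in Python as follows, under the assumption (made for this example only) that each encoded symbol is the XOR of a known set of source symbols. The function names and the representation of a received symbol as a `(neighbor_indices, value)` pair are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_of(source, neighbors):
    """Encoded symbol value: XOR of the source symbols it depends on."""
    return reduce(xor, (source[i] for i in neighbors), 0)


def peel_decode(k, received):
    """Chain reaction decoding sketch.

    received is a list of (neighbor_indices, value) pairs. Returns a list of
    k recovered source symbols, with None wherever the process stalled.
    """
    pending = [[set(nbrs), val] for nbrs, val in received]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        for eq in pending:
            nbrs = eq[0]
            # Remove already recovered source symbols from this encoded symbol.
            for i in [j for j in nbrs if recovered[j] is not None]:
                eq[1] ^= recovered[i]
                nbrs.discard(i)
            # An encoded symbol with one unknown neighbor reveals that neighbor.
            if len(nbrs) == 1:
                i = nbrs.pop()
                recovered[i] = eq[1]
                progress = True
    return recovered


source = [5, 9, 12, 7]
neighbor_sets = [{0}, {0, 1}, {1, 2}, {2, 3}]
received = [(n, xor_of(source, n)) for n in neighbor_sets]
recovered = peel_decode(len(source), received)

# No encoded symbol depends on exactly one source symbol, so nothing starts.
stalled = peel_decode(3, [({0, 1}, 12), ({1, 2}, 5), ({0, 2}, 9)])
```

In the first call each recovered symbol unlocks the next encoded symbol, so all four source symbols are recovered. In the second call the chain reaction never starts and every entry remains `None`.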
A recovery process at the decoder might involve determining which symbols were received, creating a matrix that would map the original input symbols to those encoded symbols that were received, then inverting the matrix and performing a matrix multiplication of the inverted matrix and a vector of the received encoded symbols. In a typical system, a brute force implementation of this can consume excessive computing effort and memory requirements. Of course, for a particular set of received encoded symbols, it might be impossible to recover all of the original input symbols, but even where it is possible, it might be very computationally expensive to compute the result.
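A brute force version of this recovery can be sketched in Python as Gauss-Jordan elimination over GF(2), again assuming for illustration that each encoded symbol is the XOR of a known set of source symbols. The function name and the bitmask row representation are choices made for this sketch. Elimination on the full matrix has the same effect as inverting it and multiplying by the vector of received values, and its cost grows quickly with the number of source symbols.

```python
def solve_gf2(k, received):
    """Recover k source symbols by Gauss-Jordan elimination over GF(2).

    received is a list of (neighbor_indices, value) pairs. Each matrix row is
    stored as a bitmask, where bit j is set if the encoded symbol depends on
    source symbol j. Returns None if the received symbols do not determine
    all k source symbols.
    """
    rows = [[sum(1 << j for j in set(nbrs)), val] for nbrs, val in received]
    for col in range(k):
        pivot = next(
            (r for r in range(col, len(rows)) if (rows[r][0] >> col) & 1),
            None,
        )
        if pivot is None:
            return None  # rank deficient: recovery is impossible
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and (rows[r][0] >> col) & 1:
                rows[r][0] ^= rows[col][0]
                rows[r][1] ^= rows[col][1]
    return [rows[i][1] for i in range(k)]


# Encoded symbols of the hypothetical source [5, 9, 12].
solved = solve_gf2(3, [({0, 1}, 12), ({1, 2}, 5), ({0, 1, 2}, 0)])

# These three encoded symbols are linearly dependent, so recovery fails.
unsolvable = solve_gf2(3, [({0, 1}, 12), ({1, 2}, 5), ({0, 2}, 9)])
```

The first set of received symbols contains no encoded symbol that depends on a single source symbol, yet the matrix is invertible and all three source symbols are recovered. The second set illustrates the case where recovery is impossible.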
Shokrollahi II describes an approach called “inactivation”, wherein decoding occurs in two steps. In the first step, the decoder takes stock of what received encoded symbols it has available, what the matrix might look like and determines, at least approximately, a sequence of decoding steps that would allow for the chain reaction process to complete given the received encoded symbols. In the second step, the decoder runs the chain reaction decoding according to the determined sequence of decoding steps. This can be done in a memory-efficient manner (i.e., a manner that requires less memory storage for the operation than a more memory-inefficient process).
In an inactivation approach, the first decoding step involves manipulating the matrix, or its equivalent, to determine some number of input symbols that can be solved for. When the determination stalls, one of the input symbols is designated as an “inactivated symbol” and the determination process continues on the assumption that the inactivated symbol is indeed solved. At the end, the inactivated symbols are solved for using Gaussian elimination or some other method to invert a matrix that is much smaller than the original decoding matrix. Using that determination, the chain reaction sequence can be performed on the received encoded symbols to arrive at the recovered input symbols, which can either be all of the original input symbols or a suitable set of the original input symbols.
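The idea can be illustrated with the following simplified Python sketch. It is not the procedure of Shokrollahi II: it merges the scheduling step and the symbol operations into one pass, assumes each encoded symbol is the XOR of a known set of input symbols, inactivates the input symbol that appears in the most unresolved encoded symbols (a heuristic chosen for this example), and assumes erasures only, so the received values are mutually consistent. All names are hypothetical.

```python
from collections import Counter


def inactivation_decode(k, received):
    """Peel where possible, inactivate on a stall, then solve a small system.

    received is a list of (neighbor_indices, value) pairs. Returns the k
    input symbols, or None if they cannot all be determined.
    """
    # Each equation: [active unknowns, bitmask of inactivated symbols, constant]
    eqs = [[set(nbrs), 0, val] for nbrs, val in received]
    expr = {}         # peeled symbol -> (mask of inactivated symbols, constant)
    inactivated = []
    while len(expr) + len(inactivated) < k:
        eq = next((e for e in eqs if len(e[0]) == 1), None)
        if eq is not None:
            # Peel: this symbol is now known in terms of inactivated symbols.
            i = eq[0].pop()
            expr[i] = (eq[1], eq[2])
            eqs = [e for e in eqs if e is not eq]
            for e in eqs:
                if i in e[0]:
                    e[0].discard(i)
                    e[1] ^= expr[i][0]
                    e[2] ^= expr[i][1]
        else:
            # Stall: inactivate a symbol and continue as if it were solved.
            counts = Counter(j for e in eqs for j in e[0])
            if not counts:
                return None  # some input symbol appears in no equation
            i = counts.most_common(1)[0][0]
            inactivated.append(i)
            for e in eqs:
                if i in e[0]:
                    e[0].discard(i)
                    e[1] ^= 1 << i
    # The leftover equations involve only the inactivated symbols, so the
    # Gaussian elimination below runs on a much smaller system.
    small = [[e[1], e[2]] for e in eqs]
    for n, col in enumerate(inactivated):
        pivot = next(
            (r for r in range(n, len(small)) if (small[r][0] >> col) & 1),
            None,
        )
        if pivot is None:
            return None
        small[n], small[pivot] = small[pivot], small[n]
        for r in range(len(small)):
            if r != n and (small[r][0] >> col) & 1:
                small[r][0] ^= small[n][0]
                small[r][1] ^= small[n][1]
    solved = {col: small[n][1] for n, col in enumerate(inactivated)}
    # Back-substitute the inactivated values into the peeled expressions.
    result = [None] * k
    for i, value in solved.items():
        result[i] = value
    for i, (mask, const) in expr.items():
        for j in inactivated:
            if (mask >> j) & 1:
                const ^= solved[j]
        result[i] = const
    return result


# Encoded symbols of the hypothetical input [5, 9, 12]; plain peeling stalls
# immediately because no encoded symbol depends on a single input symbol.
decoded = inactivation_decode(3, [({0, 1}, 12), ({1, 2}, 5), ({0, 1, 2}, 0)])
```

In this example one input symbol is inactivated, the other two are peeled, and the final elimination operates on a single equation in a single unknown rather than on the full three-by-three system.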
For some applications that impose tight constraints on the decoder, such as where the decoder is in a low-power device with limited memory and computing power, or such as when there are tight constraints on the allowable absolute or relative reception overhead, improved methods might be indicated relative to the inactivation approach described above.
Methods might also be useful for partitioning a file or large block of data into as few source blocks as possible subject to a constraint on the smallest sub-symbol size and then, subject to that partitioning, splitting into as few sub-blocks as possible subject to a constraint on the maximum sub-block size.