The present invention relates to encoding and decoding data in communications systems and more specifically to communication systems that encode and decode data to account for errors and gaps in communicated data.
Transmission of files between a sender and a recipient over a communications channel has been the subject of much literature. Preferably, a recipient desires to receive an exact copy of data transmitted over a channel by a sender with some level of certainty. Where the channel does not have perfect fidelity (which covers almost all physically realizable systems), one concern is how to deal with data lost or garbled in transmission. Lost data (erasures) are often easier to deal with than corrupted data (errors) because the recipient cannot always tell when received data has been corrupted. Many error-correcting codes have been developed to correct for erasures and/or for errors. Typically, the particular code used is chosen based on some information about the infidelities of the channel through which the data is being transmitted and the nature of the data being transmitted. For example, where the channel is known to have long periods of infidelity, a burst error code might be best suited for that application. Where only short, infrequent errors are expected, a simple parity code might be best.
Another consideration in selecting a code is the protocol used for transmission. In the case of the global internetwork of networks known as the “Internet” (with a capital “I”), a packet protocol is used for data transport. That protocol is called the Internet Protocol or “IP” for short. When a file or other block of data is to be transmitted over an IP network, it is partitioned into equal size input symbols and input symbols are placed into consecutive packets. The “size” of an input symbol can be measured in bits, whether or not the input symbol is actually broken into a bit stream, where an input symbol has a size of M bits when the input symbol is selected from an alphabet of 2^M symbols. In such a packet-based communication system, a packet oriented coding scheme might be suitable. A transmission is called reliable if it allows the intended recipient to recover an exact copy of the original file even in the face of erasures in the network. On the Internet, packet loss often occurs because sporadic congestion causes the buffering mechanism in a router to reach its capacity, forcing it to drop incoming packets. Protection against erasures during transport has been the subject of much study.
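The partitioning step described above can be illustrated with a minimal sketch. The function name partition_into_symbols and the zero-padding of the final symbol are assumptions made for illustration; the text does not say how a file whose length is not a multiple of the symbol size is handled.

```python
def partition_into_symbols(data: bytes, symbol_size: int) -> list[bytes]:
    """Split data into equal-size input symbols.

    Assumption: the last symbol is padded with zero bytes so that all
    symbols have the same length.
    """
    if symbol_size <= 0:
        raise ValueError("symbol_size must be positive")
    # Round the length up to the next multiple of symbol_size.
    padded_len = -(-len(data) // symbol_size) * symbol_size
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + symbol_size] for i in range(0, padded_len, symbol_size)]


symbols = partition_into_symbols(b"hello world", 4)
# Each symbol is 4 bytes, so M = 32 bits and the alphabet has 2**32 symbols.
print(symbols)
```

In this sketch each returned symbol would be placed into its own packet, in order.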
The Transmission Control Protocol (“TCP”) is a point-to-point packet control scheme in common use that has an acknowledgment mechanism. TCP is good for one-to-one communications, where the sender and recipient both agree on when the transmission will take place and be received and both agree on which transmitters and receivers will be used. However, TCP is often not suitable for one-to-many or many-to-many communications or where the sender and the recipient independently determine when and where they will transmit or receive data.
Using TCP, a sender transmits ordered packets and the recipient acknowledges receipt of each packet. If a packet is lost, no acknowledgment will be sent to the sender and the sender will resend the packet. Packet loss has a number of causes. With protocols such as TCP/IP, the acknowledgment paradigm allows packets to be lost without total failure, since lost packets can just be retransmitted, either in response to a lack of acknowledgment or in response to an explicit request from the recipient. Either way, an acknowledgment protocol requires a back channel from the recipient to the sender that is used heavily at times when the number of lost packets is large.
Although acknowledgment-based protocols are generally suitable for many applications and are in fact widely used over the current Internet, they are inefficient, and sometimes completely infeasible, for certain applications such as those described in U.S. Pat. No. 6,307,487 issued to Michael G. Luby entitled “Information Additive Code Generator and Decoder for Communication Systems” (hereinafter “Luby I”). Furthermore, acknowledgment-based protocols do not scale well to broadcasting, where one sender is sending a file simultaneously to multiple users. For example, suppose a sender is broadcasting a file to multiple recipients over a satellite channel. Each recipient may experience a different pattern of packet loss. Protocols that rely on acknowledgment data (either positive or negative) for reliable delivery of the file require a back channel from each recipient to the sender, and this can be prohibitively expensive to provide. Furthermore, this requires a complex and powerful sender to be able to properly handle all of the acknowledgment data sent from the recipients. Another drawback is that if different recipients lose different sets of packets, rebroadcast of packets missed only by a few of the recipients causes reception of useless duplicate packets by other recipients.
An alternative to an acknowledgment-based protocol that is sometimes used in practice is a carousel-based protocol. A carousel protocol partitions an input file into equal length input symbols, places each input symbol into a packet, and then continually cycles through and transmits all the packets. A major drawback with a carousel-based protocol is that if a recipient misses even one packet, then the recipient has to wait another entire cycle before having a chance at receiving the missed packet. Another way to view this is that a carousel-based protocol can cause a large amount of useless duplicate data reception. For example, if a recipient receives packets from the beginning of the carousel, stops reception for a while, and then starts receiving again at the beginning of the carousel, a large number of useless duplicate packets are received.
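The waiting cost of a carousel can be made concrete with a small sketch. It assumes, for illustration only, that the carousel sends exactly one packet per transmission slot and that slot s carries packet s mod K; the function name slots_until_packet is hypothetical.

```python
def slots_until_packet(current_slot: int, wanted: int, k: int) -> int:
    """Number of slots to wait until packet `wanted` is transmitted again.

    Assumption: the carousel sends packet (slot mod k) in each slot.
    """
    return (wanted - current_slot) % k


k = 1000
# Packet 41 was sent in slot 41 and lost; the carousel is now at slot 42.
print(slots_until_packet(42, 41, k))  # 999
```

Under these assumptions a recipient that has just missed one packet waits K−1 further slots for it, during which every other packet it receives is one it may already hold.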
One solution that has been proposed to solve the above problems is to avoid the use of an acknowledgment-based protocol, and instead use Forward Error-Correction (FEC) codes, such as Reed-Solomon codes or Tornado codes, or chain reaction codes which are information additive codes, to increase reliability. With these codes, output symbols are generated from the content and sent instead of just sending the input symbols that constitute the content. Erasure correcting codes, such as Reed-Solomon or Tornado codes, generate a fixed number of output symbols for a fixed length content. For example, for K input symbols, N output symbols might be generated. These N output symbols may comprise the K original input symbols and N−K redundant symbols. If storage permits, then the server can compute the set of output symbols for each content only once and transmit the output symbols using a carousel protocol.
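As a minimal illustration of a fixed-rate erasure code with K original symbols and N−K redundant symbols, the sketch below uses a single XOR parity symbol, so N = K+1 and any one erasure can be repaired. This is deliberately not Reed-Solomon or Tornado coding; it is a much weaker code chosen only because it fits in a few lines. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(inputs: list[bytes]) -> list[bytes]:
    """Return the K input symbols followed by one parity symbol (N = K + 1)."""
    parity = reduce(xor_bytes, inputs)
    return list(inputs) + [parity]


def recover_one_erasure(received: list) -> list[bytes]:
    """Recover the K inputs from N received symbols, at most one being None."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity symbol can repair only one erasure")
    repaired = list(received)
    if missing:
        # The XOR of all N symbols is zero, so the missing symbol equals
        # the XOR of the N - 1 symbols that did arrive.
        present = [s for s in received if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity symbol


output = encode_single_parity([b"ab", b"cd", b"ef"])
output[1] = None  # simulate one lost packet
print(recover_one_erasure(output))
```

The number of output symbols is fixed when the encoder runs, which is the property the surrounding text attributes to this class of codes.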
One problem with some FEC codes is that they require excessive computing power or memory to operate. Another problem is that the number of output symbols must be determined in advance of the coding process. This can lead to inefficiencies if the loss rate of packets is overestimated, and can lead to failure if the loss rate of packets is underestimated.
For traditional FEC codes, the number of possible output symbols that can be generated is of the same order of magnitude as the number of input symbols the content is partitioned into. Typically, most or all of these output symbols are generated in a preprocessing step before the sending step. These output symbols have the property that all the input symbols can be regenerated from any subset of the output symbols equal in length to the original content or slightly longer in length than the original content.
Embodiments described in Luby I (referred to herein as “chain reaction codes”) provide a different form of forward error-correction that addresses the above issues. For chain reaction codes, the pool of possible output symbols that can be generated is typically orders of magnitude larger than the number of the input symbols, and a random output symbol from the pool of possibilities can be generated very quickly. For chain reaction codes, the output symbols can be generated on the fly on an as needed basis concurrent with the sending step. Chain reaction codes have the property that all input symbols of the content can be regenerated from any subset of a set of randomly generated output symbols slightly longer in length than the original content.
In one embodiment of a chain reaction code, each output symbol is obtained as the Exclusive-Or (XOR, denoted by ⊕) of some of the input symbols. If K denotes the total number of input symbols, then each output symbol is, on average, the XOR of c*ln(K) input symbols, where ln(K) is the natural logarithm of K and c is a suitable constant. For example, where K is 60,000, each output symbol is the XOR of, on average, 28.68 input symbols, and where K is 10,000, each output symbol is the XOR of, on average, 22.86 input symbols. A large number of XOR's results in a longer computation time of the output symbols as each such operation involves fetching data from memory, performing the XOR operation, and updating memory locations.
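The XOR construction of an output symbol can be sketched as follows. The sketch takes the number of input symbols to combine (the degree) as a parameter and picks the input symbols uniformly at random; both choices are assumptions for illustration, since the text gives only the average of c*ln(K) and the actual distribution is defined in Luby I. The function names are hypothetical.

```python
import random
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_output_symbol(input_symbols: list[bytes], degree: int, rng: random.Random):
    """Return (neighbors, value): the chosen input indices and their XOR.

    Assumption: neighbors are `degree` distinct indices chosen uniformly.
    """
    if not 1 <= degree <= len(input_symbols):
        raise ValueError("degree must be between 1 and K")
    neighbors = rng.sample(range(len(input_symbols)), degree)
    value = reduce(xor_bytes, (input_symbols[i] for i in neighbors))
    return neighbors, value


rng = random.Random(0)
inputs = [b"\x01", b"\x02", b"\x04", b"\x08"]
neighbors, value = make_output_symbol(inputs, 2, rng)
# If one of the two neighbors is already known, XORing it out yields the other.
a, b = neighbors
assert xor_bytes(value, inputs[a]) == inputs[b]
```

Each output symbol costs one memory fetch and one XOR per neighbor, which is why the average degree drives the encoding time discussed above.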
One property of the output symbols produced by a chain reaction encoder is that a receiver is able to recover the original file as soon as enough output symbols have been received. Specifically, to recover the original K input symbols with a high probability, the receiver needs approximately K+A output symbols. The ratio A/K is called the “relative reception overhead.” The relative reception overhead depends on the number K of input symbols, and on the reliability of the decoder. For example, in one specific embodiment, and where K is equal to 60,000, a relative reception overhead of 5% ensures that the decoder successfully decodes the input file with a probability of at least 1−10^−8, and where K is equal to 10,000, a relative reception overhead of 15% ensures the same success probability of the decoder. In one embodiment, the relative reception overhead of chain reaction codes can be computed as (13*sqrt(K)+200)/K, where sqrt(K) is the square root of the number of input symbols K. In this embodiment the relative reception overhead of chain reaction codes tends to be larger for small values of K.
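The overhead formula quoted above for one embodiment, (13*sqrt(K)+200)/K, can be evaluated directly. The sketch below simply tabulates it for a few values of K; the function name is hypothetical.

```python
import math


def relative_reception_overhead(k: int) -> float:
    """Relative reception overhead A/K = (13*sqrt(K) + 200) / K."""
    return (13 * math.sqrt(k) + 200) / k


for k in (1_000, 10_000, 60_000):
    print(k, f"{relative_reception_overhead(k):.1%}")
# 1000 61.1%
# 10000 15.0%
# 60000 5.6%
```

The values for K of 10,000 and 60,000 are close to the 15% and 5% figures quoted in the text, and the growth toward small K is visible in the first row.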
In embodiments in which output symbols are encoded using the XOR function, a chain reaction decoder's main computational operation is performing XOR's of memory locations. The number of such XOR's scales in the same way as for the chain reaction encoder.
Chain reaction codes are extremely useful for communication over a packet based network. However, they can be fairly computationally intensive. For example, in some specific embodiments of chain reaction codes, when the number of input symbols K is 60,000, then computation of each output symbol requires fetching, on average, 28.68 randomly selected input symbols and XOR'ing them. Because the number of files that can be simultaneously served by a server is inversely proportional to the number of operations needed for every output symbol, it would be useful to decrease the number of operations needed for every output symbol. Decreasing the latter by a factor of, say, three increases the number of files that can be simultaneously served from one server by a factor of three.
Another property of chain reaction codes is that they require a reception overhead that can be relatively large for a given target success probability. For example, as pointed out above, in some specific embodiments of chain reaction codes, if K is 10,000, then a relative reception overhead of 15% ensures a decoding success probability of at least 1−10^−8. The reception overhead tends to increase for smaller K. For example, in some specific embodiments of chain reaction codes, if K is 1000, a relative reception overhead of 61% ensures successful decoding with the same probability. Moreover, decreasing the target error probability to a number around 10^−12, as required by certain applications such as high speed transmission of content over a satellite network, calls for an even larger reception overhead.