In downloading data from storage or communicating data, for example over a communications network such as the INTERNET, data is transmitted in streams of message packets. Typically, the message packets each contain a word of, for example, 16, 32 or 64 bits. The packets are combined into a message or message segment for transmission.
During transmission of the message, various transmission related factors can result in message packets being lost or the data contained therein being altered so as to be in error, causing the communication to be corrupted at the receiving end. Stored data may also be lost or suffer error due to, for instance, static electrical charges in the environment or other factors. Numerous techniques are in use or have been proposed for replacing lost packets of information and correcting erroneous message data after the communication has been received. Such conventional techniques include Fourier transform-based techniques such as BCH and Reed-Solomon codes.
Conventional techniques for protecting information from loss or errors involve encoding the information before it is transmitted in such a way that, even if it is partially lost or corrupted during transmission, the original information can be recovered. The encoding process necessarily involves the addition of extra, redundant data to the information. This redundant information is gathered together with the original information to form the message that is transmitted. The process of determining the original information, given the received corrupted message, i.e. a message with either losses or errors, is called decoding. Two criteria by which these techniques are evaluated are: how much additional data must be added to achieve reliable communications given a certain expected amount of loss or corruption, and how long it takes to perform the processes of encoding and decoding.
The original information is represented by data items, which are also commonly referred to as message symbols. The data items could for example be message packets or data bits. The redundant data that is also transmitted with the information is represented by redundant data items, which are commonly referred to as check symbols. The redundant data items are of the same type as the data items. For example, if the data items are packets, then the redundant data items are also packets. The collection of data items and redundant data items that is transmitted is called a codeword. In the field of Coding Theory, each corruption of a data item is either called a loss, often referred to as an erasure, or an error. When trying to ascertain the information, the receiver will only have access to a corrupted version of the codeword.
The decoding overhead of a loss-resilient technique at a particular stretch factor is given by that stretch factor divided by the least stretch factor of any code that can reliably recover from the maximum number of losses reliably recoverable by the technique at the particular stretch factor, less 1. Using Reed-Solomon techniques, the decoding overhead is zero. Loss resilient techniques with a decoding overhead of zero are judged by their time overheads, i.e. the time required to encode and decode expressed as a multiple of the total number of data items and redundant data items. To the extent the use of any technique would result in a decoding overhead greater than zero, this added inefficiency must be compensated for by a reduction in the time overhead to provide a total efficiency equivalent to or better than those techniques having a zero decoding overhead.
While superior to other loss-resilient techniques, Fourier transform-based techniques still require a substantial time overhead to perform. Hence, even using a Fourier transform-based technique, there can be a bottleneck at the receiving end due to the time required to replace lost packets. For example, if the number of packets being communicated is 100,000, the time overhead will typically exceed 1,000. The more packets requiring replacement the higher the time overhead.
The situation is similar for error correcting techniques. However, the decoding overhead of an error correcting technique is determined by the entropy function of Information Theory. For example, if the number of redundant data items is to be equal to the number of data items, and those data items and redundant data items are bits, then no technique can reliably recover all data items if more than 11% of the data items and redundant items are corrupted by errors. Proposed prior art error correction techniques are unable to both efficiently and reliably recover from a full 11% error rate, but are generally capable of recovering from errors in approximately 8% of the data items and redundant data items.
While the deficiencies of prior art loss-resilient and error correcting techniques are generally tolerable for transmissions of relatively short lengths, these deficiencies become less acceptable during transmissions of large quantities of data. In applications requiring transmission of messages having large quantities of data items, such as video signal transmission, the probability that all corrupted data items can be recovered is at best low using conventional techniques.
Further, where large blocks of data are being transmitted at high speed, for example, in video distribution, the time required to recover any corrupted data needs to be minimized. The time performance of conventional techniques is generally insufficient to make the necessary recoveries in the required real time periods, unless specialized hardware is provided.
Accordingly, a need exists for data recovery techniques which can be utilized for quickly recovering corrupted data items where large blocks of data items must be transmitted.