The present invention relates to encoding and decoding data in communications systems and more specifically to communication systems that encode and decode data using a chain reaction coder wherein the associates used to generate an output symbol are selected from a window comprising less than all of the input symbols.
As described in Luby I and Luby II, chain reaction coding is useful in many communications systems. With chain reaction coding, output symbols are generated from a set of input symbols according to associations of input symbols with the output symbols.
In one embodiment, for example, an encoder generates an output symbol from a key, I, of the output symbol, where the number of possible keys, and therefore output symbols, is much larger than the number of input symbols. The encoder determines a list, AL(I), of W(I) input symbols to be associated with the output symbol and calculates a value, B(I), for the output symbol. Luby I and Luby II describe various methods and apparatus for calculating W(I) from I, calculating AL(I) from I and W(I), and generating B(I) from one or more of AL(I), W(I) and I. The decoder receives output symbols, and when sufficient output symbols are received, the decoder can calculate values for input symbols from the values of the output symbols. This process is referred to herein as "chain reaction coding" because once a decoder decodes an input symbol, that result can be used in combination with information about other received output symbols to decode more input symbols, which in turn may lead to an ability to decode even more input symbols.
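The following Python is a minimal sketch of this encoding step. It assumes that B(I) is the XOR of the associates' values and that W(I) and AL(I) are both derived from a pseudorandom generator seeded with the key I, so that a decoder holding only I can rebuild the same list. The function names `associates` and `output_symbol` and the 1/d weighting used to pick W(I) are hypothetical stand-ins, not the methods of Luby I or Luby II.

```python
import random
from functools import reduce
from operator import xor


def associates(key: int, k: int) -> list[int]:
    """Return AL(I) for output-symbol key I over k input symbols; W(I) = len(AL(I)).

    Hypothetical: W(I) and AL(I) both come from a PRNG seeded with the key,
    so encoder and decoder derive the same list from I alone.
    """
    rng = random.Random(key)
    degrees = range(1, k + 1)
    # Illustrative choice of W(I) favouring small weights; not Luby's distribution.
    w = rng.choices(degrees, weights=[1 / d for d in degrees])[0]
    # W(I) distinct associates drawn from the entire set of input symbols.
    return sorted(rng.sample(range(k), w))


def output_symbol(key: int, inputs: list[int]) -> int:
    """B(I), assumed here to be the XOR of the associates' values."""
    return reduce(xor, (inputs[j] for j in associates(key, len(inputs))), 0)


inputs = [0x12, 0x34, 0x56, 0x78, 0x9A]
# Each transmitted output symbol is a (key, value) pair.
encoded = [(key, output_symbol(key, inputs)) for key in range(8)]
```

Only the key needs to accompany each value: because AL(I) is recomputed from I, the encoder can produce as many distinct output symbols as there are keys.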
Where the encoder and decoder are optimized, the decoder can, most of the time, entirely recover an input file of K input symbols from a received set of K+a output symbols. In one decoding process, the decoder waits for receipt of K+a output symbols. In many communications systems, the channel is not perfect, so some output symbols may have been lost. In general, the decoder does not assume that it has received a specific or contiguous set of K+a output symbols, but instead operates on the assumption that the particular K+a output symbols are arbitrarily distributed among the possible output symbols. In a decoding process, the decoder uses the K+a output symbols to attempt to decode as many of the input symbols as possible. With the proper selection of a, most of the time, the decoder will decode all of the input symbols before using all of the K+a output symbols. In the ideal case, the decoder finishes recovering all of the input symbols just as it runs out of output symbols, so that no output symbols are wasted and no more are needed. Of course, since the decoder cannot control which of the possible output symbols are received, some transmissions of K+a output symbols will include more output symbols than needed, and some might not be enough to decode all of the input symbols, in which case the decoder would take additional output symbols from the channel to complete the decoding process.
In the most general case, the "associates" of an output symbol, i.e., the input symbols that are in the list AL(I) for a given output symbol in the above example, are selected from the entire set of input symbols. Thus, an efficient encoder would likely store all the input symbols in local memory for quick access as needed to generate output symbols. As used herein, memory is considered local or remote, with local memory being differentiated from remote memory not necessarily by its location, but by its response time and ease of access. For example, a computer system might have a 512 kilobyte (KB) processor cache, 128 megabytes (MB) of random access memory (RAM), and a 2 gigabyte (GB) disk drive. Assuming that the processor can access the processor cache much faster than the RAM, and the RAM much faster than the disk drive, the processor cache would be local memory relative to the remote memory of the RAM, and the RAM would be local memory relative to the remote memory of the disk drive.
In optimizing a processing system for performance, one considers the use of local memory and remote memory. If a process is changed from one that uses only remote memory, or both remote and local memory, to one that uses only local memory or a higher proportion of local memory relative to remote memory, the process will generally become more efficient. One constraint that prevents many processes from making more efficient use of local and remote memory is the amount of memory the process needs relative to the sizes of the memories. For example, where a process needs 1 GB to store its data, the above-described processing system, with only 128 MB of RAM, would have to store at least some of that data on the disk drive.
In the case of a chain reaction coding system, if the input file requires 1 GB for storage, the above-described processing system would either have to work directly from data stored on the much slower disk drive, or be modified to include 1 GB of RAM. In many cases, enlarging a more local memory to avoid the use of a more remote memory is not economical.
In one embodiment of a chain reaction coding system according to the present invention, a window is used in the selection of associates, resulting in localization of activity when generating output symbols and recovering input symbols, thereby allowing for more frequent use of a limited local memory instead of more frequent use of a remote memory.
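A minimal sketch of windowed associate selection follows, under stated assumptions: the window is `window` consecutive input symbols whose start is tied to the key by `key % (k - window + 1)`, and W(I) is drawn uniformly from 1 to a small `max_weight`. Both rules and the name `windowed_associates` are hypothetical illustrations, not the embodiment's actual selection method. What the sketch does show is that every associate of an output symbol falls inside a single window, so only that window of input symbols needs to be in local memory while the output symbol is generated.

```python
import random


def windowed_associates(key: int, k: int, window: int, max_weight: int = 3) -> list[int]:
    """Hypothetical windowed AL(I): all associates lie in one window of
    `window` consecutive input symbols out of k (window < k in practice)."""
    if not 1 <= window <= k:
        raise ValueError("window must satisfy 1 <= window <= k")
    # Assumed rule: window position is a function of the key, so consecutive
    # keys use neighbouring windows.
    start = key % (k - window + 1)
    rng = random.Random(key)
    # Illustrative W(I), not Luby's degree distribution.
    w = rng.randint(1, min(max_weight, window))
    # Associates are drawn only from the window, never from the whole file.
    return sorted(rng.sample(range(start, start + window), w))


# With k = 1,000,000 input symbols, each output symbol touches at most 64 of them.
als = [windowed_associates(key, k=1_000_000, window=64) for key in range(1000)]
```

With this key-to-window mapping, input symbol j can only be an associate of output symbols whose window starts between j - window + 1 and j, so a change to a local area of the input file affects only the output symbols whose windows cover that area.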
One advantage of the present invention is that it allows for a more efficient encoder and decoder that can take advantage of the locality of the input symbols. One benefit of locality is that a subset of the input file can be stored in a fast memory, such as a processor cache, and the fast memory can be smaller than the entire file. Another benefit of locality is that a change to a local area of the input file will only affect a local area of output symbols. Locality may be relative, in that the encoding and decoding might operate on a subset of input symbols that are not necessarily contiguous in the input file as stored.
A further understanding of the nature and the advantages of the inventions disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.