Error-corrective coding is often needed to preserve the integrity of received data. The basic concept is to add redundancy to a message to eliminate the need for retransmission. Shannon's work showed that dramatic gains in performance could be achieved by intelligently adding redundancy to a transmitted message. This addition of redundancy to the message is called Forward Error-Correction (FEC).
There are many methods of adding redundancy, but one of the most efficient and popular methods is RS coding. FIG. 1 is a block diagram of a typical RS system.
The RS encoder takes a block of digital data and adds “redundant” bits (i.e., bits containing some of the same information already present). During transmission or storage of data, data errors may occur for a number of reasons, such as noise, interference, or scratches on a storage medium such as a CD. The RS decoder recovers the original data by processing each received block of data to correct any errors in the received block. The number and type of errors that the decoder can correct depends on the characteristics of the RS code.
Over the last 20 years, RS-encoded data has become commonplace in all sorts of communications and computer channels, particularly those with a potential for bursts of errors, in which multiple errors tend to occur close together. RS codes are typically used where the multiple bits in error (an error burst) fall within the frame of a single character and it can be assumed that, most of the time, only one character within a data block is erroneous.
RS codes are also concatenated with convolutional codes, which are typically used in cases where the transmitted data is a bit stream having an undefined length.
Many practical applications take advantage of the multiple-error-correcting capability of an RS code, including magnetic and optical storage systems, as well as all sorts of communications systems.
Overview of RS Codes
Although the mathematics underlying RS coding is complex, some basic terminology for describing RS codes is provided here. An RS code is described as an (n, k) code, where the code words each consist of n symbols, k of which are message symbols. The following parameters characterize RS codes:

b = the number of bits per symbol
n = the total number of symbols per code word
k = the number of symbols in the unencoded word
(n − k) = the number of check symbols per code word
t = (n − k)/2, the maximum number of erroneous symbols that can be corrected
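As a concrete illustration of these definitions, the following sketch checks the parameter relationships for an example code with 8-bit symbols; the specific numbers are examples, not requirements:

```python
# Illustrative check of the (n, k) parameter relationships defined above,
# using the example code RS(255, 223) with 8-bit symbols.
b = 8                  # bits per symbol
n = 255                # total symbols per code word
k = 223                # message symbols per code word
check = n - k          # number of check symbols per code word
t = check // 2         # maximum number of correctable symbol errors

assert n == 2**b - 1   # the maximum code word length for b-bit symbols
assert check == 32 and t == 16
```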
FIG. 2 shows a typical RS code word broken into data and parity parts, which are further broken into symbols, which are further broken into bits. The RS code is a systematic code because the data is left unchanged and the parity symbols are appended to the data.
Designing an RS code is difficult, and knowledge of RS codes is highly specialized, typically falling outside a mainstream designer's domain of expertise. Specifically, RS codes operate over an algebraic finite field called a Galois Field (GF), and the encoding and decoding algorithms are complicated. Due to the complexity of RS coding theory, this document does not discuss all aspects of the RS algorithms in detail, but focuses instead on the implementation of these algorithms. There are a large number of available coding algorithms and techniques for carrying out the finite field arithmetic.
Implemented RS Algorithms
Although there are a large number of RS algorithms to choose from, most of the RS algorithms used today are based on the Galois Field GF(256). However, algorithms based on GF(128), GF(512), and other Galois fields are also used. Likewise, there are a multitude of generator polynomials and primitive elements to choose from, though only a few are commonly used. See A Commonsense Approach to the Theory of Error Correcting Codes, pp. 193, 194 in appendix A, by Benjamin Arazi for a formal treatment, definitions, and explanations of generator polynomials and primitive elements.
Briefly, every GF(q) has a primitive element α such that every field element except 0 can be expressed as some power of α (modulo q). Also, α^(q−1) = 1; that is, operations in the exponent of α are performed modulo q − 1. Further, any polynomial g(x) of degree n, modulo which the operations of the field GF(2^n) are performed, is called a generating polynomial of the field.
If, for example, the RS code has symbol size b = 8 (one byte, or 8 bits, per symbol), and the symbols are elements of the Galois Field GF(2^8) = GF(256), then the maximum code word length is (2^8 − 1) = 255 symbols. Further, an RS code with R = 2t parity bytes has a generator function g(x) = (x − r^0)(x − r^1) . . . (x − r^(R−1)), where r is a root of the binary primitive polynomial x^8 + x^4 + x^3 + x^2 + 1.
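One common way to realize this field arithmetic is a pair of log/antilog tables built from the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (bit pattern 0x11D). The sketch below is such a table-driven approach, offered as an illustration rather than as the method of any particular implementation:

```python
# Build antilog (alpha^i) and log tables for GF(256) using the
# primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (bit pattern 0x11D).
PRIM = 0x11D

exp_table = [0] * 256   # exp_table[i] = alpha^i
log_table = [0] * 256   # log_table[alpha^i] = i

x = 1
for i in range(255):
    exp_table[i] = x
    log_table[x] = i
    x <<= 1                 # multiply by alpha (i.e., by x)
    if x & 0x100:           # reduce modulo the primitive polynomial
        x ^= PRIM

def gf_mul(a, b):
    """Multiply two GF(256) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return exp_table[(log_table[a] + log_table[b]) % 255]

# alpha^(q-1) = alpha^255 = 1, as stated above:
assert gf_mul(exp_table[254], exp_table[1]) == 1
```

Note how the exponent arithmetic inside gf_mul is performed modulo q − 1 = 255, exactly as described above.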
A popular RS code designation is RS(n = 255, k = 223) with b = 8-bit symbols. Each code word contains 255 symbols, each one byte long; 223 symbols are data and 32 symbols are parity. That is, referring to FIG. 2, for this code n = 255, k = 223, b = 8, 2t = 32, and t = 16. Therefore, this RS code word allows an RS decoder to correct any sixteen erroneous symbols in the code word; that is, to correct errors in up to 16 bytes anywhere in the code word.
The amount of processing required to encode and decode RS codes is related to the number 2t of parity symbols per code word. A large t allows errors in a larger number of symbols to be corrected but requires more encoding/decoding processing power. Conversely, a small t allows errors in fewer symbols to be corrected but requires less encoding/decoding processing power.
RS Encoding Algorithm
Each RS code word consists of k message bytes (M_(k−1), M_(k−2), . . . , M_0) and R = 2t parity bytes (C_(R−1), C_(R−2), . . . , C_0). The check polynomial C(x) = C_(R−1)x^(R−1) + C_(R−2)x^(R−2) + . . . + C_0 is obtained as the remainder when the message polynomial M(x) = (M_(k−1)x^(k−1) + M_(k−2)x^(k−2) + . . . + M_0) · x^R is divided by the generator function g(x).
Referring to FIG. 3, the RS Encoder function takes an array of k data symbols as an input and returns an array of n symbols (an RS code word). For brevity, further details of RS encoding are omitted. See A Commonsense Approach to the Theory of Error Correcting Codes, pp. 145, 146 by Benjamin Arazi for more information.
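The remainder computation described above can be sketched in software. Here α = 2 is assumed as the primitive element, the primitive polynomial is the one given earlier (0x11D), and all names (gf_mul, gen_poly, rs_encode) are illustrative, not taken from any particular implementation:

```python
# Sketch of systematic RS encoding over GF(256): the parity symbols are
# the remainder of M(x) * x^R divided by the generator polynomial g(x).
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

exp_t, log_t = [0] * 512, [0] * 256
x = 1
for i in range(255):
    exp_t[i] = exp_t[i + 255] = x   # doubled table avoids a modulo in gf_mul
    log_t[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def gen_poly(R):
    """g(x) = (x - r^0)(x - r^1)...(x - r^(R-1)); minus equals plus in GF(2^b)."""
    g = [1]
    for i in range(R):
        root, ng = exp_t[i], [0] * (len(g) + 1)
        for j, c in enumerate(g):
            ng[j] ^= c                    # contribution of c * x
            ng[j + 1] ^= gf_mul(c, root)  # contribution of c * root
        g = ng
    return g  # coefficients, highest degree first

def rs_encode(msg, R):
    """Return the code word: msg followed by R parity symbols."""
    g = gen_poly(R)
    rem = [0] * R
    for m in msg:                # synthetic division of M(x)*x^R by g(x)
        factor = m ^ rem[0]
        rem = rem[1:] + [0]
        for j in range(R):
            rem[j] ^= gf_mul(g[j + 1], factor)
    return list(msg) + rem

def poly_eval(p, at):
    """Evaluate a polynomial (highest degree first) by Horner's rule."""
    y = 0
    for c in p:
        y = gf_mul(y, at) ^ c
    return y

# A valid code word vanishes at every root of g(x):
cw = rs_encode([0x12, 0x34, 0x56], 4)
assert all(poly_eval(cw, exp_t[i]) == 0 for i in range(4))
```

Because the data symbols are emitted unchanged ahead of the remainder, this sketch is systematic, matching the code word layout of FIG. 2.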
RS Decoding Algorithm
There is more than one algorithm that can be used to decode RS codes. The most common are Euclid's algorithm and the Berlekamp-Massey algorithm. The Berlekamp-Massey algorithm is used by an embodiment of the invention (described below) because it is iterative and less computationally intensive, involving only sums rather than polynomial multiplication and division.
Referring to FIG. 4, the RS Decoder function takes as its input 1 a received code word of n symbols and returns as an output 7 the k decoded data symbols, together with a count of the number of symbol errors corrected or a flag indicating that more than t errors were present. An RS decoding algorithm is typically divided into the following steps:

i) calculation of the power sum symmetric functions 2;
ii) generation of the error locator and error evaluator (magnitude) polynomials 3;
iii) use of a search algorithm to find the polynomial roots 4;
iv) calculation of the error values 5 and correction of the errors 6.
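Step i), the power sum (syndrome) computation, amounts to evaluating the received polynomial at each root of g(x). Below is a minimal sketch of that step alone, again assuming GF(256) tables built from the primitive polynomial 0x11D; the later steps ii)–iv) are omitted:

```python
# Sketch of step i): compute syndromes S_i = r(alpha^i), i = 0..R-1,
# by evaluating the received polynomial at each root of g(x).
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

exp_t, log_t = [0] * 512, [0] * 256
x = 1
for i in range(255):
    exp_t[i] = exp_t[i + 255] = x
    log_t[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp_t[log_t[a] + log_t[b]]

def syndromes(received, R):
    """All-zero syndromes mean the received word is a valid code word."""
    out = []
    for i in range(R):
        alpha_i, y = exp_t[i], 0
        for c in received:              # Horner evaluation at alpha^i
            y = gf_mul(y, alpha_i) ^ c
        out.append(y)
    return out

# The all-zero word is a valid code word; corrupting one symbol
# produces at least one nonzero syndrome.
assert syndromes([0] * 255, 32) == [0] * 32
corrupted = [0] * 255
corrupted[100] = 0x5A
assert any(syndromes(corrupted, 32))
```

The remaining steps take these R syndrome values as their input; when all syndromes are zero, the decoder can skip steps ii)–iv) entirely.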
Historically, these steps have been performed serially in hardware, so that the time required to perform any given step need not be related to the time required for any other step.
There are several different ways to accomplish these steps. One is to build a pipeline in which a separate piece of hardware is dedicated to each of the four steps and operates on one block of data at a time. The pipeline thus processes four different data blocks simultaneously and synchronizes the processing. This approach is hardware intensive, but it allows data to be processed at the rate at which it arrives, as long as the time taken by any step is less than the time needed to input the next block of data. Another is to perform all the steps serially in software for one block of data and, when the last step is completed, read the next block. This has limited usefulness, since the rate of arrival of the next block of data is determined by the transmitter and receiver, not by the RS decoding processor; the approach is therefore typically limited to relatively slow transmission rates.
The Peak-Load Issue
To implement a software RS decoder on a general processor, a block processing technique is typically used to reduce real-time sensitivity of the system.
When the block size is identical to the RS code word size, there is no peak-load issue. In every block, the following tasks need to be performed:

1. Read in a complete RS code word (possible because in this case the block size exactly equals the code word length);
2. Perform all the steps i)–iv) of the RS decoding algorithm, as discussed above, one by one.
However, the block size of most systems is limited by system parameters, and therefore, the block size is usually smaller than the RS code length. This is what causes a peak-load issue to arise.
Assume the block size is half of the RS code length, meaning the decoding procedure described above must be repeated every two blocks. It also means that each code word is two blocks long.
In the first block, since only half of the RS code word is available, the decoding procedure cannot be started; nothing can be done beyond reading in the first half of the RS code word. Only after reading in the second block is a complete code word available, at which point RS decoding can be performed. Since all of the calculation is performed after reading in the second block, the processor load associated with the second block is very high, or peaked.
Referring to FIG. 5, where the load is plotted over time, the code word 51 takes two block cycles 50 to be read completely into the system. No processing can be done during the first two cycles 47 and 48 while the first code word 51 is being read in. In FIG. 5, first-letter abbreviations are used for the functions: input is ‘I’, syndrome is ‘S’, locator polynomial is ‘L’, root search is ‘R’, and error magnitude evaluation is ‘M’.

During input cycle I2,1 49, which is the first block cycle of the second code word, the syndrome calculation S1 42 is performed on the first code word. During the next block cycle 41, the second half of the second code word, I2,2, is read in, but no other work is or can be performed. Then, while the first half of the third code word is being read in, I3,1, the syndrome calculation S2 46 of the second code word and the error locator polynomial calculation L1 43 of the first code word are performed. This is followed by block cycle 40, during which no calculation is performed while the system is reading in the second half of the third code word, I3,2.

A definite peak of load occurs in block cycle 44, when the first half of the fourth code word, I4,1, is read in and, at the same time, three calculations occur: the root search R1 of the first code word, the error locator polynomial calculation L2 of the second code word, and the syndrome calculation S3 of the third code word. This peak cycle is followed by another cycle in which no calculation is performed while the system reads in the second half of the fourth code word, I4,2. During the following block cycle, I5,1, the error magnitude evaluation M1 45 of the first code word occurs during a peak, along with calculations corresponding to all of those of block cycle 44. Thereafter, a no-load cycle alternates with a peak-load cycle for as long as code words are read in. Approximately half of the available processing power is therefore unused in this example.
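The alternating idle/peak pattern can be reproduced with a toy scheduling model. The rule used here (each of the four stages of a code word runs two block cycles after the previous stage, starting the cycle after the word is fully read in) is an assumption read off FIG. 5, not an implementation:

```python
# Model of the peak-load pattern in FIG. 5: each code word spans two
# block cycles, and stage s (S, L, R, M) of code word c runs at block
# cycle 2c + 1 + 2s, i.e., two cycles after its previous stage.
NUM_WORDS, STAGES, BLOCKS_PER_WORD = 6, 4, 2

cycles = NUM_WORDS * BLOCKS_PER_WORD + STAGES * BLOCKS_PER_WORD
load = [0] * (cycles + 1)       # load[cycle] = number of stages running

for c in range(1, NUM_WORDS + 1):      # code word c is read in cycles 2c-1, 2c
    for s in range(STAGES):
        load[2 * c + 1 + 2 * s] += 1

# Cycles spent reading the second half of a word carry no load, while
# alternate cycles stack up work until four stages run at once.
assert load[4] == 0 and load[6] == 0 and load[8] == 0
assert load[3] == 1 and load[5] == 2 and load[7] == 3 and load[9] == 4
```

Once the schedule reaches steady state (cycle 9 onward), every other cycle carries all four stages while the intervening cycles carry none, which is why roughly half the available processing power goes unused.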
If each code word is longer than two blocks, then the peak problem becomes worse because there are more blocks when nothing can be done but to read more of the code word in.
This means that most of the computation is performed in those blocks containing the last symbol of an RS code word. The peak load occurring in these blocks causes a severely skewed computation-load distribution over the whole decoding process, and therefore an obvious performance loss in many systems, such as multi-line and multi-tasking systems.
In some systems where the block size is limited by the delay requirement of the receiver, the peak-load problem may become completely intolerable and may cause the system to collapse.