The most widely used block codes in communication and storage systems are the Bose-Chaudhuri-Hocquenghem (BCH) and the Reed-Solomon (RS) codes. A comprehensive review of BCH and RS codes and their encoders and decoders can be found in the books “Error Control Coding: Fundamentals and Applications” by Shu Lin and Daniel J. Costello, Jr. and “Algebraic Codes for Data Transmission” by Richard Blahut.
Of the many algorithms available for decoding BCH and RS codes, the most widely used are the Berlekamp-Massey (BM) algorithm and the Euclidean algorithm (EA). Berlekamp and Massey both contributed to the development of the BM algorithm, with Massey reinterpreting the decoding of BCH codes as a shift register synthesis problem. Subsequently, certain drawbacks of the original BM algorithm were addressed by Reed et al in “VLSI design of inverse-free Berlekamp-Massey algorithm”, IEEE Proceedings, Sept 1991. The Euclidean Algorithm (EA) was developed by Sugiyama et al in “A method for solving key equation for decoding Goppa Codes”, Information and Control, Jan 1975.
Before describing any of the algorithms, some properties of Galois Fields and cyclic codes should be defined.
Galois or Finite Fields: A field is a mathematical structure that forms part of an algebraic system. In general, a field is a set of elements in which addition, subtraction, multiplication and division (by any non-zero element) can be performed without leaving the set. Addition and multiplication must satisfy the commutative, associative and distributive laws.
Finite fields are also called Galois fields. A large part of algebraic coding theory is built over finite fields. The number of elements in any finite field, called its order, is either a prime or a power of a prime ‘p’. Just as the extension of the real number field yields the complex field, a “lower” finite field (containing fewer elements) can be extended to produce “higher” finite fields (containing more elements). The higher field is said to be an extension field of the lower field, which is also sometimes called the base field. Thus, GF(2) is the binary field, and the fields $GF(2^m)$ are the extension fields of the prime field GF(2). An extension field is generated by first choosing a primitive polynomial over the base field as a generator polynomial; this polynomial is then used to develop all the elements of the field, which are all distinct.
Primitive Element: It has been proven that every finite field has a primitive element (which may not be unique). The primitive element generates all non-zero elements of the finite field by repeated exponentiation, so every non-zero element of a field of order q can be represented as a power of the primitive element with a unique exponent between 0 and q−2.
Cyclic codes: A cyclic code has the property that a cyclic shift of one codeword forms another codeword. The word cyclic implies an LFSR-like structure, in which an algebraic relation decides the feedback gains. These mathematical relations ease the encoding and decoding process, thus attributing greater importance to this class of codes.
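The construction of an extension field from a primitive polynomial can be sketched as follows. This is a minimal illustration that assumes $GF(2^4)$ with the primitive polynomial $x^4 + x + 1$; the field size, the polynomial, and the names `build_field` and `POWERS` are illustrative choices that do not come from the text. Field elements are stored as integers whose bits are polynomial coefficients over GF(2).

```python
M = 4                  # assumed field GF(2^M)
PRIM_POLY = 0b10011    # x^4 + x + 1, assumed primitive polynomial over GF(2)


def build_field(m, prim_poly):
    """Return [alpha^0, alpha^1, ..., alpha^(2^m - 2)] as integers."""
    order = (1 << m) - 1
    powers = []
    value = 1
    for _ in range(order):
        powers.append(value)
        value <<= 1                 # multiply by alpha (that is, by x)
        if value & (1 << m):        # degree reached m: reduce modulo prim_poly
            value ^= prim_poly
    return powers


POWERS = build_field(M, PRIM_POLY)
print(POWERS)
```

Because the polynomial is primitive, the successive powers of alpha run through every non-zero element exactly once before repeating.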
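The cyclic-shift property can be checked on a small example. The sketch below assumes the binary (7,4) cyclic code with generator polynomial $x^3 + x + 1$, chosen only because it is small enough to enumerate; the code parameters and all function names are illustrative and are not taken from the text.

```python
N = 7          # assumed code length
G = 0b1011     # assumed generator polynomial x^3 + x + 1 (bit i = coefficient of x^i)


def clmul(a, b):
    """Carry-less multiplication: product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def cyclic_shift(word):
    """Multiply by x modulo x^N + 1, i.e. rotate the N coefficients by one place."""
    word <<= 1
    if word & (1 << N):
        word = (word ^ (1 << N)) | 1
    return word


# Every codeword is a multiple of G(x) by a data polynomial of degree below 4.
CODEWORDS = {clmul(data, G) for data in range(16)}
CLOSED_UNDER_SHIFT = all(cyclic_shift(c) in CODEWORDS for c in CODEWORDS)
print(len(CODEWORDS), CLOSED_UNDER_SHIFT)
```

The rotation in `cyclic_shift` is the single-step behaviour of a feedback shift register, which is why such codes map naturally onto LFSR hardware.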
For a Reed-Solomon code (n,k,t) over $GF(2^m)$, $n = 2^m - 1$, k is odd, and the code can correct $t = (n-k)/2$ m-bit symbol errors. Any algorithm for decoding RS codes has to implicitly or explicitly perform these operations:
1) Compute errata locations;
2) Compute the errata values and correct the data.
These functions can be achieved with either a systematic or a non-systematic encoding operation. Systematic encoding, as defined earlier, is an operation in which the data can be distinguished from the parity symbols in the encoded codeword. In the case of non-systematic encoding, the data no longer appears distinctly in the encoded codeword. The following equations indicate systematic and non-systematic encoding operations respectively.
The polynomial G(x) is called the generator polynomial of the code, and is defined as
$$G(x) = \prod_{i=0}^{2t-1} \left(x - \alpha^{i}\right).$$
If the data polynomial is D(x) and the encoded polynomial is C(x), then
$$C(x) = x^{n-k} \cdot D(x) + r(x) = q(x) \cdot G(x)$$
is called a systematic codeword, where q(x) is the quotient and r(x) is the remainder when $x^{n-k} D(x)$ is divided by G(x), whereas
$$C(x) = D(x) \cdot G(x)$$
is called a non-systematic codeword. Most applications use systematic codewords, a restriction which also applies to the present patent application. As can be seen from the above, in both systematic and non-systematic encoding, the codeword polynomial is always divisible by the generator polynomial. Hence, all the roots of the generator polynomial are also roots of the codeword polynomial.
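The systematic encoding rule can be illustrated with a short sketch. It assumes a (15,11) code with t=2 over $GF(2^4)$ built from the primitive polynomial $x^4 + x + 1$; these parameters, the sample data, and every function name are illustrative assumptions rather than details of the text. Polynomials are lists of field elements with the lowest-degree coefficient first, and subtraction is XOR because the field has characteristic 2.

```python
M, N, K = 4, 15, 11        # assumed GF(2^4), (15,11) code, t = 2
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(poly, x):
    acc = 0
    for coeff in reversed(poly):      # Horner's rule, highest degree first
        acc = gf_mul(acc, x) ^ coeff
    return acc


def generator_poly(num_roots):
    """G(x) = product of (x - alpha^i) for i = 0 .. num_roots - 1."""
    g = [1]
    for i in range(num_roots):
        g = poly_mul(g, [EXP[i], 1])
    return g


def encode_systematic(data, g):
    """Return C(x) = x^(n-k) D(x) + r(x), with r(x) the remainder modulo G(x)."""
    parity_len = len(g) - 1
    work = [0] * parity_len + list(data)          # x^(n-k) * D(x)
    for i in range(len(work) - 1, parity_len - 1, -1):
        lead = work[i]
        if lead:                                  # cancel the leading term (G is monic)
            for j, coeff in enumerate(g):
                work[i - parity_len + j] ^= gf_mul(lead, coeff)
    return work[:parity_len] + list(data)         # parity symbols, then the data


DATA = list(range(1, K + 1))                      # arbitrary sample data symbols
G_POLY = generator_poly(N - K)
CODEWORD = encode_systematic(DATA, G_POLY)
print(CODEWORD)
```

The data symbols appear unchanged in the upper positions of the codeword, and evaluating the codeword at each root of G(x) gives zero, which is the divisibility property stated above.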
Due to the nature of the encoding operation, the received word, if identical to the encoded codeword, evaluates to zero at every root of the generator polynomial. Since the generator polynomial has 2·t = d−1 roots (d is the minimum distance of the code), evaluating the received word at each root yields d−1 values. These 2·t values are called the syndromes of the received word. With the 2·t syndromes a set of 2·t simultaneous equations can be formed. The received word R(x) is
$$R(x) = C(x) + E(x)$$
where
$$E(x) = e_{j_0} x^{j_0} + e_{j_1} x^{j_1} + \ldots + e_{j_{t-1}} x^{j_{t-1}}$$
is the error due to the channel noise, with $e_{j_l}$ the error value at symbol position $j_l$.
Then the syndromes are obtained as
$$s_i = R(\alpha^i) = C(\alpha^i) + E(\alpha^i) = E(\alpha^i) \quad \forall\; 0 \le i \le d-2$$
and the syndrome polynomial can be defined as
$$S(x) = s_0 + s_1 x + s_2 x^2 + \ldots + s_{d-2} x^{d-2}$$
The decoding problem is that of finding the error locations and error values from the knowledge of the above syndromes. Following the syndrome computation step shown above, the Berlekamp-Massey (BM) algorithm decodes the received word in the following steps:
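The syndrome computation can be sketched as below. The example assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, a length-15 code with 2·t = 4, and an arbitrary error pattern; all of these, and the function names, are illustrative assumptions. The all-zero word is used as the transmitted codeword (it belongs to every linear code), so the received word equals the error polynomial.

```python
M, N = 4, 15               # assumed GF(2^4), code length 15
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    """Evaluate a polynomial (lowest-degree coefficient first) by Horner's rule."""
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, x) ^ coeff
    return acc


def syndromes(received, count):
    """s_i = R(alpha^i) for i = 0 .. count - 1."""
    return [poly_eval(received, EXP[i]) for i in range(count)]


clean = [0] * N            # all-zero codeword
noisy = list(clean)
noisy[3] ^= 7              # error value 7 at symbol position 3
noisy[10] ^= 9             # error value 9 at symbol position 10

CLEAN_SYNDROMES = syndromes(clean, 4)
NOISY_SYNDROMES = syndromes(noisy, 4)
print(CLEAN_SYNDROMES, NOISY_SYNDROMES)
```

An error-free word gives all-zero syndromes, while the corrupted word gives values that depend only on the error pattern.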
1. Determine the Error Location Polynomial σ(x);
2. Determine the Error value evaluator;
3. Evaluate error-location numbers and error values and perform error correction.
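Assuming that ‘t’ errors are present in the data received at the input to the decoder, the syndromes can also be written as
$$s_i = e_{j_0} \alpha^{i \cdot j_0} + e_{j_1} \alpha^{i \cdot j_1} + \ldots + e_{j_{t-1}} \alpha^{i \cdot j_{t-1}} \quad \forall\; 0 \le i \le d-2.$$
Relabeling the errors with indices 1 to v, where v is the number of errors, and letting $\beta_l \equiv \alpha^{j_l}$ and $\delta_l \equiv e_{j_l}$, the syndromes become
$$s_i = \delta_1 \beta_1^{i} + \delta_2 \beta_2^{i} + \delta_3 \beta_3^{i} + \ldots + \delta_v \beta_v^{i} \quad \forall\; 0 \le i \le d-2$$
and the Error Locator Polynomial (ELP) can be defined as
$$\sigma(x) = (1 - \beta_1 x)(1 - \beta_2 x) \cdots (1 - \beta_v x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + \ldots + \sigma_v x^v$$
where $\sigma_0 = 1$.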
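The complete derivation can be found in the previously cited book by Lin and Costello; the ELP is computed iteratively from the update
$$\sigma^{(r+1)}(X) = \sigma^{(r)}(X) - d_r \cdot d_\rho^{-1} \cdot X^{r-\rho} \cdot \sigma^{(\rho)}(X)$$
where $d_r$ is the discrepancy at iteration r and ρ is an earlier iteration with non-zero discrepancy $d_\rho$.
It can be clearly noticed that every update involves computing the inverse of a previous discrepancy. The error value can be found once the Error Evaluator Polynomial is computed. The Error Evaluator Polynomial (EEP) is defined as
$$\Omega(x) = \Lambda(x) \cdot S(x) \bmod x^{2t}$$
where Λ(x) denotes the ELP, written σ(x) above. The above equation is also known as the Key Equation.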
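The iterative update and its dependence on a field inversion can be sketched as follows. This is a minimal illustration in the commonly published shift-register-synthesis form, assuming $GF(2^4)$ with primitive polynomial $x^4 + x + 1$ and 2·t = 4 syndromes; the variable names (`gap` for r−ρ, `prev` for the saved polynomial, `prev_d` for $d_\rho$) and the sample syndromes are assumptions, not details of the text. The sample syndromes correspond to error values 7 and 9 at positions 3 and 10.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a non-zero element."""
    return EXP[(N - LOG[a]) % N]


def berlekamp_massey(synd):
    """Return the ELP coefficients [1, sigma_1, ..., sigma_L], lowest degree first."""
    size = len(synd) + 1
    lam = [1] + [0] * (size - 1)      # current ELP
    prev = [1] + [0] * (size - 1)     # ELP saved at the last length change
    length, gap, prev_d = 0, 1, 1
    for r in range(len(synd)):
        d = synd[r]                   # discrepancy d_r
        for i in range(1, length + 1):
            d ^= gf_mul(lam[i], synd[r - i])
        if d == 0:
            gap += 1
            continue
        scale = gf_mul(d, gf_inv(prev_d))     # d_r * d_rho^-1: the costly inversion
        old = lam[:]
        for i in range(size - gap):           # lam -= scale * x^gap * prev
            lam[i + gap] ^= gf_mul(scale, prev[i])
        if 2 * length <= r:                   # register length must grow
            length, prev, prev_d, gap = r + 1 - length, old, d, 1
        else:
            gap += 1
    return lam[:length + 1]


SYNDROMES = [14, 7, 1, 10]            # assumed sample syndromes s_0 .. s_3
ELP = berlekamp_massey(SYNDROMES)
print(ELP)
```

For this error pattern the result equals $(1 - \alpha^{3}x)(1 - \alpha^{10}x)$, and the call to `gf_inv` inside the loop is the operation that the inverse-free variants remove.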
The error value can be determined by Forney's error value formula, given by:
$$e_i = \frac{\Omega\left(X_i^{-1}\right)}{\Lambda'\left(X_i^{-1}\right)}$$
where Λ′ is the formal derivative of the ELP. This error value can be used to correct the errors in the data by adding the same error value to the received symbol, which cancels the existing error.
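The root search and error-value computation can be sketched together. The example assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, sample syndromes for error values 7 and 9 at positions 3 and 10, and the matching ELP taken as already known; all names are illustrative. One point is an assumption of this sketch rather than a statement of the text: with syndromes indexed from $\alpha^0$, as in this section, the quotient is multiplied by $X_i$, a factor that drops out when the first generator root is $\alpha^1$.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[(N - LOG[a]) % N]


def poly_eval(poly, x):
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, x) ^ coeff
    return acc


def error_evaluator(lam, synd):
    """Omega(x) = Lambda(x) * S(x) mod x^(2t), with 2t = len(synd)."""
    omega = [0] * len(synd)
    for i, a in enumerate(lam):
        for j, s in enumerate(synd):
            if i + j < len(synd):
                omega[i + j] ^= gf_mul(a, s)
    return omega


def formal_derivative(poly):
    """In characteristic 2 only the odd-degree terms survive differentiation."""
    return [poly[i] if i % 2 else 0 for i in range(1, len(poly))]


def find_errors(lam, synd):
    """Search all positions for roots of Lambda, then apply Forney's formula."""
    omega = error_evaluator(lam, synd)
    lam_prime = formal_derivative(lam)
    errors = {}
    for pos in range(N):
        x_inv = EXP[(N - pos) % N]                 # X^-1 for locator X = alpha^pos
        if poly_eval(lam, x_inv) == 0:
            num = gf_mul(EXP[pos], poly_eval(omega, x_inv))   # X * Omega(X^-1), see lead-in
            den = poly_eval(lam_prime, x_inv)      # non-zero when the roots are distinct
            errors[pos] = gf_mul(num, gf_inv(den))
    return errors


SYNDROMES = [14, 7, 1, 10]     # assumed sample syndromes s_0 .. s_3
ELP = [1, 15, 13]              # assumed ELP for these syndromes
ERRORS = find_errors(ELP, SYNDROMES)
print(ERRORS)
```

Adding each recovered value to the symbol at the corresponding position cancels the error, since addition and subtraction coincide in $GF(2^m)$.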
The inversion operation involving discrepancy computation slows down the iterative process. The inversion operation also significantly contributes to the critical path delay in VLSI implementations of the BM algorithm. Thus, higher throughputs would be possible if this inversion step is avoided.
As an improvement on the BM algorithm, an inversionless decoding method that simplifies the Berlekamp-Massey algorithm for the special case of binary BCH codes was described in the publication “Inversionless Decoding of Binary BCH Codes” by Reed et al in IEEE Transactions on Information Theory, July 1971. The VLSI architecture for inversionless decoding of Reed-Solomon codes (non-binary BCH codes) is shown in the prior art FIG. 1. In this architecture, the only input is the sequence of syndromes, which are shifted into register T 20. The value of Λ from the solution of the key equation is loaded into register 30, and an iterative search for all values of k is undertaken until a match is found for the decoded codeword. FIG. 2 shows the flowchart for the prior art Berlekamp-Massey decoder of FIG. 1 using the “inversion-free” Berlekamp-Massey algorithm. A complete description of operation can be found in the Reed et al reference.
Although this algorithm eliminates the need for inversion, it does not include erasure decoding. Truong et al. (1998, 1999) have generalized this approach to include erasure handling. In this improvement, the concept of Forney's syndromes was used, which is based on Erasure Location Polynomials (EraLP). In this method, the EraLP is computed and the modified syndromes are determined. This system takes advantage of the fact that the performance of a channel decoder can be improved by providing “side-information” about the ‘reliability’ of the demodulator estimate of every symbol received by the decoder. One simple way to accomplish this is to flag an “erasure” whenever the demodulator finds the symbol estimate unreliable. The flag indicates that the guess is purely arbitrary and is to be disregarded by subsequent stages. Decoding with erasures improves performance, because it distributes the task of error correction between the demodulator and the decoder. Since symbols declared as erasures are usually in error, the process of generating erasures conveys error location information to the decoder. It can be shown that for a code of minimum distance d, the maximum number of erasures that will guarantee correct decoding is d−1, assuming no other errors have occurred. FIG. 3 shows a block diagram that indicates the overall functionality of the RS decoder using the inverse-free BM algorithm with erasure correcting capability.
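The idea of removing the inversion can be sketched with a widely published form of the inverse-free recursion, in which the ELP is allowed to accumulate a scalar factor instead of being normalized at each step. This is not a model of the FIG. 1 architecture; it assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, t = 2, and sample syndromes for error values 7 and 9 at positions 3 and 10. The names `aux`, `gamma` and `k` are illustrative.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inversionless_bm(synd, t):
    """Return a non-zero scalar multiple of the ELP using multiplications only."""
    lam = [1] + [0] * t        # scaled error locator polynomial
    aux = [1] + [0] * t        # auxiliary polynomial b(x)
    gamma, k = 1, 0            # last non-zero discrepancy and control counter
    for r in range(2 * t):
        delta = 0              # discrepancy for this iteration
        for i in range(min(r, t) + 1):
            delta ^= gf_mul(lam[i], synd[r - i])
        shifted = [0] + aux[:-1]                              # x * b(x)
        new_lam = [gf_mul(gamma, c) ^ gf_mul(delta, s)        # gamma*lam - delta*x*b
                   for c, s in zip(lam, shifted)]
        if delta != 0 and k >= 0:
            aux, gamma, k = lam, delta, -k - 1
        else:
            aux, k = shifted, k + 1
        lam = new_lam
    return lam


SYNDROMES = [14, 7, 1, 10]     # assumed sample syndromes s_0 .. s_3
SCALED_ELP = inversionless_bm(SYNDROMES, 2)
print(SCALED_ELP)
```

The output is the ELP $(1 - \alpha^{3}x)(1 - \alpha^{10}x)$ multiplied by a constant. A constant factor does not move the roots, so the root search is unaffected, and no field inversion is performed inside the loop.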
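The use of an erasure locator polynomial to form modified (Forney) syndromes can be sketched as follows. The example assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, 2·t = 4 syndromes, the all-zero codeword, and arbitrary erased positions and values; the names `erasure_locator` and `forney_syndromes` are illustrative. The modified syndromes computed here are the coefficients of the product of S(x) and the erasure locator from degree f upward, where f is the number of erasures.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, x) ^ coeff
    return acc


def syndromes(received, count):
    return [poly_eval(received, EXP[i]) for i in range(count)]


def erasure_locator(positions):
    """Gamma(x) = product over erased positions p of (1 - alpha^p x)."""
    gamma = [1]
    for p in positions:
        gamma = [a ^ gf_mul(EXP[p], b) for a, b in zip(gamma + [0], [0] + gamma)]
    return gamma


def forney_syndromes(synd, gamma):
    """T_i = sum_j gamma_j * s_(i-j) for i = f .. len(synd) - 1, f = deg Gamma."""
    f = len(gamma) - 1
    out = []
    for i in range(f, len(synd)):
        acc = 0
        for j, g in enumerate(gamma):
            acc ^= gf_mul(g, synd[i - j])
        out.append(acc)
    return out


ERASED = [2, 9]                       # positions flagged by the demodulator
word = [0] * N                        # all-zero codeword
word[2], word[9] = 5, 11              # arbitrary values at the erased positions
GAMMA = erasure_locator(ERASED)

ERASURES_ONLY = forney_syndromes(syndromes(word, 4), GAMMA)
word[5] ^= 3                          # one additional, unflagged error
WITH_ERROR = forney_syndromes(syndromes(word, 4), GAMMA)
print(ERASURES_ONLY, WITH_ERROR)
```

When only flagged symbols are wrong, the modified syndromes vanish, so the key equation solver has nothing left to locate; an unflagged error leaves non-zero modified syndromes that depend on that error alone.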
The architectures for the prior art do not have a regular structure, as the mathematical operations involved are different in each of the stages, as can be seen for the various stages of FIG. 3:
Stage 1 (40 of FIG. 3): Iterative polynomial computation and erasure polynomial generation (optional, for erasure handling only)
Stage 2 (40 of FIG. 3): Key Equation Solver (KES)
    a) Discrepancy calculation (basically an FIR structure)
    b) Polynomial update.
Stage 3 (42 of FIG. 3): Polynomial evaluation
A very regular and systolic architecture for solving the Key Equation of the Berlekamp-Massey algorithm where no erasures are passed to the decoder was proposed by Sarwate et al in “High-Speed Architectures for Reed-Solomon Decoders” in IEEE Transactions on VLSI Systems, Oct 2001. Through algorithmic transformations, the authors derived an architecture made up of a series of identical processing elements, which compute the discrepancies and updates simultaneously, in contrast to earlier configurations in which different kinds of processing elements were used. The processing element was designed to significantly lower the critical path delay. The critical path delay was reduced, and the number of computational iterations was also reduced, by look-ahead computation of the discrepancies. Sarwate et al show that the error evaluation polynomial to be obtained is related to the contents of the upper array after the KES operation. Since the KES operation takes only 2t clock cycles, the extra t cycles required for computing the error evaluation polynomial are avoided. All previous implementations were designed such that the Error Locator Polynomial (ELP) Λ(x) was computed first, and then the Error Evaluator Polynomial (EEP) Ω(x) was computed as the product of the ELP Λ(x) and the syndrome polynomial S(x). This additional step represented overhead, requiring either extra clock cycles or more hardware.
The Euclidean Algorithm (EA) involves finding the Greatest Common Divisor (GCD) of two polynomials. This algorithm, which is also iterative, finds the discrepancy as the remainder when two polynomials are divided, and uses it for the update of the ELP. Thus, the ELP is updated until the discrepancy vanishes, or until the decoding limit is reached. The Euclidean algorithm is conceptually elegant and its architectures are usually regular. The details of this algorithm are described in the Lin and Costello reference.
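The Euclidean approach to the key equation can be sketched in its commonly published form: repeated polynomial division starting from $x^{2t}$ and S(x), stopping once the remainder degree drops below t. The example assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, t = 2, and sample syndromes for error values 7 and 9 at positions 3 and 10; the function names are illustrative, and the stopping rule shown is the standard one, not a detail of the text.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[(N - LOG[a]) % N]


def poly_trim(p):
    """Drop zero high-order coefficients, keeping at least one coefficient."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def poly_add(p, q):
    out = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] ^= a
    for i, b in enumerate(q):
        out[i] ^= b
    return poly_trim(out)


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return poly_trim(out)


def poly_divmod(num, den):
    """Long division; den must be non-zero. Returns (quotient, remainder)."""
    num, den = poly_trim(list(num)), poly_trim(list(den))
    quot = [0] * max(len(num) - len(den) + 1, 1)
    inv_lead = gf_inv(den[-1])
    for shift in range(len(num) - len(den), -1, -1):
        factor = gf_mul(num[shift + len(den) - 1], inv_lead)
        quot[shift] = factor
        for j, coeff in enumerate(den):
            num[shift + j] ^= gf_mul(factor, coeff)
    return poly_trim(quot), poly_trim(num)


def solve_key_equation(synd, t):
    """Return scalar multiples of (Lambda, Omega) with Lambda*S = Omega mod x^(2t)."""
    r_prev, r_cur = [0] * (2 * t) + [1], poly_trim(list(synd))
    a_prev, a_cur = [0], [1]
    while len(r_cur) - 1 >= t:                 # stop when deg(remainder) < t
        quot, rem = poly_divmod(r_prev, r_cur)
        r_prev, r_cur = r_cur, rem
        a_prev, a_cur = a_cur, poly_add(a_prev, poly_mul(quot, a_cur))
    return a_cur, r_cur


SYNDROMES = [14, 7, 1, 10]     # assumed sample syndromes s_0 .. s_3
LOCATOR, EVALUATOR = solve_key_equation(SYNDROMES, 2)
print(LOCATOR, EVALUATOR)
```

Both outputs carry the same constant factor, so their ratio, which is what the error-value computation uses, is unaffected. Dividing by the constant term of the locator recovers the normalized ELP with $\sigma_0 = 1$.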
An architecture which incorporates the idea of a single processing unit was suggested by Iwamura et al in “A Design of Reed-Solomon Decoder with Systolic-Array Structure” in IEEE Transactions on Computers, Jan 1995. This architecture improves on earlier implementations by eliminating the need for separate design of different sets of Processing Elements (PE) for each decoding stage. This implementation proposes a simplified design by replication of a single versatile PE. This implementation exploits the fact that all the operations in the decoding process can be decomposed to the form a·b+c·d, where a, b, c, d are all elements of $GF(2^m)$. The implementation is well suited for VLSI.
There are many disadvantages in the prior art architectures:
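The decomposition into the form a·b+c·d can be illustrated with a short sketch showing two decoder operations expressed through one function. It assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$; the name `processing_element` and the two usages are illustrative and are not a model of the cited design.

```python
M, N = 4, 15               # assumed GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)

EXP, LOG = [0] * (2 * N), [0] * (N + 1)   # EXP[i] = alpha^i, LOG is its inverse
value = 1
for power in range(N):
    EXP[power] = EXP[power + N] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def processing_element(a, b, c, d):
    """Compute a*b + c*d over GF(2^m); addition is XOR."""
    return gf_mul(a, b) ^ gf_mul(c, d)


def evaluate_with_pe(poly, x):
    """Polynomial evaluation: each Horner step is acc*x + coeff*1."""
    acc = 0
    for coeff in reversed(poly):
        acc = processing_element(acc, x, coeff, 1)
    return acc


def update_with_pe(gamma, lam, delta, shifted_aux):
    """Inverse-free polynomial update: each coefficient is gamma*lam_i + delta*b_(i-1)."""
    return [processing_element(gamma, l, delta, b) for l, b in zip(lam, shifted_aux)]


ROOT_CHECK = evaluate_with_pe([1, 15, 13], 15)           # ELP evaluated at alpha^12
UPDATED = update_with_pe(14, [1, 14, 0], 12, [0, 1, 0])  # one update step
print(ROOT_CHECK, UPDATED)
```

Both polynomial evaluation and polynomial update reduce to repeated calls of the same two-multiplier, one-adder operation, which is what makes replication of a single PE attractive.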
The Massey implementation, described in “Shift-Register Synthesis and BCH decoding” in IEEE Transactions on Information Theory, Jan 1969, implements the decoding block Key Equation Solver (KES) as a Linear Feedback Shift Register (LFSR), whose gain is decided by the discrepancies computed in the previous clock cycle. The bottleneck in this implementation is the inversion arithmetic block in every discrepancy computation stage, which limits the speed of operation.
In Reed et al described above, the inversion was eliminated by computing another polynomial update, but the implementation had a MAC (multiply and accumulate) like structure, which had a large Critical Path Delay (CPD), on the order of $\log\left[\frac{d-1}{2}\right]$ XOR gates. The MAC structure operates on the syndromes to compute the discrepancy, which is used by the Error Locator Polynomial (ELP) update block. This CPD applies for as long as the Key Equation Solver is running. This implementation does not handle decoding with erasures.
The above architectures are irregular, and any change to make the same engine work for different configurations of (n, k) requires major changes in the design. Thus the prior art decoders are neither scalable nor systolic. The implementation in Sarwate et al described above uses the hardware inefficiently, because its hardware is dedicated to a single step even though the same hardware could be reconfigured for multiple functions.
Since in all these implementations, the critical path delay (CPD) is due to the multipliers followed by the adder tree, Sarwate et al derived a systolic architecture for an errors-only RS decoder. The architecture has a reduced critical path delay, at the cost of extra computational complexity. Computation complexity increases because one extra update needs to be computed in every decoding step. The authors also show that performing iterations on the product of S(x)Λ(x) yields a polynomial that is related to the EEP. Further, it is shown that this polynomial could also be used to compute the error magnitudes. It is also shown that
$$\frac{\Omega(x)}{\Lambda'(x)} = \frac{x^{d-1}\,\Omega'(x)}{\Lambda'(x)}$$
where Ω(x) is the EEP as per the BM algorithm, and Ω′(x) is the polynomial obtained by Sarwate et al.
Zhang et al. (2002) improved upon the architecture of Sarwate et al by providing erasure-handling capability. Using one of the architectures derived by Sarwate et al, the present inventors have extended the idea to erasures-and-errors decoding. Moreover, much of the prior art is directed to the Key Equation Solver (KES) step, and derives optimized implementations for this step. It is desired that the different hardware blocks are used for other decoding steps, and if necessary, that all these units should operate as a pipeline.
The present invention describes a new processing unit, copies of which are connected to form a reconfigurable finite field arithmetic processor that can be used to perform multiple decoding steps.
There are several disadvantages of the prior art. In terms of regular reusable structure, the prior art architectures using the BM algorithm for errors-and-erasures decoding, generally do not contain reusable structures, which requires several unrelated structures for each stage of the decoder. In terms of hardware efficiency, the prior art BM implementations have concentrated on optimizing the KES step in the decoding process. In terms of implementation, it is always possible to reduce the hardware complexity by time-sharing of a limited number of processing units. Thus, the hardware efficiency is achieved only at a given decoding step—the optimization is seldom done across decoding steps. For example, the Chien root search and Error evaluation decoding steps, which are often the most time-consuming steps in the decoding process, do not have a straightforward mapping onto the KES hardware. The prior art of Sarwate et al has underutilized hardware, as the KES block has 6t GF multipliers for just 2t clocks. The GF multiplier is a gate intensive element having at least 130 gates.
The prior art inversionless architectures, including all features such as errors and erasure decoding, are either non-systolic or semi-systolic (all prior architectures, and Zhang et al. (2002)). Additionally, the prior art decoders do not handle shortened and punctured codewords, and the prior art decoders have a large critical path delay, although this has been reduced in recent implementations and further improved in Sarwate et al and Zhang et al.
With regard to Patent Prior Art, the following references are noted which describe the individual processing elements of Reed-Solomon decoders:
Finite field multipliers are described in U.S. Pat. Nos. 4,216,531 by Chiu, 5,272,661 by Raghavan et al, and in 6,230,179 by Kworkin et al. U.S. Pat. Nos. 5,818,855 by Foxcraft and 6,473,799 by Wolf describe Galois Field multipliers.
There are several polynomial evaluation architectures in the prior art, including U.S. Pat. Nos. 5,751,732 and 5,971,607, both by Im.
U.S. Pat. No. 5,787,100 by Im describes a system for calculating the error evaluator polynomial in a Reed-Solomon decoder. U.S. Pat. Nos. 5,805,616 by Oh and 5,878,058 by Im describe systems for calculating an ELP and EEP, including support for punctured and shortened codes.
Reed-Solomon decoder systems which incorporate the previously described elements can be found in U.S. Pat. Nos. 6,587,692 by Zaragoza, 6,487,691 by Katayama et al, 6,553,537 by Jukuoka, 6,694,476 by Sridharan et al, U.S. application Ser. Nos. 2002/0023246 by Jin, 2003/0229841 by Kravtchenko, 2003/0135810 by Hsu et al, and 2003/0126542 by Cox.