Bose-Chaudhuri-Hocquenghem (BCH) codes are among the most widely used error correction codes (ECC) in storage and communication devices. A BCH code can detect and correct random errors caused by channel noise and by defects within memory devices. Encoding of a BCH codeword can be implemented with a linear feedback shift register (LFSR) together with some combinational logic. Compared with encoding, decoding of a BCH codeword is much more complicated, as shown in FIG. 1. The decoding procedure is as follows: after a codeword is received (S01), a syndrome is computed according to specified polynomials (S02). Then, based on the syndrome, an error-location polynomial is found (S03). Next, the roots of the error-location polynomial are calculated to obtain the error-location numbers (S04). Finally, the erroneous codeword is corrected using the results of the above steps (S05).
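The LFSR encoding and the syndrome computation of step S02 can be illustrated with a minimal Python sketch. It assumes a (15, 7) binary BCH code with t=2 over GF(2^4), built from the primitive polynomial x^4+x+1, so that the generator polynomial is x^8+x^7+x^6+x^4+1. These parameters and the names `lfsr_encode` and `syndromes` are chosen only for illustration and do not come from the figures.

```python
# Illustrative sketch only: a (15, 7) binary BCH code with t = 2 is assumed.
M, N, K, T = 4, 15, 7, 2
PRIM_POLY = 0b10011        # x^4 + x + 1 (assumed primitive polynomial)
GEN_POLY = 0b111010001     # x^8 + x^7 + x^6 + x^4 + 1

# Table of powers of alpha in GF(2^4): EXP[i] = alpha^i.
EXP = []
value = 1
for _ in range(N):
    EXP.append(value)
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def lfsr_encode(message):
    """Systematic encoding: shift the K message bits through an (N-K)-bit LFSR."""
    width = N - K
    mask = (1 << width) - 1
    taps = GEN_POLY & mask                 # generator without its leading term
    reg = 0
    for i in reversed(range(K)):           # most significant message bit first
        feedback = ((message >> i) & 1) ^ ((reg >> (width - 1)) & 1)
        reg = (reg << 1) & mask
        if feedback:
            reg ^= taps
    return (message << width) | reg        # message bits followed by parity bits


def syndromes(received):
    """Step S02: S_j = r(alpha^j), j = 1..2t; bit i of `received` is the x^i coefficient."""
    result = []
    for j in range(1, 2 * T + 1):
        s = 0
        for i in range(N):
            if (received >> i) & 1:
                s ^= EXP[(i * j) % N]
        result.append(s)
    return result
```

Under these assumptions, every codeword produced by `lfsr_encode` yields all-zero syndromes, and flipping bit p of a codeword makes the first syndrome equal to alpha^p, which is the information the later decoding steps work from.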
Conventionally, the Peterson-Gorenstein-Zierler (PGZ) algorithm or the Berlekamp-Massey (BM) algorithm is used to find the aforementioned error-location polynomial. Since the PGZ algorithm has higher complexity than the BM algorithm, and the BM algorithm decodes faster, the BM algorithm is more popular in hardware implementations. However, the multiplicative inverse used in the BM algorithm significantly increases the hardware complexity of the circuit. Hence, a number of improved BM algorithms have been proposed. The most mature of these in use are the inversionless BM algorithms.
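To show how the inverse can be avoided, the following Python sketch implements a generic textbook-style inversionless BM iteration. It is not the simplified algorithm of FIG. 2, and it assumes GF(2^4) with the primitive polynomial x^4+x+1. The names `inversionless_bm`, `gf_mul`, and `poly_eval` are hypothetical. Because no division is performed, the result is a nonzero scalar multiple of the error-location polynomial, which has the same roots.

```python
# Illustrative sketch only: GF(2^4) with primitive polynomial x^4 + x + 1 is assumed.
M, N = 4, 15
PRIM_POLY = 0b10011

EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    """Multiply two GF(2^4) elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def inversionless_bm(synd, t):
    """Return [L0, ..., Lt], a scalar multiple of the error-location polynomial.

    synd[r] holds the syndrome S_(r+1) for r = 0..2t-1.
    """
    lam = [1] + [0] * t      # current error-location polynomial
    b = [1] + [0] * t        # copy saved at the last length change
    gamma, k = 1, 0          # last nonzero discrepancy and control counter
    for r in range(2 * t):
        # Discrepancy between the predicted and the actual syndrome.
        delta = 0
        for i in range(min(r, t) + 1):
            delta ^= gf_mul(lam[i], synd[r - i])
        # Division-free update: lam <- gamma * lam - delta * x * b
        # (subtraction is XOR in characteristic 2).
        shifted_b = [0] + b[:-1]
        new_lam = [gf_mul(gamma, lam[i]) ^ gf_mul(delta, shifted_b[i])
                   for i in range(t + 1)]
        if delta != 0 and k >= 0:
            b, gamma, k = lam, delta, -k - 1
        else:
            b, k = shifted_b, k + 1
        lam = new_lam
    return lam


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-order coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc
```

In this sketch, an error at position p appears as a root of the returned polynomial at alpha^(-p), so evaluating the polynomial at every alpha^(-p) for p = 0..14 recovers the error positions.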
Among the inversionless BM algorithms, a commonly used simplified inversionless BM algorithm was disclosed in 2006 by Wei Liu et al. in ‘Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories’, IEEE Workshop on Signal Processing Systems Design and Implementation, 2006 (SIPS '06). Pseudo code of the algorithm is shown in FIG. 2. A circuit that implements the algorithm with a folding architecture is shown in FIG. 3.
As FIG. 2 shows, the inversionless BM algorithm initializes all parameters in the first step: the iterative calculation values C2t(0)=1 and C2t−1(0)=0, which are used to find the coefficients of the error-location polynomial; a copied value B2t−1(0)=0; an intermediate calculation value k(0)=0; and a discrepancy value d(0)=1. Next, the remaining iterative calculation values Ci (i=0, 1, 2, . . . , 2t−2) are set to the corresponding syndromes Si, and the remaining copied values Bi are set to the corresponding Ci. Then, each iterative operation performs the following calculations and updates the related values. In step SiBM.1, calculate Ci(r+1)=d(r)·Ci+2(r)−C0(r)·Bi+1(r) (i=0, 1, 2, . . . , 2t), where C2t+2(r)=C2t+1(r)=0 and B2t+1(r)=0. In step SiBM.2, judge whether C0(r)≠0 and k(r)≧0; if yes, calculate Bi(r+1)=Ci+1(r) (0≦i≦2t, i≠2t−2−k, k=0, 1), d(r+1)=C0(r), and k(r+1)=−k(r); otherwise, calculate Bi(r+1)=Bi(r) (0≦i≦2t, i≠2t−2−k, k=0, 1), d(r+1)=d(r), and k(r+1)=k(r)+2; and set Bi(r+1)=0 (i≠2t−2−k, k=0, 1). The numeral r increases from 0 to t−1 in increments of 1, one per iterative operation. After all iterative operations are completed, Ci(t) (i=0, 1, . . . , t) are output.
In FIG. 3, the circuit includes 4t+1 registers 1, one processing element 2, one control element 3, and one multiplexer 4. The 2t registers 1 enclosed by dashed lines store the values of Ci in each iterative operation. The 2t registers 1 enclosed by dash-dotted lines store the values of Bi in each iterative operation. The symbol in each register indicates the syndrome it holds at initialization. The register 1 at the lower right of FIG. 3 provides the Galois field value 0 to the multiplexer 4. The processing element 2 performs step SiBM.1 in each iterative operation. It receives the current Bi(r) and Ci(r); C0(r), d(r) and a control signal Ctrl0 from the control element 3; and an external control signal Ctrl1. The control element 3 receives a value of Ci from the register 1 in each clock, calculates the corresponding C0(r), d(r) and control signal Ctrl0, and sends the results to the processing element 2 as inputs in the next clock. The multiplexer 4 selects either the iterative calculation value Ci or the value 0 defined by the algorithm as an input to the processing element 2.
From the foregoing, this circuit implementation uses a folding factor of 2t to fold the parallel design and reduce the number of control elements from 2t to 1. However, the processing time for one iterative operation increases from 1 clock to 2t clocks. In terms of area cost, such a circuit design has the advantage of making the final BCH decoder as small as possible. The time for the whole BCH decoding operation may be recovered with the help of a control circuit that operates faster than before. However, for the new generation of BCH decoders, for which area cost is a concern, further improving the algorithm and circuit architecture to achieve a better area cost without lowering operational efficiency is a challenging task.
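The area and time trade-off of folding can be made concrete with a small Python model. It is a hypothetical simplification: it assumes that the unfolded design uses 2t parallel units and takes 1 clock per iterative operation, that t iterative operations are performed, and that a folding factor f time-shares the units over f clocks. The function name `folded_schedule` is illustrative only.

```python
import math


def folded_schedule(t, folding_factor):
    """Return (units, clocks_per_iteration, total_clocks) for t iterative operations.

    Assumes an unfolded design with 2t parallel units and 1 clock per iteration;
    folding by a factor f reuses ceil(2t / f) units over f clocks.
    """
    parallel_units = 2 * t
    units = math.ceil(parallel_units / folding_factor)
    clocks_per_iteration = folding_factor
    total_clocks = t * clocks_per_iteration
    return units, clocks_per_iteration, total_clocks


# Example: t = 8, unfolded versus folded by 2t = 16.
unfolded = folded_schedule(8, 1)    # 16 units, 1 clock per iteration
folded = folded_schedule(8, 16)     # 1 unit, 16 clocks per iteration
```

Under this model, folding by 2t reduces the unit count by a factor of 2t while multiplying the clock count by the same factor, which is why a faster clock is needed to keep the overall decoding time unchanged.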