The Bose-Chaudhuri-Hocquenghem (BCH) code is one of the most widely used error correction code (ECC) techniques in storage and communication devices. A BCH code can detect and correct random errors caused by channel noise and by defects within memory devices. To construct a BCH codeword, one defines a code length n, an error correction capability t, and a primitive polynomial over the extension field GF(2^m). The encoding procedure can be implemented easily with a linear feedback shift register (LFSR) and some combinational logic. Compared with encoding, the decoding procedure of a BCH codeword is much more complicated, as shown in FIG. 1.
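As an illustration of LFSR-based encoding, the following minimal sketch computes the parity bits of a systematic codeword in software. The parameters are assumptions chosen for the example and do not come from the text: a BCH(15, 7) code with t = 2 and generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1. The function names are hypothetical.

```python
# Hypothetical example parameters (not taken from the text): BCH(15, 7), t = 2.
N, K = 15, 7
R = N - K                    # number of parity bits
G = 0b111010001              # g(x) = x^8 + x^7 + x^6 + x^4 + 1


def lfsr_parity(msg_bits):
    """Shift K message bits (highest degree first) through the LFSR.

    Returns the R-bit register content, i.e. the remainder of
    m(x) * x^R divided by g(x).
    """
    mask = (1 << R) - 1
    reg = 0
    for bit in msg_bits:
        feedback = bit ^ ((reg >> (R - 1)) & 1)   # input XOR register MSB
        reg = (reg << 1) & mask                   # shift by one stage
        if feedback:
            reg ^= G & mask                       # apply the feedback taps
    return reg


def encode(msg_bits):
    """Systematic codeword as an integer: message bits followed by parity."""
    msg = 0
    for bit in msg_bits:
        msg = (msg << 1) | bit
    return (msg << R) | lfsr_parity(msg_bits)


def poly_mod(a, g):
    """Remainder of a(x) / g(x) over GF(2), polynomials packed as integers."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


codeword = encode([1, 0, 1, 1, 0, 0, 1])
# A valid codeword is a multiple of g(x), so the remainder printed here is 0.
print(f"{codeword:015b}", poly_mod(codeword, G))
```

The register update mirrors one clock of the shift register: the incoming bit is combined with the most significant stage, and the feedback taps are given by the lower coefficients of g(x).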
After a codeword is received (S01), decoding begins by computing a syndrome according to specified polynomials (S02). From the syndrome, an error-location polynomial is found (S03). Next, the roots of the error-location polynomial are calculated to obtain the error-location numbers (S04). Finally, the erroneous codeword is corrected using the results of the above steps (S05).
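The syndrome computation of step S02 may be sketched as follows. The sketch assumes the field GF(2^4) built from the primitive polynomial x^4 + x + 1 and a t = 2 code of length 15. These parameters, the bit ordering and the function name are assumptions for illustration only. Each syndrome component S_k is the received polynomial evaluated at α^k, for k = 1 to 2t.

```python
# Assumed example field: GF(2^4) with primitive polynomial x^4 + x + 1.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1             # code length n = 2^m - 1 = 15

# EXP[i] holds alpha^i as a 4-bit integer.
EXP = []
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def syndromes(received, t):
    """Return [S_1, ..., S_2t], where S_k = r(alpha^k).

    received[i] is the coefficient of x^i in the received polynomial.
    """
    result = []
    for k in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(received):
            if bit:
                s ^= EXP[(i * k) % N]   # add alpha^(i*k) in GF(2^m)
        result.append(s)
    return result


# Coefficients of g(x) = x^8 + x^7 + x^6 + x^4 + 1, a valid codeword of the
# assumed BCH(15, 7) code, listed from x^0 upward.
codeword = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
corrupted = list(codeword)
corrupted[5] ^= 1            # inject a single bit error at position 5

print(syndromes(codeword, 2))    # all-zero syndrome: no error detected
print(syndromes(corrupted, 2))   # nonzero syndrome: S_k = alpha^(5k)
```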
Conventionally, one may adopt the Peterson-Gorenstein-Zierler (PGZ) algorithm or the Berlekamp-Massey (BM) algorithm to find the error-location polynomial. Since the computational complexity of the PGZ algorithm is higher than that of the BM algorithm, and the BM algorithm can achieve a higher decoding speed, the BM algorithm is more popular for hardware implementation.
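For reference, a software sketch of the BM algorithm in its textbook (Massey) formulation is given below. It is not a hardware-oriented variant and is not taken from the text. The field GF(2^4) with primitive polynomial x^4 + x + 1, the helper names and the example error positions are all assumptions for illustration.

```python
# Assumed example field: GF(2^4) with primitive polynomial x^4 + x + 1.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1

EXP = [0] * (2 * N)          # EXP[i] = alpha^i, doubled to avoid a modulo
LOG = [0] * (N + 1)          # LOG[alpha^i] = i
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[(N - LOG[a]) % N]


def berlekamp_massey(S):
    """Return [lambda_0, ..., lambda_L] for the syndromes S = [S_1, ..., S_2t]."""
    C, B = [1], [1]          # current and previous connection polynomials
    L, m, b = 0, 1, 1        # register length, shift, previous discrepancy
    for n in range(len(S)):
        d = S[n]             # discrepancy for the n-th syndrome
        for i in range(1, L + 1):
            if i < len(C):
                d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        T = list(C)
        coef = gf_mul(d, gf_inv(b))
        C = C + [0] * (len(B) + m - len(C))
        for i, bi in enumerate(B):
            C[i + m] ^= gf_mul(coef, bi)   # C(x) += (d/b) * x^m * B(x)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return (C + [0] * (L + 1))[:L + 1]


# Syndromes for two assumed bit errors at positions 3 and 10 (t = 2).
S = [EXP[(3 * k) % N] ^ EXP[(10 * k) % N] for k in range(1, 5)]
print(berlekamp_massey(S))   # coefficients of (1 + alpha^3 x)(1 + alpha^10 x)
```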
Given the error-location polynomial λ(x) = λ_0 + λ_1 x + . . . + λ_t x^t, the roots of λ(x) can be found simply by substituting 1, α, α^2, . . . , α^(n−1) (n = 2^m − 1) into λ(x). Since α^n = 1, α^(−l) = α^(n−l). Therefore, if α^l is a root of λ(x), α^(n−l) is an error-location number. Conventionally, this substitution procedure is carried out iteratively by Chien's search and implemented in a circuit design as shown in FIG. 2.
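The substitution procedure can be illustrated by a brute-force sketch that evaluates λ(x) at every power of α. The sketch assumes GF(2^4) with primitive polynomial x^4 + x + 1 and the usual convention that a root α^l of λ(x) corresponds to the error-location number α^(n−l), i.e. to bit position (n − l) mod n. The example polynomial and all names are hypothetical.

```python
# Assumed example field: GF(2^4) with primitive polynomial x^4 + x + 1.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs, point):
    """Evaluate coeffs[0] + coeffs[1]*x + ... at x = point (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc


def find_roots(lam):
    """Exponents l in 0..n-1 for which lambda(alpha^l) = 0."""
    return [l for l in range(N) if poly_eval(lam, EXP[l]) == 0]


# lambda(x) = (1 + alpha^3 x)(1 + alpha^10 x): assumed errors at bits 3 and 10.
lam = [1, EXP[3] ^ EXP[10], EXP[13]]
root_exponents = find_roots(lam)
error_positions = sorted((N - l) % N for l in root_exponents)
print(root_exponents, error_positions)
```

The roots appear at the exponents 12 and 5, which are the inverses of α^3 and α^10, so the mapping (n − l) mod n recovers the bit positions 3 and 10.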
Please refer to FIG. 2, which discloses a conventional Chien's search module 10. The Chien's search module 10 includes a number of calculating units. Each calculating unit 101, 102 . . . or 10t (t is any integer greater than 2) includes a multiplexer, a multiplier and a register (i.e., the calculating unit 101 includes a multiplexer 111, a multiplier 121 and a register 131, the calculating unit 102 includes a multiplexer 112, a multiplier 122 and a register 132, and the calculating unit 10t includes a multiplexer 11t, a multiplier 12t and a register 13t). Take the calculating unit 101 as an example. In operation, the multiplexer 111 receives the coefficient λ_1 of the error-location polynomial λ(x), the multiplier 121 multiplies the coefficient λ_1 by α, and the product is sent to an adder 170 and also stored in the register 131. The other calculating units operate in the same way, except that the calculating unit 10k (k is any positive integer smaller than or equal to t) multiplies λ_k by α^k, outputs the product to the adder 170 and stores it in the corresponding register.
The adder 170 sums all the products from the calculating units 101, 102 . . . and 10t together with the coefficient λ_0. Thus, λ(α) = λ_0 + λ_1 α + . . . + λ_t α^t is obtained. If λ(α) equals zero, α is a root of λ(x) and indicates a location where an incorrect bit exists, so that the bit can be corrected. Otherwise, the indicated location has no incorrect bit. Then, an iterative calculation begins. Take the calculating unit 101 as an example again. The product λ_1 α stored in the register 131 is fed back through the multiplexer 111 to the multiplier 121, and a new product, λ_1 α^2, is generated. Similarly, λ_2 α^4 . . . and λ_t α^(2t) are generated by the calculating units 102 . . . and 10t, respectively. Thus, λ(α^2) = λ_0 + λ_1 α^2 + . . . + λ_t α^(2t) is obtained by the adder 170. If λ(α^2) equals zero, α^2 is a root of λ(x) and indicates another location where an incorrect bit exists. The iterative calculation stops after the n-th cycle is finished.
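The iterative operation of the calculating units and the adder can be modeled in software as in the sketch below. It is a behavioral model only, assuming GF(2^4) with primitive polynomial x^4 + x + 1 and a hypothetical two-error polynomial. The list regs stands in for the registers, the per-unit multiplication by α^k for the multipliers, and the XOR accumulation for the adder. The names do not come from the text.

```python
# Assumed example field: GF(2^4) with primitive polynomial x^4 + x + 1.
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(lam):
    """Model n cycles of the register-based search.

    lam = [lambda_0, ..., lambda_t]. Returns the exponents i (mod n)
    for which lambda(alpha^i) = 0, in the order the cycles find them.
    """
    t = len(lam) - 1
    regs = list(lam[1:])                 # first cycle: units load lambda_k
    roots = []
    for cycle in range(1, N + 1):
        # Unit k multiplies its current value by alpha^k and stores the product.
        regs = [gf_mul(regs[k - 1], EXP[k % N]) for k in range(1, t + 1)]
        total = lam[0]                   # the adder also receives lambda_0
        for value in regs:
            total ^= value               # addition in GF(2^m) is XOR
        if total == 0:
            roots.append(cycle % N)      # cycle n evaluates alpha^n = 1
    return roots


# Hypothetical lambda(x) = (1 + alpha^3 x)(1 + alpha^10 x).
lam = [1, EXP[3] ^ EXP[10], EXP[13]]
print(chien_search(lam))
```

Each cycle costs one multiplication per unit, and n cycles are always executed regardless of how many roots exist, which is the source of the latency discussed in this section.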
It is obvious from the above that the calculation load is significant, since the whole process takes n (= 2^m − 1) iterations. Improvements in hardware can mitigate this time consumption and shorten the latency of Chien's search. Nevertheless, the latency of Chien's search must be shortened further, because data transmission has become massive and faster than ever. Among the procedures for decoding BCH codewords, Chien's search takes the most time (around 40% of the total time consumed). How to shorten the latency of Chien's search is therefore the key to improving the efficiency of decoding BCH codewords.