This invention is in the field of integrated circuits for data communication, and is more specifically directed to error correction methods in the receipt of such communications.
Recent advances in the electronics field have now made high-speed digital data communications prevalent in many types of applications and uses. Digital communication techniques are now used for communication of audio signals for telephony, with video telephony now becoming available in some locations. Digital communication among computers is also prevalent, particularly with the advent of the Internet; of course, computer-to-computer networking by way of dedicated connections (e.g., local-area networks) and also by way of dial-up connections has also become prevalent in recent years.
Of course, the quality of communications carried out in these ways depends upon the accuracy with which the received signals match the transmitted signals. Some types of communications, such as audio communications, can withstand bit loss to a relatively large degree. However, the communication of digital data, especially of executable programs, requires exact fidelity in order to be at all useful. Accordingly, various techniques for the detection and correction of errors in communicated digital bit streams have been developed. Indeed, error correction techniques have effectively enabled digital communications to be carried out over available communication facilities, such as existing telephone lines, despite the error rates inherent in high-frequency communication over these facilities.
Error correction may also be used in applications other than the communication of data and other signals over networks. For example, the retrieval of stored data by a computer from its own magnetic storage devices also typically utilizes error correction techniques to ensure exact fidelity of the retrieved data; such fidelity is, of course, essential in the reliable operation of the computer system from executable program code stored in its mass storage devices. Digital entertainment equipment, such as compact disc players, digital audio tape recorders and players, and the like also now typically utilize error correction techniques to provide high fidelity output.
An important class of error detection and error correction techniques is referred to as Reed-Solomon coding, and was originally described in Reed and Solomon, "Polynomial Codes over Certain Finite Fields", J. Soc. for Industrial and Applied Mathematics, Vol. 8 (SIAM, 1960), pp. 300-304. Reed-Solomon coding uses finite-field arithmetic, such as Galois field arithmetic, to map blocks of a communication into larger blocks. In effect, each coded block corresponds to an over-specified polynomial based upon the input block. Considering a message as made up of k m-bit elements, a polynomial of degree N-1 may be determined as having N coefficients; with N greater than k (i.e., the polynomial is overspecified), not all of the N coefficients need be valid in order to fully and accurately recover the message. According to Reed-Solomon coding, the number t of errors that may be corrected is determined by the relationship between N and k, according to EQU t=(N-k)/2
Reed-Solomon encoding is used to generate the encoded message in such a manner that, upon decoding of the received encoded message, the number and location of any errors in the received message may be determined. Conventional Reed-Solomon encoder and decoder functions are generally implemented, in microprocessor-based architectures, as dedicated hardware units that are not in the datapath of the central processing unit (CPU) of the system, as CPU functionality has not heretofore been extended to include these functions.
In this regard, FIG. 1 illustrates one example of an architecture for a conventional Reed-Solomon encoder, for the example where each symbol is eight bits, or one byte, in size (i.e., m=8), where Galois field arithmetic is used such that the size of the Galois field is 2.sup.8, and where the maximum codeword length is 2.sup.8 -1, or 255 symbols. Of course, other architectures may be used to derive the encoded codeword for the same message and checksum parameters, or of course for other symbol sizes, checksum lengths, or maximum codeword lengths. In the example of FIG. 1, sixteen check symbols are generated for each codeword, and as such eight errors per codeword may be corrected. According to conventional Reed-Solomon encoding, the k message bytes in the codeword (M.sub.k-1, M.sub.k-2, . . . ,M.sub.0) are used to generate the check symbols (C.sub.15, C.sub.14, . . . ,C.sub.0). The check symbols C are the coefficients of a polynomial C(x) EQU C(x)=C.sub.15 x.sup.15 +C.sub.14 x.sup.14 +. . .+C.sub.0
which is the remainder of the division of a message polynomial M(x) having the message bytes as coefficients: EQU M(x)=M.sub.k-1 x.sup.k-1 +M.sub.k-2 x.sup.k-2 +. . .+M.sub.0
by a divisor referred to as generator polynomial G(x): EQU G(x)=(x-a.sup.0)(x-a.sup.1)(x-a.sup.2). . . (x-a.sup.15)
where a is a root of the binary primitive polynomial x.sup.8 +x.sup.4 +x.sup.3 +x.sup.2 +1. The exemplary architecture of FIG. 1 includes sixteen eight-bit shift register latches 6.sub.15 through 6.sub.0, which will contain the remainder values from the polynomial division, and thus will present the checksum coefficients C.sub.15 through C.sub.0, respectively. An eight-bit exclusive-OR function 8.sub.15 through 8.sub.1 is provided between each pair of shift register latches 6 to effect Galois field addition, with XOR function 8.sub.15 located between latches 6.sub.15 and 6.sub.14, and so on. The feedback path produced by exclusive-OR function 2, which receives both the input symbol and the output of the last latch 6.sub.15, presents the quotient for each division step. This quotient is broadcast to sixteen constant Galois field multipliers 4.sub.15 through 4.sub.0, which multiply the quotient by respective ones of the coefficients G.sub.15 through G.sub.0. In operation, the first k symbols contain the message itself, and are output directly as the leading portion of the codeword. Each of these message symbols enters the encoder architecture of FIG. 1 on lines IN, and is applied to the division operation carried out by this encoder. Upon completion of the operations of the architecture of FIG. 1 upon these message bytes, the remainder values retained in shift register latches 6.sub.15 through 6.sub.0 correspond to the checksum symbols C.sub.15 through C.sub.0, and are appended to the encoded codeword after the k message symbols.
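By way of illustration only, the division carried out by an encoder of the type shown in FIG. 1 may be sketched in software as follows. This is a minimal sketch under stated assumptions, not the circuit itself: the element a is taken to be the byte value 0x02, the primitive polynomial is represented as 0x11D, and the names gf_mul, generator_poly, rs_check_symbols and poly_eval are hypothetical helpers introduced here. Because the input symbol is combined with the output of the last latch, a shift register of this kind effectively forms the remainder of the message polynomial multiplied by x.sup.16, which is what the sketch computes.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed bit representation)


def gf_mul(a, b):
    """Multiply two GF(256) elements by shift-and-add, reducing by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def generator_poly(num_check=16):
    """Coefficients of G(x) = (x - a^0)...(x - a^(num_check-1)), highest degree first."""
    g = [1]
    root = 1  # a^0
    for _ in range(num_check):
        nxt = g + [0]  # g(x) * x
        for i, coef in enumerate(g):
            nxt[i + 1] ^= gf_mul(coef, root)  # subtraction is XOR in GF(2^m)
        g = nxt
        root = gf_mul(root, 2)  # next power of a (assumption: a = 0x02)
    return g


def rs_check_symbols(message, num_check=16):
    """Return check symbols C_(num_check-1) ... C_0 for a list of message bytes."""
    g = generator_poly(num_check)  # g[0] is 1; g[1:] play the role of G_15..G_0
    latches = [0] * num_check      # latches[0] plays the role of latch 6_15
    for symbol in message:
        feedback = symbol ^ latches[0]  # XOR function 2: quotient for this step
        latches = latches[1:] + [0]     # shift toward the high-order latch
        for j in range(num_check):
            latches[j] ^= gf_mul(feedback, g[j + 1])  # constant multipliers 4
    return latches


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


message = list(range(1, 21))                    # arbitrary 20-byte message
codeword = message + rs_check_symbols(message)  # message first, then check symbols
print(all(poly_eval(codeword, r) == 0 for r in (1, 2, 4)))
```

In this sketch the resulting codeword evaluates to zero at each root of G(x), which is the property relied upon when the received codeword is later examined for errors.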
The encoded codewords are then arranged in a digital bitstream and, after the appropriate formatting, communicated in the desired manner. For communications over telephone facilities, of course, the codewords may be communicated either digitally or converted to analog signals; digital network or intracomputer communications will, of course, maintain the codewords in their digital format. Regardless of the communications medium, errors may occur in the communicated signals, and will be reflected in the received bitstream as opposite binary states from those in the input bitstream, prior to the encoding process of FIG. 1. These errors are sought to be corrected in the decoding process, as will now be described in a general manner relative to FIG. 2.
An example of the decoding of Reed-Solomon encoded codewords, generated for example by the architecture of FIG. 1, is conventionally carried out in the manner now to be described relative to decoder 10 illustrated in FIG. 2. Decoder 10 receives an input bitstream of codeword symbols, which is considered, for a single codeword, as received polynomial r(x) in FIG. 2. Received polynomial r(x) is applied to syndrome accumulator 12, which generates a syndrome polynomial s(x) of the form: EQU s(x)=s.sub.i-1 x.sup.i-1 +s.sub.i-2 x.sup.i-2 +. . .+s.sub.1 x+s.sub.0
Syndrome polynomial s(x) is indicative of whether errors were introduced into the communicated signals over the communication facility. If s(x)=0, no errors were present, but if s(x) is non-zero, one or more errors are present in the codeword under analysis. Syndrome polynomial s(x), in the form of a sequence of coefficients, is then forwarded to Euclidean array function 15.
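The role of the syndrome as an error indicator may be illustrated by the following sketch. It is offered under assumptions not stated in the text above: each syndrome coefficient is taken to be the received polynomial r(x) evaluated at one root a.sup.j of the generator polynomial (j from 0 to 15), the element a is taken as the byte value 0x02, and the names gf_mul and syndromes are hypothetical. The all-zero block is used as the error-free codeword, since it is a valid codeword of any linear code.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed bit representation)


def gf_mul(a, b):
    """Multiply two GF(256) elements by shift-and-add, reducing by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def syndromes(received, num_check=16):
    """Evaluate r(x) (highest-order symbol first) at a^0 .. a^(num_check-1)."""
    out = []
    root = 1  # a^0
    for _ in range(num_check):
        acc = 0
        for symbol in received:
            acc = gf_mul(acc, root) ^ symbol  # Horner's rule in GF(256)
        out.append(acc)
        root = gf_mul(root, 2)  # next root (assumption: a = 0x02)
    return out


clean = [0] * 20           # all-zero codeword: no errors
corrupted = list(clean)
corrupted[5] ^= 0x37       # one symbol error of magnitude 0x37

print(any(syndromes(clean)))      # False: every syndrome coefficient is zero
print(any(syndromes(corrupted)))  # True: at least one coefficient is non-zero
```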
Euclidean array function 15 generates two polynomials .LAMBDA.(x) and .OMEGA.(x) based upon the syndrome polynomial s(x) received from syndrome accumulator 12. The degree .nu. of polynomial .LAMBDA.(x) indicates the number of errors in the codeword, and as such is forwarded to Chien search function 16 for additional analysis. Polynomial .OMEGA.(x) is also generated by Euclidean array function 15, and is forwarded to Forney function 18; polynomial .OMEGA.(x) is used by Forney function 18 to evaluate the error in the received bitstream r(x).
Referring now to FIGS. 3a and 3b, the construction of Euclidean array function 15 according to a known approach will now be described. The construction and operation of Euclidean array function 15 as illustrated in FIGS. 3a and 3b is described in further detail in Araki, et al., "Modified Euclidean Algorithm having High Modularity and Minimum Register Organization", Trans. IEICE, Vol. E-74, No. 4 (IEICE, 1991), pp. 731-737. As described therein, Euclidean array function 15 is constructed as cells 20, numbering 2t+5 where t is the number of correctable errors in a codeword; the most significant cell 20.sub.0 is constructed differently from the remaining cells 20.sub.1 to 20.sub.2t+4, which are constructed identically. According to this known technique for realizing Euclidean array function 15, two registers A and B, each with 2t+5 elements in this example, are used to receive the syndrome polynomial s(x) coefficients, and to generate the coefficients in result polynomials .LAMBDA.(x) and .OMEGA.(x).
FIG. 3a illustrates the construction of most significant cell 20.sub.0 in Euclidean array function 15. Cell 20.sub.0 selectively effects a swap of the A and B position values, followed by a Galois field constant division process, and forwards the result on lines Q to cells 20.sub.j in Euclidean array function 15. The most significant position 30A.sub.0 in register A receives an input on lines A_IN, and has an output applied to multiplexers 21, 23, while the most significant position 30B.sub.0 in register B has its output also applied to multiplexers 21, 23. The output of multiplexer 21 is applied to Galois field divider 22, while the output of multiplexer 23 is applied both to Galois field divider 22 and to most significant position 30B.sub.0 in register B. Multiplexers 21, 23 are controlled by control line CTR1 to select, in one state, the contents of most significant position 30A.sub.0 as the dividend applied to divider 22, and the contents of most significant position 30B.sub.0 as the divisor applied to divider 22, and in another state to swap the A and B values, so that most significant position 30B.sub.0 is the dividend and most significant position 30A.sub.0 is the divisor. In either case, the divisor operand is restored into most significant position 30B.sub.0 in register B. The quotient output of divider 22 is presented on lines Q for use by the other cells 20.sub.j as will now be described relative to FIG. 3b.
As shown in FIG. 3b, each cell 20.sub.j utilizes the contents of the jth position of each of registers A, B (namely positions 30A.sub.j and 30B.sub.j). In the conventional custom logic realization of Euclidean array function 15, as described in the Araki et al. reference, cells 20.sub.j are connected in sequence, with the A_OUT lines from one cell 20.sub.j serving as the A_IN lines in an adjacent cell 20.sub.j+1. Swapping of the A and B values is effected by multiplexers 25, 27 in cell 20.sub.j, under the control of control line CTR1, as described above. The output of multiplexer 25 is applied to Galois field adder 28A for generating a new A value on lines A_OUT, and is also applied to one input of multiplexer 29. The output of multiplexer 27 is similarly applied to Galois field adder 28B, which generates the contents of jth position 30B.sub.j of register B, and is also applied to another input of multiplexer 29. Multiplexer 29 selects one of the outputs of multiplexers 25, 27, under the control of control line CTR2, for application to Galois field multiplier 24, for multiplication by the value Q from the most significant cell 20.sub.0. The output of multiplier 24 is applied to one input of each of multiplexers 26A, 26B, the other inputs of which receive hard-wired zero levels; multiplexers 26A, 26B have their outputs connected to Galois field adders 28A, 28B, respectively, and are under the control of control line CTR2. The output of Galois field adder 28A will serve as the A.sub.j+1 operand in the next pass through Euclidean array function 15, and as such is shifted into A register location 30A.sub.j+1 on lines A_OUT, while the output of Galois field adder 28B is applied to register B location 30B.sub.j for the next pass.
In operation, this conventional Euclidean array function 15 process stores two polynomials in each of the A and B registers. These polynomials are appended to one another to occupy left and right portions of these registers; as such, the left and right polynomials will be referred to as A.sub.left, A.sub.right, B.sub.left, B.sub.right. The boundaries between the left and right polynomials within the A and B registers move during the Euclidean process, as will now be described relative to FIGS. 4a, 4b and 5.
FIG. 4a illustrates the initial state of A and B registers 30A, 30B prior to operation by Euclidean array function 15. In this conventional approach, polynomial A.sub.left stores the syndrome coefficients generated by syndrome accumulator 12, with polynomial A.sub.right storing all zeroes. Polynomial B.sub.left stores the value x.sup.2t (which, in this example, is a leading one followed by all zeroes), while polynomial B.sub.right stores the value one. Referring now to FIG. 5, the operation of Euclidean array function 15 begins with process 31 comparing the degree of polynomial A.sub.left to that of polynomial B.sub.left; if polynomial A.sub.left has a degree less than that of polynomial B.sub.left, control signal CTR1 is set to "1" (in process 32.sub.1) to effect a swap in all of cells 20, otherwise control signal CTR1 is set to "0" (in process 32.sub.0). Process 34 then determines the location of the boundaries between the left and right polynomials in A and B registers 30A, 30B.
Each cell 20 of Euclidean array function 15 is associated with corresponding locations of A and B registers 30A, 30B, as noted above. Upon process 34 determining the location of the polynomial boundaries in A and B registers 30A, 30B, those cells 20 which are associated with left polynomial locations of A and B registers 30A, 30B have their control signal CTR2 set to "1" in process 36, while those cells 20 associated with right polynomial locations of A and B registers 30A, 30B have their control signal CTR2 set to "0". Process 38 then executes cell 20.sub.0 of FIG. 3a upon the contents of register locations 30A.sub.0 and 30B.sub.0, generating the Q value for use in each of the other cells 20.sub.j, and also generating new values for register locations 30A.sub.0 and 30B.sub.0. Process 40 then executes each of cells 20.sub.j upon the remaining locations of registers 30A, 30B, generating new values for each location therein. Process 42 then evaluates the degree of polynomial A.sub.left (which corresponds to the syndrome polynomial s(x) as modified by the passes through Euclidean array function 15), and decision 43 compares this degree to the maximum number t of correctable errors. If the degree of polynomial A.sub.left is greater than or equal to t, the operation of Euclidean array function 15 is not yet complete, and the operation repeats from process 31.
However, if the degree of polynomial A.sub.left is less than t, the process of Euclidean array function 15 is essentially complete. At this point, A register 30A contains the coefficients of polynomial .OMEGA.(x) in polynomial A.sub.left, while B register 30B contains the coefficients of polynomial .LAMBDA.(x) in polynomial B.sub.right, as shown in FIG. 4b. Process 44 is then performed to normalize these values, by dividing each coefficient by the value .DELTA. (which is the lowest degree coefficient .LAMBDA.(0) from polynomial B.sub.right). These divided coefficients of polynomials .OMEGA.(x) and .LAMBDA.(x) are then forwarded to Forney function 18 and Chien search function 16, as shown in FIG. 2. Chien search function 16 utilizes polynomial .LAMBDA.(x), generally referred to as the error locator polynomial, to generate the zeroes polynomial X(x) from which Forney function 18 determines the error magnitude polynomial M(x). Chien search function 16 also generates polynomial P(x), which indicates the position of the errors in the received bitstream r(x). The magnitude of the errors as indicated by polynomial M(x) and the position of these errors as indicated by polynomial P(x), are then used by input ring buffer 19 to generate the corrected bitstream i'(x).
The above description of conventional Euclidean array function 15 illustrates an approach by way of which the Euclidean array algorithm may be implemented in specific hardware, such as in a custom logic device. As is well known in the art, however, the advent of high performance programmable devices, such as digital signal processors (DSPs) and general-purpose microprocessors, favors the use of such programmable devices for many complex and time-consuming operations encountered in modern data processing and communications applications. Programmable devices are especially useful for such operations if parameters of the operation either change over time, or may be dependent upon environment or implementation. As such, it is desirable to effect the Euclidean array function described hereinabove in a programmable device, such as a DSP or microprocessor.
However, it is cumbersome for conventional programmable logic devices to execute finite field arithmetic operations, such as the Galois field multiplication and division performed in Euclidean array function 15 described hereinabove. For the case of a modern DSP, such as the TMS320c6x digital signal processor available from Texas Instruments Incorporated, Galois field addition, multiplication, and division operations require one, twelve, and seventeen clock cycles, respectively, to perform. Considering that the Euclidean array function 15 utilizes 2t+5 cells 20, one Galois field division, 2(2t+4) Galois field additions, and 2t+4 Galois field multiplications are performed in each iteration, causing the number of clock cycles for execution of Euclidean array function 15 to escalate rapidly. For a typical example in which the number of correctable errors t is ten, and assuming twenty repetitions through the process of FIG. 5 described above, execution of Euclidean array function 15 using this conventional DSP would require 7,060 clock cycles (960 for additions, 5,760 for multiplications, and 340 for divisions) for each received codeword.
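The cycle count given above follows directly from the per-iteration operation counts, and may be reproduced by the short calculation below. The function name euclid_cycles and its parameter names are hypothetical; the operation counts and the per-operation cycle costs are those stated in the preceding paragraph.

```python
def euclid_cycles(t, iterations, add_cycles=1, mul_cycles=12, div_cycles=17):
    """Return (addition, multiplication, division, total) clock cycles."""
    cells = 2 * t + 4                             # cells 20_1 .. 20_(2t+4)
    adds = 2 * cells * iterations * add_cycles    # 2(2t+4) additions per iteration
    muls = cells * iterations * mul_cycles        # 2t+4 multiplications per iteration
    divs = iterations * div_cycles                # one division per iteration
    return adds, muls, divs, adds + muls + divs


print(euclid_cycles(t=10, iterations=20))  # (960, 5760, 340, 7060)
```

The multiplication term dominates the total, so the cost grows roughly in proportion to the product of t and the number of iterations.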
As noted above, the coefficients of the error locator polynomial .LAMBDA.(x) generated by Euclidean array function 15 are applied to Chien search function 16 in this conventional Reed-Solomon decoder. Chien search function 16 utilizes these coefficients, along with the particular finite field "alphabet", or set of finite field values, to generate two polynomials that are used in further identifying the errors in the received bitstream r(x). One polynomial generated by Chien search function 16 is generally referred to as zeroes polynomial X(x), which is applied to Forney function 18 for determination of the eventual error magnitude polynomial M(x). Chien search function 16 also generates error position polynomial P(x), which is forwarded to input ring buffer 19 as an indication of the position of the errored symbols in the bitstream r(x).
Similarly, and by way of further background, a conventional example of Chien search function 16, as implemented by way of custom logic circuitry, will now be described relative to FIG. 11. In this conventional approach, Chien search function 16 includes root detection block 200. Root detection block 200 evaluates the following function: ##EQU2##
where the term .alpha..sup.i refers to the symbol alphabet for GF(256) Galois field arithmetic, which has 256 members. As noted above, the term .nu. is the degree of the error locator polynomial .LAMBDA.(x) from Euclidean array function 15, and as such corresponds to the number of errors present in the received bitstream r(x). Because .nu. is less than or equal to the number t of correctable errors for successful decoding, the calculation is generally carried out for index j from 1 to t. Root detection block 200 performs this evaluation, in the example of FIG. 11, by way of multiple weighted sum blocks 202, in combination with a Galois field (finite field) adder 204 and zero detection circuitry 206, as will now be described.
As shown in FIG. 11, the lowest order coefficient .LAMBDA.(0) of error locator polynomial .LAMBDA.(x) is forwarded directly to Galois field adder 204. The next higher order coefficients of .LAMBDA.(x), the number of which is the number t of correctable errors, are each forwarded to a corresponding one of weighted sum blocks 202, along with a corresponding power of the Galois field member .alpha..sup.i. Because of the recursive construction of weighted sum blocks 202, as will be described below, the Galois field members .alpha..sup.i applied thereto may be maintained as constants. For example, weighted sum block 202.sub.1 receives the first power Galois field member .alpha..sup.1 along with coefficient .LAMBDA.(1), weighted sum block 202.sub.2 receives the square, or second power, Galois field member .alpha..sup.2 along with coefficient .LAMBDA.(2), and so on. Each of the first t powers of the Galois field symbol alphabet members .alpha..sup.i (i=1 to t) may be prestored in memory, to prevent the repeated calculation of the powers of .alpha..
Each weighted sum block 202 is similarly constructed in conventional Chien search function 16, including a multiplexer 203, a register 205, and a finite field (Galois field) multiplier 207. In operation, considering that the zeroth order Galois field member .alpha..sup.0 is one, multiplexer 203 in each of weighted sum blocks 202 first selects the coefficient .LAMBDA. for storage in register 205; the output of register 205 is applied to Galois field adder 204, for determination of the first sum and thus possible detection of a root. Galois field adder 204 performs a finite field addition of the contents of each of the registers 205 in weighted sum blocks 202, along with lowest order coefficient .LAMBDA.(0), to evaluate the polynomial X.sub.i for the ith symbol alphabet member .alpha..sup.i. The result of this addition is applied to zero detection circuit 206, which drives an active state on line ZRO in response to the sum equaling zero; this event occurs when the current Galois field symbol alphabet member .alpha..sup.i is a root of the zeroes polynomial X(x). For the second and subsequent members of the Galois field symbol alphabet, the constant values of Galois field symbols .alpha..sup.1 through .alpha..sup.t are applied to Galois field multiplier 207 along with the current contents of register 205. Galois field multiplier 207 then performs the finite field multiplication of these two operands, and multiplexer 203 selects the output of multiplier 207 for storage in register 205, and for presentation to Galois field adder 204 and zero detection circuit 206, to detect whether a root is present at this iteration.
For example, in the second iteration, multiplier 207 in weighted sum block 202.sub.1 performs Galois field multiplication of symbol .alpha..sup.1 and the value .LAMBDA.(1) (the current contents of register 205 after the first iteration), and thus stores the value .LAMBDA.(1).alpha..sup.1 in register 205 and forwards this value to adder 204; weighted sum block 202.sub.2 similarly generates and stores the value .LAMBDA.(2).alpha..sup.2, as do the remaining weighted sum blocks 202, up to block 202.sub.t which generates and stores the value .LAMBDA.(t).alpha..sup.t. In the third iteration, multiplier 207 in weighted sum block 202.sub.1 performs a Galois field multiplication of symbol .alpha..sup.1 and the value .LAMBDA.(1).alpha..sup.1 (the then-current contents of register 205 after the second iteration), and stores and forwards the resulting value .LAMBDA.(1)(.alpha..sup.1).sup.2, or .LAMBDA.(1).alpha..sup.2. Similarly, in this third iteration, weighted sum block 202.sub.2 generates and stores the value .LAMBDA.(2)(.alpha..sup.2).sup.2, or .LAMBDA.(2).alpha..sup.4, as do the remaining weighted sum blocks 202, up to block 202.sub.t which generates and stores the value .LAMBDA.(t)(.alpha..sup.t).sup.2. This process continues for iterations of the index value i (corresponding to the exponent of the .alpha. term in each multiplication) from 1 to 255 in the case of Galois field 256 operations, so that each symbol of the Galois field symbol alphabet is interrogated to determine whether it is a root.
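The recursive operation of the weighted sum blocks may be sketched in software as follows. This is an illustrative sketch under stated assumptions rather than a description of the circuit of FIG. 11: the element .alpha. is taken as the byte value 0x02 with primitive polynomial 0x11D, the names gf_mul and chien_search are hypothetical, and the loop visits the 255 non-zero field elements once each (index i from 0 to 254), since .alpha..sup.255 equals .alpha..sup.0. The list named registers stands in for registers 205, and the list named consts for the prestored powers .alpha..sup.1 through .alpha..sup.t.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed bit representation)


def gf_mul(a, b):
    """Multiply two GF(256) elements by shift-and-add, reducing by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def chien_search(lam):
    """lam[j] holds coefficient Lambda(j); return every i with Lambda(alpha^i) == 0."""
    registers = list(lam[1:])  # first iteration: registers hold Lambda(1)..Lambda(t)
    consts = []                # constants alpha^1 .. alpha^t, one per block
    power = 1
    for _ in registers:
        power = gf_mul(power, 2)
        consts.append(power)
    roots = []
    for i in range(255):
        total = lam[0]                 # Lambda(0) goes straight to the adder
        for value in registers:
            total ^= value             # Galois field addition is XOR
        if total == 0:                 # zero detection: alpha^i is a root
            roots.append(i)
        # each block multiplies its register by its own constant for the next i
        registers = [gf_mul(v, c) for v, c in zip(registers, consts)]
    return roots


# Example: locator (1 + alpha^3 x)(1 + alpha^10 x), built here for illustration.
x1, x2 = 0x08, 0x74                   # alpha^3 and alpha^10 in this representation
lam = [1, x1 ^ x2, gf_mul(x1, x2)]
print([255 - i for i in chien_search(lam)])  # [10, 3]
```

After k iterations the jth register holds .LAMBDA.(j).alpha..sup.jk, so each new field element costs only one constant multiplication per block instead of a full exponentiation.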
Line ZRO is applied to the enable input of each stage of two registers 218, 220. Register 218 corresponds to the zeroes polynomial X(x), and includes stages 218.sub.0 through 218.sub.t for storing coefficient values of zeroes polynomial X(x) therein. According to this conventional arrangement, index counter 208 maintains a count corresponding to the iteration of the Galois field symbol alphabet members .alpha..sup.i through root detection circuit 200. This count is applied to Galois field exponential circuit 212, typically constructed as a look-up ROM, which generates a magnitude value on lines MAG in response to the count; this magnitude is applied to the data inputs of register stages 218.sub.0 through 218.sub.t. In this conventional arrangement, upon detection of a root of zeroes polynomial X(x) as indicated by an active state on line ZRO, the magnitude value on lines MAG is stored in the first available one of register stages 218.sub.0 through 218.sub.t; once the first one of the stages 218.sub.k of register 218 has a value stored therein, the magnitude value on lines MAG at the time of the next detected root is stored in the next stage 218.sub.k in sequence (218.sub.0, then 218.sub.1, and so on). Upon completion of the Chien search operation, register 218 will then store individual magnitude values for each of the detected roots. These values are forwarded, on lines X(0) through X(t), to Forney function unit 18.
The error position polynomial P(x) is also generated from the count stored in index counter 208. The output of index counter 208 is applied to an inverting input to adder 213, which receives the literal value "255" at a non-inverting input; the output of adder 213 is thus the quantity of 255 minus this count, and is applied to one input of multiplexer 215. The literal "0" value is applied to a second input of multiplexer 215, which is under the control of zero detection circuit 210 which detects when the count provided by index counter 208 reaches zero. Register 220 includes stages 220.sub.0 through 220.sub.t, which store position values in the form of coefficients of position polynomial P(x); each stage 220.sub.k receives line ZRO at an enable input and lines POS from the output of multiplexer 215 at a data input. In operation, upon root detection block 200 detecting a root of zeroes polynomial X(x), indicated by line ZRO being active, the value 255-i is stored in the first available stage 220.sub.k of register 220, such stored value indicating the position of the error in the received bitstream. Additional detected roots result in additional position values being stored in stages 220.sub.k.
The above description of conventional Chien search function 16 illustrates an approach by way of which the Chien search reduction process may be implemented in dedicated, custom logic, hardware. As noted above, the use of programmable devices such as microprocessors and DSPs is generally favored in modern data processing and communications applications, making it desirable to execute the Chien search operation in a programmable DSP or microprocessor. As in the case of the Euclidean array operation, however, it is cumbersome for conventional programmable logic devices to execute finite field arithmetic operations, such as the Galois field multiplication and addition performed in Chien search function 16 described hereinabove. In the above example of the Chien search operation, where the selected finite field is Galois field 256, 256 iterations are required to interrogate each member of the Galois field symbol alphabet. For the example where the number t of correctable errors is 8, eight Galois field additions (corresponding to the equivalent of eight adders in the realization of Galois field adder 204) and eight Galois field multiplications are required for each symbol alphabet member. In a TMS320c6x DSP, where twelve machine cycles are required for Galois field multiplication and one cycle is required for Galois field addition, a grand total of 26,624 (256 times 8 times 13) clock cycles are necessary to execute the Chien search operation for each received codeword.
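The total stated above may be checked with the brief calculation below, in which the function name chien_cycles and its parameter names are hypothetical and the cycle costs are those given in the preceding paragraph.

```python
def chien_cycles(t, alphabet_size=256, add_cycles=1, mul_cycles=12):
    """Clock cycles for t additions and t multiplications per alphabet member."""
    per_member = t * (add_cycles + mul_cycles)  # 8 * 13 = 104 when t is 8
    return alphabet_size * per_member


print(chien_cycles(t=8))  # 26624
```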