A. Field of the Invention
The present invention relates to a method and device for calculating Cyclical Redundancy Checksums using a programmable computer.
B. Description of the Related Art
Digital data transmission systems are used in a variety of different applications ranging from transferring financial numbers representing dollar amounts in bank accounts, to storing the music of our favorite performers on compact digital audio discs, to communicating telemetry data with aircraft and orbiting space satellites. To transmit this type of information, digital transmission systems deliver a sequence of binary information to a receiver across a transmission channel. Due to impairments in the transmission channel (i.e., the inability of the transmission channel to accurately deliver the transmitted bits), the binary information may become corrupted or change as it traverses the transmission channel. If such errors went undetected, the amounts in our bank accounts would be wrong, our favorite singers would be out of tune, and aircraft could be lost.
To prevent these problems, error detection schemes are employed to detect differences between the originally transmitted bits and the received data bits. To implement an error detection scheme, a bit stream is divided into a series of frames, each frame being a known grouping of bits. The frames are either of fixed length (e.g., 256 bits) or of variable length (delineated by known patterns); either way, frame boundaries are recoverable by the receiver. The transmitter then appends a cyclic redundancy checksum ("CRC") to each transmitted frame. CRCs are often used because they are easy to implement and can detect a large class of errors. The mathematics underlying CRCs is known to those skilled in the art of error control coding and described in "Error Control Coding: An Introduction," by Peter Sweeney, Prentice Hall 1991, and "Theory and Practice of Error Control Codes," by Richard E. Blahut, Addison-Wesley Publishing Company, Inc., 1983, which are hereby incorporated by reference.
The transmitter determines the CRC by interpreting the bits in a frame to be the coefficients of a binary field polynomial. For example, if there are K bits in a frame, then the bits in the message are $c_{K-1}, c_{K-2}, c_{K-3}, \ldots, c_2, c_1, c_0$, where $c_{K-1}$ is first in the frame (transmitted first in time) and $c_0$ is last in the sequence (transmitted last), and each bit has the value 1 or 0. This frame can thus be represented as a (K-1)th-order polynomial:
$$C(X) = c_{K-1} X^{K-1} + c_{K-2} X^{K-2} + \cdots + c_2 X^2 + c_1 X + c_0$$

where X is a bit delay operator and the $c_i$ are the coefficients of a binary field polynomial.
The frame is then augmented by appending R zero bits to form an augmented frame with N=K+R bits. Appending R zeros is mathematically equivalent to multiplying the polynomial by $X^R$. The augmented polynomial is now $C(X) \cdot X^R$, an (N-1)th-order polynomial.
The CRC of the frame is calculated by dividing $C(X) \cdot X^R$ by a binary field polynomial of order R, G(X), known as the generator polynomial. The remainder of the polynomial division is another polynomial of order at most R-1, represented by R bits. Appending these R bits to the original non-augmented frame is mathematically equivalent to adding the remainder to the augmented polynomial, forming a transmitted polynomial

$$T(X) = C(X) \cdot X^R + \left( (C(X) \cdot X^R) \bmod G(X) \right).$$

The calculated CRC can be used to detect if errors occurred in the received data. The receiver receives the N-bit frame, treats the bits as the coefficients of an (N-1)th-order polynomial, and divides this polynomial by the generator polynomial. The remainder of this division will be zero if no errors occurred during transmission.
Both the transmitter and receiver must perform polynomial division. Any apparatus or method that accelerates this division will allow either faster data transmission or lower transmitter and receiver complexity.
Shown in FIG. 1 is a prior art circuit for dividing a frame of bits by a generator polynomial to generate a CRC. The circuitry uses shift registers with feedback to implement the division of a frame of bits by an example generator polynomial $G(X) = X^8 + X^4 + X^3 + X^2 + 1$. The remainder registers $R_7$ to $R_0$ represent delays of 1 bit. The Exclusive-OR logic gates 10 before registers $R_0$, $R_2$, $R_3$, and $R_4$ correspond to the non-zero coefficients of the G(X) divisor polynomial; the XOR operation is equivalent to subtracting the generator polynomial from the current remainder. The remainder registers $R_7$ to $R_0$ are typically initialized to the first 8 bits of the message, that is, $c_{N-1}$ to $c_{N-8}$, at the start of the polynomial division. Alternatively, the remainder can be initialized to zero and the circuitry of FIG. 1 can be clocked an additional 8 times to shift in the first 8 bits of the message, that is, $c_{N-1}$ to $c_{N-8}$. Then, the frame bits are shifted into the circuitry, one at each iteration, in the order $c_{N-9}, c_{N-10}, c_{N-11}, c_{N-12}, \ldots, c_1, c_0$. At the end of the iterations, registers $R_7$ through $R_0$ contain the final remainder, which is shifted out as the CRC (at the transmitter) or used to determine if errors occurred (at the receiver).
The bit streams comprising digital messages are commonly grouped into symbols of m bits each, because the data being transmitted are usually already organized that way. Typically the symbols are 8, 16, or 32 bits long. This natural grouping motivates a method of calculating binary field polynomial division efficiently for symbol-oriented message streams. A computer processor is particularly well suited to calculating CRCs a symbol at a time because its Arithmetic Logic Unit and data buses are capable of symbol-wide logical operations. The division operation implemented bit-wise by the circuit of FIG. 1 can be performed symbol-wise by a similar circuit shown in FIG. 1a. In this circuit the component "remainder re-circulator" is a combinatorial circuit performing the equivalent of clocking the circuit of FIG. 1 m (here m=8) times while forcing the message bits $c_i$ to zero; that is, it recirculates the remainder m times through the XOR/shift register circuit. The output of the symbol-wise recirculator is then XORed with the next m bits of the incoming message; this symbol-wise XOR operation is equivalent to shifting the next symbol of the message in one bit at a time for a total of m shifts. Together, these two steps are equivalent to clocking the circuit of FIG. 1 m times. This can be determined by inspection: during those m clocks the input bits $c_i$ have no bearing on the output of $R_7$, so the remainder recirculation depends only on the initial value of the remainder, and the later symbol-wise XOR of the recirculated remainder with the $c_i$ bits is equivalent to shifting those bits through the register m times. In a computer processor, the remainder recirculator function can be implemented in a lookup table and the symbol-wise XOR function can be implemented in the processor's ALU. The division can thus be calculated with an iterative loop containing the following steps.
First, the current remainder indexes the lookup table, and then the table's output is XORed with the next incoming symbol of the frame:

$$\mathrm{CRC} = \mathrm{CRCMOD1}(\mathrm{CRC}) \;\mathrm{XOR}\; \mathrm{FRAME}(i),$$

where CRCMOD1 is the look-up table implementing the remainder re-circulation.
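The bit-serial division of FIG. 1 and the table-driven loop described above can be sketched in Python for the example generator polynomial and 8-bit symbols. This is only an illustrative sketch, not the circuit or program of any figure: the function names (`recirculate`, `remainder_bitwise`, `remainder_bytewise`, `append_crc`, `frame_is_clean`) are hypothetical, the remainder is assumed to start at zero, and the constant `GEN` is assumed to encode G(X) with its X^8 term included.

```python
GEN = 0x11D  # assumed encoding of G(X) = X^8 + X^4 + X^3 + X^2 + 1

def recirculate(rem):
    """Clock an 8-bit remainder 8 times with the message bits forced to zero."""
    for _ in range(8):
        rem <<= 1
        if rem & 0x100:      # a 1 left the top register: subtract (XOR) G(X)
            rem ^= GEN
    return rem

# Look-up table playing the role of CRCMOD1: one entry per 8-bit remainder.
CRCMOD1 = [recirculate(r) for r in range(256)]

def remainder_bitwise(frame):
    """Bit-at-a-time division, remainder initialized to zero."""
    rem = 0
    for byte in frame:
        for bit in range(7, -1, -1):          # most significant bit first
            rem = (rem << 1) | ((byte >> bit) & 1)
            if rem & 0x100:
                rem ^= GEN
    return rem

def remainder_bytewise(frame):
    """Symbol-at-a-time division: CRC = CRCMOD1(CRC) XOR FRAME(i)."""
    rem = 0
    for byte in frame:
        rem = CRCMOD1[rem] ^ byte
    return rem

def append_crc(message):
    """Transmitter side: divide the zero-augmented frame, append the remainder."""
    crc = remainder_bytewise(message + b"\x00")   # R = 8 appended zero bits
    return message + bytes([crc])

def frame_is_clean(frame):
    """Receiver side: a zero remainder means no error was detected."""
    return remainder_bytewise(frame) == 0
```

Under these assumptions the two division routines return the same remainder for any byte string, and a frame produced by `append_crc` divides evenly by G(X), which is the zero-remainder check performed at the receiver.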
Shown in FIG. 1b is a diagram of a generalized circuit for binary polynomial division. In this circuit, the CRC width is R bits, and a symbol is m bits wide. This circuit operates in three steps for each symbol. First, the m upper bits of the remainder are used as the address of an R-bit-wide lookup table implementing the recirculation function. Second, the remainder is shifted left by m bits and XORed with the table result. Third, the next symbol of the message is XORed with the result of the previous step to form the new remainder. The above three steps are equivalent to calculating the R-bit remainder of $(R(X) \cdot X^m + C(X)) \bmod G(X)$, where R(X) is the value contained in the R-bit remainder register, C(X) here denotes the next m-bit symbol of the message, and G(X) is the divisor.
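The three steps of the generalized division can be sketched as follows. This is a minimal sketch under stated assumptions, not the circuit of FIG. 1b itself: the names `make_table`, `divide_symbolwise`, and `divide_bitwise` are hypothetical, the generator is assumed to be passed as an integer that includes its X^R term, m is assumed not to exceed R, and the 16-bit polynomial 0x11021 (X^16 + X^12 + X^5 + 1) is an illustrative choice that does not come from the text.

```python
def make_table(gen, r, m):
    """Recirculation table: table[h] = (h(X) * X^r) mod G(X) for each m-bit h."""
    top = 1 << r
    table = []
    for h in range(1 << m):
        rem = h << (r - m)            # place h in the upper m bits of the register
        for _ in range(m):            # clock m times with zero input bits
            rem <<= 1
            if rem & top:
                rem ^= gen
        table.append(rem)
    return table

def divide_symbolwise(symbols, r, m, table):
    """Three-step update per symbol; returns the final R-bit remainder."""
    mask = (1 << r) - 1
    rem = 0
    for s in symbols:
        high = rem >> (r - m)                     # step 1: upper m bits address the table
        rem = ((rem << m) & mask) ^ table[high]   # step 2: shift left by m, XOR table result
        rem ^= s                                  # step 3: XOR in the next symbol
    return rem

def divide_bitwise(symbols, gen, r, m):
    """Bit-at-a-time reference used only to cross-check the table method."""
    top = 1 << r
    rem = 0
    for s in symbols:
        for bit in range(m - 1, -1, -1):
            rem = (rem << 1) | ((s >> bit) & 1)
            if rem & top:
                rem ^= gen
    return rem

# Illustrative parameters (assumptions): R = 16, m = 8.
GEN16 = 0x11021
TABLE16 = make_table(GEN16, 16, 8)

message = list(b"telemetry")
crc = divide_symbolwise(message + [0, 0], 16, 8, TABLE16)   # append R = 16 zero bits
frame = message + [crc >> 8, crc & 0xFF]                    # append the 16-bit remainder
```

With R = m = 8 the shifted remainder in step 2 vanishes entirely, so the update reduces to a table look-up followed by an XOR with the incoming symbol.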
One drawback of this method is that many computer processors require multi-cycle latencies to access a look-up table stored in memory. The result of a memory access is thus not available until several clock cycles after the memory access is initiated. Several independent memory accesses can be in the pipeline at once, but the result of an access cannot be used, for example, as the operand of another memory access, until the first access is completed. Because each table look-up in the loop is addressed by the remainder produced by the previous look-up, the processor may wait idle until the memory operation completes. One example of such a processor is the Texas Instruments TMS320C6x family of digital signal processors, which has a 5-cycle memory access latency, although memory cycles can be initiated on every clock cycle.
Such memory latency limits how fast the loop calculating the remainder can execute. The loop becomes "stalled" waiting for the look-up section of each iteration, and the processor waits idle. It would be desirable to find a method and apparatus to more efficiently calculate the remainders of binary polynomial division.