1. Field of the Invention
The following invention relates generally to error detection in a telecommunications environment and specifically to speeding up cyclic redundancy code (CRC) calculations to speed up error detection.
2. Related Art
The use of a cyclic redundancy check, or CRC, is a standard means for error detection in communication networks. A block, or frame, of binary data to be protected is represented as a polynomial over GF(2), the field of integers modulo 2. Computation of the CRC is defined in terms of division of this polynomial by a so-called generator polynomial, with the division carried out using arithmetic in GF(2). The CRC is appended to the data frame before transmission as a frame-check sequence (or FCS), and is checked at the receiving end using an essentially identical polynomial division process.
A formal definition of the CRC employed in data communication applications is given in a number of communication standards. The ISO definition, taken from [1], is paraphrased here:
The K-bit frame-check sequence shall be the ones complement of the sum (modulo 2) of: (a) the remainder of $z^N U_0(z)$ divided (modulo 2) by the generator polynomial $G(z)$, where the number of bits in the input sequence to be protected by the CRC is $N$, and $U_0(z)$ is an initialization polynomial of degree less than $K$; and (b) the remainder of $z^K U_p(z)$ divided (modulo 2) by the generator polynomial $G(z)$, where $U_p(z)$ is the polynomial representing the input sequence to be protected by the CRC.
For purposes of this disclosure, the term "CRC" will be used to refer to the sum of the two remainders referred to in the definition above. The FCS that is appended to the data frame is equal to the ones complement of what we are calling the CRC. Note that in GF(2) finding the ones complement of a number is equivalent to adding 1 to the number.
If we think of the input sequence as a time series $u(n)$ taking on values 0 or 1 and with time index $n$ starting from 0 (so that $u(0)$ is the first bit to be processed), the polynomial representation referred to in the CRC definition is
$$U_p(z) = \sum_{n=0}^{N-1} u(n)\, z^{N-1-n} \tag{1}$$
The generator polynomial G(z) is a polynomial of degree K. The ISO standard generator polynomials for K=16 and K=32 are
$$\begin{aligned}
G_{16}(z) &= z^{16}+z^{12}+z^{5}+1 \\
G_{32}(z) &= z^{32}+z^{26}+z^{23}+z^{22}+z^{16}+z^{12}+z^{11}+z^{10}+z^{8}+z^{7}+z^{5}+z^{4}+z^{2}+z+1
\end{aligned} \tag{2}$$
The initialization polynomial is generally either zero or the polynomial of degree $K-1$ all of whose coefficients are 1.
The error detection properties of the CRC depend on the characteristics of polynomials over the field GF(2), are well known (see [2], for example), and are not at issue in this disclosure. Rather, we address here efficient means for high-speed CRC computation.
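The definition above can be checked numerically with a short sketch. The following is illustrative only and is not part of the disclosed apparatus: polynomials over GF(2) are stored as Python integers (bit k holds the coefficient of z^k), and the helper names `poly_mod`, `bytes_to_bits`, and `crc_by_definition` are hypothetical. The bit order within each byte (most significant bit first) and the test message are assumptions made for the example.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of a(z) divided by g(z) over GF(2).

    Bit k of each integer holds the coefficient of z^k.
    """
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        # Cancel the leading term of a(z) with a shifted copy of g(z).
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def bytes_to_bits(data: bytes) -> list[int]:
    """Assumed bit order: the most significant bit of each byte is u(n) first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def crc_by_definition(bits: list[int], g: int, u0: int = 0) -> int:
    """Sum (mod 2) of rem(z^N U0(z), G) and rem(z^K Up(z), G)."""
    K = g.bit_length() - 1
    N = len(bits)
    up = 0
    for u in bits:
        # u(0) ends up as the coefficient of z^(N-1), as in Eqn. (1).
        up = (up << 1) | u
    return poly_mod(u0 << N, g) ^ poly_mod(up << K, g)


# G16(z) = z^16 + z^12 + z^5 + 1, with the all-ones initialization polynomial.
G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
crc16 = crc_by_definition(bytes_to_bits(b"123456789"), G16, u0=0xFFFF)
fcs16 = crc16 ^ 0xFFFF  # the FCS is the ones complement of the CRC
```

With these assumptions, `crc16` is the quantity this disclosure calls the CRC, and `fcs16` is the frame-check sequence that would be appended to the frame.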
The usual reference implementation for computing the CRC is derived from a circuit for polynomial division that employs a shift register with feedback (see, for example, Sec. 6.2 in Ref. [3]). One form of this reference implementation, generalized from Ref. [4], is shown in FIG. 1. The blocks labeled $z^{-1}$ are unit delay elements that make up the shift register; for the block whose output is $x_k(n)$, for example, the input is equal to $x_k(n+1)$. The scale factors of the gain elements are the coefficients of the divisor polynomial $G(z)$; i.e.
$$G(z) = \sum_{k=0}^{K} g_k z^k \tag{3}$$
where we assume the coefficients are normalized with $g_K = 1$. The input sequence $u(n)$ contains the finite-length block of data to be protected, for $n = 0, 1, \ldots, N-1$. After the last element of the input sequence has been processed, i.e. at $n = N$, the shift register contains the remainder of the division required by the CRC definition. More precisely:
Let the shift register be initialized so that it contains a representation of the initialization polynomial $U_0(z)$; i.e. if
$$U_0(z) = \sum_{k=0}^{K-1} u_{0k} z^k \tag{4}$$
then set $x_k(0) = u_{0k}$ for $k = 0, 1, \ldots, K-1$. Then, at $n = N$, the contents of the shift register represent the sum of the remainder of $z^N U_0(z)$ divided by $G(z)$, and the remainder of $z^K U_p(z)$ divided by $G(z)$, where $U_p(z)$ is the polynomial representation of the input data sequence according to Eqn. (1). In other words, if we call the sum of these two remainders $R_T(z)$, with
$$R_T(z) = \sum_{k=0}^{K-1} r_{Tk} z^k \tag{5}$$
then the coefficients of this polynomial, which make up the CRC, satisfy:
$$r_{Tk} = x_k(N); \quad k = 0, 1, \ldots, K-1 \tag{6}$$
Note that when the CRC is computed over GF(2) as in the standard definition, the appropriate arithmetic is employed. Thus the summing blocks in FIG. 1 implement modulo 2 addition, and the negative signs in the figure are irrelevant (because any element in GF(2) is its own additive inverse). In addition, since the coefficients of G(z) are all either 0 or 1, the gain elements shown in the figure would be implemented either as a closed connection (for a 1) or an open circuit (for a 0).
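A software model of the bit-at-a-time shift register may help make the description above concrete. The sketch below is an illustration under stated assumptions, not a transcription of FIG. 1: the register is held in a Python integer whose bit k is the stage output at index k, the function names are hypothetical, and the bit order within a byte (most significant bit first) is assumed.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Assumed bit order: the most significant bit of each byte is processed first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def crc_bit_serial(bits: list[int], g: int, init: int = 0) -> int:
    """Process one input bit per step, as in the reference shift register.

    g    -- generator polynomial as an integer, including the z^K term
    init -- initial register contents (coefficients of U0(z))
    """
    K = g.bit_length() - 1
    mask = (1 << K) - 1
    taps = g & mask  # g_0 .. g_{K-1}; a set bit is a closed feedback connection
    state = init
    for u in bits:
        # Feedback is the top register stage added (mod 2) to the input bit.
        feedback = ((state >> (K - 1)) & 1) ^ u
        state = (state << 1) & mask
        if feedback:
            state ^= taps
    return state


G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
message_bits = bytes_to_bits(b"123456789")
crc_all_ones_init = crc_bit_serial(message_bits, G16, init=0xFFFF)
crc_zero_init = crc_bit_serial(message_bits, G16, init=0x0000)
```

After the last bit has been shifted in, `state` plays the role of the register contents at n = N, i.e. the coefficients given by Eqn. (6).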
The processing of the input sequence in FIG. 1 can be described by the difference equation:
$$x(n+1) = A\,x(n) + b\,u(n) \tag{7}$$
where the K-dimensional state vector x(n) is
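$$x(n) = [\,x_0(n)\;\; x_1(n)\;\; \cdots\;\; x_{K-1}(n)\,]^T \tag{8}$$
A is a K×K matrix with the form
$$A = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 & -g_0 \\
1 & 0 & \cdots & 0 & 0 & -g_1 \\
0 & 1 & \cdots & 0 & 0 & -g_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & -g_{K-2} \\
0 & 0 & \cdots & 0 & 1 & -g_{K-1}
\end{bmatrix} \tag{9}$$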
and b is the K×1 matrix
$$b = [\,(-g_0)\;\; (-g_1)\;\; \cdots\;\; (-g_{K-1})\,]^T \tag{10}$$
where the superscript 'T' indicates transpose. The initial condition for the difference equation (7) is determined by the initialization polynomial; with $U_0(z)$ as in Eqn. (4):
$$x_k(0) = u_{0k}; \quad k = 0, 1, \ldots, K-1 \tag{11}$$
Again, when the CRC is computed over GF(2), the calculation in Eqn. (7) is done using modulo-2 arithmetic, and the negative signs in the A matrix in Eqn. (9) and the b matrix in equation (10) are superfluous. Note also that the shift register contains the CRC. In other words, the state vector of the system described by the state equation (7) is equal to the CRC after the last input element has been processed, at n=N.
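The state equation (7) can be exercised directly in software. The following sketch is illustrative only: it builds A and b as plain Python lists with the negative signs dropped (arithmetic is modulo 2), and the names `build_state_space` and `step_state` are hypothetical. The test message and its bit order (most significant bit of each byte first) are assumptions made for the example.

```python
def build_state_space(g_coeffs: list[int]):
    """Return (A, b) for Eqns. (9) and (10) over GF(2).

    g_coeffs holds g_0 .. g_{K-1}; the leading coefficient g_K = 1 is implied.
    """
    K = len(g_coeffs)
    A = [[0] * K for _ in range(K)]
    for k in range(1, K):
        A[k][k - 1] = 1          # ones on the subdiagonal (the shift)
    for k in range(K):
        A[k][K - 1] = g_coeffs[k]  # last column holds the feedback taps
    b = list(g_coeffs)
    return A, b


def step_state(A, b, x, u):
    """One application of x(n+1) = A x(n) + b u(n), modulo 2."""
    K = len(x)
    return [(sum(A[i][j] * x[j] for j in range(K)) + b[i] * u) % 2
            for i in range(K)]


# G16(z) = z^16 + z^12 + z^5 + 1; coefficients g_0 .. g_15.
G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
g16_coeffs = [(G16 >> k) & 1 for k in range(16)]
A, b = build_state_space(g16_coeffs)

bits = [(byte >> i) & 1 for byte in b"123456789" for i in range(7, -1, -1)]
x = [1] * 16  # Eqn. (11) with the all-ones initialization polynomial
for u in bits:
    x = step_state(A, b, x, u)

# Pack the final state vector x(N) into an integer, x_k as bit k.
crc_from_state = sum(x_k << k for k, x_k in enumerate(x))
```

The final state vector, read out as in Eqn. (6), is the CRC of the message for the assumed initialization.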
Observe that Eqn. (7) is executed once for each element of the input sequence u(n) (i.e. for each bit of the input bitstream). A variety of techniques have been developed to compute the CRC more efficiently by processing some number of bits (i.e. elements of the input sequence u(n)) in parallel. This increased efficiency has been found useful for CRC computation in both hardware and software. Parallel CRC computation was originally described by Patel [5]. Perez [6] has an early example of its implementation in software. References [7] through [13] provide other examples of hardware implementations of parallel CRC computation. A companion disclosure document, the U.S. Patent Application entitled "Method and Apparatus for High-Speed CRC Computation Based on State-Variable Transformation" (listed under the references section above, incorporated herein by reference in its entirety, and referenced hereinafter as "companion patent document"), describes a maximally efficient (in the sense of throughput relative to circuit speed) hardware implementation.
The basis of all reported techniques for parallel CRC computation can be established by describing formally the block-oriented version of the system state equation (7). Let the elements of the input sequence $u(n)$ be grouped into blocks of length $M$, so that the input to the block-oriented system is now a vector $u_M(m)$ with
$$u_M(m) = [\,u(mM+M-1)\;\; u(mM+M-2)\;\; \cdots\;\; u(mM+1)\;\; u(mM)\,]^T; \quad m = 0, 1, \ldots, (N/M)-1 \tag{12}$$
assuming that N is an integral multiple of M. It is well known that the state equation (7) can be rewritten as:
$$x(m+1) = A^M x(m) + B_M u_M(m) \tag{13}$$
where the index $m$ is incremented by one for each block of $M$ input elements. The K×K matrix $A^M$ in Eqn. (13) is equal to the A matrix in Eqn. (9) multiplied by itself $M$ times. The matrix $B_M$ is a K×M matrix whose columns are found by multiplying the vector $b$ in Eqn. (7) by successively higher powers of A; for $M \le K$, the columns of $B_M$ are the $M$ rightmost columns of $A^M$. The initial condition for the difference equation (13) is given by Eqn. (11); it is identical to that for the original difference equation. Additionally, the state vector contains the CRC after the last block of $M$ input elements has been processed, assuming that $N$ is an integral multiple of $M$.
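The construction of the block matrices can be sketched in a few lines. The code below is a minimal software illustration, not one of the referenced hardware techniques: matrices are lists of lists over GF(2), column c of B_M is taken as A^c b so that it lines up with the ordering of Eqn. (12), and all function names are hypothetical. The test message and bit order are assumptions.

```python
def mat_mul(X, Y):
    """Matrix product over GF(2)."""
    return [[sum(X[i][k] & Y[k][j] for k in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_vec(X, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & c for a, c in zip(row, v)) % 2 for row in X]


def block_matrices(g_coeffs, M):
    """Return (A^M, B_M) for Eqn. (13); g_coeffs holds g_0 .. g_{K-1}."""
    K = len(g_coeffs)
    A = [[0] * K for _ in range(K)]
    for k in range(1, K):
        A[k][k - 1] = 1
    for k in range(K):
        A[k][K - 1] = g_coeffs[k]
    A_M = [[int(i == j) for j in range(K)] for i in range(K)]  # identity
    cols, v = [], list(g_coeffs)  # v starts as b
    for _ in range(M):
        cols.append(v)            # column c of B_M is A^c b
        v = mat_vec(A, v)
        A_M = mat_mul(A_M, A)
    B_M = [[cols[c][i] for c in range(M)] for i in range(K)]
    return A_M, B_M


def crc_blockwise(bits, g_coeffs, M, x0):
    """Apply Eqn. (13) once per M-bit block; len(bits) must be a multiple of M."""
    assert len(bits) % M == 0
    A_M, B_M = block_matrices(g_coeffs, M)
    x = list(x0)
    for start in range(0, len(bits), M):
        u_M = bits[start:start + M][::-1]  # Eqn. (12): latest bit listed first
        x = [(p + q) % 2 for p, q in zip(mat_vec(A_M, x), mat_vec(B_M, u_M))]
    return x


G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
g16_coeffs = [(G16 >> k) & 1 for k in range(16)]
bits = [(byte >> i) & 1 for byte in b"123456789" for i in range(7, -1, -1)]

x_final = crc_blockwise(bits, g16_coeffs, M=8, x0=[1] * 16)
crc_m8 = sum(x_k << k for k, x_k in enumerate(x_final))
A_M8, B_M8 = block_matrices(g16_coeffs, 8)
```

For M = 8 and K = 16 the matrices `A_M8` and `B_M8` can be compared column by column to confirm the statement above that the columns of B_M are the M rightmost columns of A^M.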
All of the referenced techniques work directly with the block-oriented state equation (13) or else work with some modified or transformed version of this equation. In other words, they all process blocks consisting of M input elements, i.e. groups of M bits from the input data sequence. Thus, there is an implicit (or possibly explicit) assumption in all these techniques that N, the number of bits in the input data sequence, is an integral multiple of M.
At the same time, for most if not all of the referenced techniques, the efficiency increases with M. For example, it is clear from the results published by Pei and Zukofsky in [8] that, at least for their hardware-based technique, the throughput for a given circuit clock speed increases approximately linearly with M. Similarly, for the hardware-based technique described in the companion patent document, the throughput is in fact M times that for bit-at-a-time CRC computation at the same circuit clock speed.
In most practical data-communication systems the length N of the data sequences for which CRCs are computed is an integral multiple of 8 bits, so that M=8 is a common value for block CRC computation. Since N is in general not guaranteed to be an integral multiple of any larger number (16, for example), use of a value of M larger than 8 requires some postprocessing to complete the computation of the CRC. This postprocessing is required for all known techniques as well as for the technique described in the companion patent document. While several references have presented results for M greater than 8 (see [8], for example), not one has discussed the postprocessing required for these cases.
Indeed, it appears that there is no explicitly described known method for computing the CRC M bits at a time when N is not an integral multiple of M. However, one can develop a set of postprocessing steps as a straightforward extension of almost any of the referenced known techniques for parallel CRC computation. Consider an example with M=32 and N an integral multiple of 8. For some known technique, say that in [8], construct an implementation with M=8 in addition to the one with M=32. Process all of the 32-bit blocks of input elements, except the last block, through the implementation with M=32. Clearly, the last block of input elements to be processed will contain either 8, 16, 24, or 32 bits. If the last 32-bit block is complete, process it through the implementation with M=32. Otherwise, process the last block through the implementation with M=8 as either one, two, or three 8-bit blocks. This handling of the last input block represents the desired postprocessing.
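The two-implementation postprocessing just described can be modeled in software as follows. This is a sketch under stated assumptions and does not reproduce the technique of [8]: each "implementation" is modeled by a hypothetical `BlockEngine` class that applies the block state equation for a fixed M, with its matrix columns derived from a bit-serial reference update. All names are illustrative.

```python
def serial_update(state, bits, g):
    """Bit-at-a-time reference update; g includes the z^K term."""
    K = g.bit_length() - 1
    mask = (1 << K) - 1
    for u in bits:
        feedback = ((state >> (K - 1)) & 1) ^ u
        state = (state << 1) & mask
        if feedback:
            state ^= g & mask
    return state


class BlockEngine:
    """Software model of an M-bit-at-a-time CRC update (hypothetical helper)."""

    def __init__(self, g, M):
        self.K = g.bit_length() - 1
        self.M = M
        zeros = [0] * M
        # Column k of A^M: response to unit state e_k with an all-zero block.
        self.state_cols = [serial_update(1 << k, zeros, g)
                           for k in range(self.K)]
        # Response to a single 1 at position j of the block, from zero state.
        self.input_cols = [serial_update(0, [int(i == j) for i in range(M)], g)
                           for j in range(M)]

    def update(self, state, block):
        assert len(block) == self.M
        out = 0
        for k in range(self.K):
            if (state >> k) & 1:
                out ^= self.state_cols[k]
        for j, u in enumerate(block):
            if u:
                out ^= self.input_cols[j]
        return out


def crc_two_engines(bits, g, init, wide=32, narrow=8):
    """Full wide blocks go through the wide engine; the remainder, if any,
    goes through the narrow engine as one or more narrow blocks."""
    assert len(bits) % narrow == 0
    big, small = BlockEngine(g, wide), BlockEngine(g, narrow)
    state, pos = init, 0
    while len(bits) - pos >= wide:
        state = big.update(state, bits[pos:pos + wide])
        pos += wide
    while pos < len(bits):
        state = small.update(state, bits[pos:pos + narrow])
        pos += narrow
    return state


G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
bits = [(byte >> i) & 1 for byte in b"123456789" for i in range(7, -1, -1)]
crc_mixed = crc_two_engines(bits, G16, init=0xFFFF)  # 72 bits = 2 x 32 + 1 x 8
```

In this example the 72-bit message is handled as two 32-bit blocks followed by one 8-bit block, which is the case where the last block is incomplete.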
That the postprocessing described requires essentially two separate implementations of a parallel CRC computation, e.g. two sets of circuitry, is not necessarily a drawback. It may be that the increased throughput achieved by using M=32 for all but the last block of input elements rather than M=8 for the entire input data sequence justifies the more complex solution. When maximum throughput is the fundamental objective, however, this known postprocessing suffers from a significant disadvantage, namely that it requires some non-zero processing time after the end of an input data frame before processing of the next input data frame can begin. In other words, it requires some interframe idle time on the communication link from which the data is being received or to which the data is being transmitted.
To see this, consider a hardware design optimized for maximum throughput using the technique described in the companion patent document. The technique in the companion patent document is maximally efficient because the time it requires to process M bits taken M bits at a time is independent of M. Assuming an optimized design, the time to process 32 bits with M=32 will be approximately equal to 32 bit-times, while the time to process 8 bits with M=8 will also be approximately equal to 32 bit-times. In other words, execution of the postprocessing method described above could extend up to 64 bit-times beyond the end of the data frame being processed. Whether or not this represents a problem depends on the characteristics of the communication interface, in particular on the minimum number of bit-times between the ends of two successive data frames. (There is another factor here, namely the minimum number of bit-times between the end of one data frame and the beginning of the next data frame. We make the assumption here that the first bit of any data frame is aligned at one end of the M-bit block, i.e., at the right-hand end, using the notation of Eqn. (12). For additional comments on this assumption and the reason for making it, see the discussions below.) LAN frame formatting is such that there are always more than 128 bit-times between the end of one frame and the end of the next. With HDLC frame formatting, however, there can be as few as 40 bits between the end of one frame and the end of the next. (The numbers here include the bit-times needed for transmission or reception of the CRC field itself.)
What is required is a novel postprocessing technique that is easily pipelined in such a way that it can operate in parallel with CRC computation for the next frame to be processed as well as with the postprocessing of the next frame if it happens to be extremely short. Essentially, it should be able to operate with zero interframe idle time even for minimal length HDLC frames. When combined with the technique for parallel CRC computation disclosed in the companion patent document, this technique would allow a maximally efficient solution for computing the CRC M bits at a time for data frames whose length is not an integral multiple of M bits.
The present invention is directed to a method and a system for computing a cyclic redundancy code (CRC) for use in a communication data stream M bits at a time for an input sequence u(n) whose length is not a multiple of M. The method includes (i) representing a frame of data to be protected as the input sequence; (ii) determining a cyclic redundancy code (CRC) for the input sequence M bits at a time from a state vector, until a last block of the input sequence is reached; (iii) if the last block of the input sequence is full, then determining the CRC to be a completed CRC; and (iv) if the last block of the input sequence is not full, then performing three functions. The three functions are (a) setting a number of the last bits of the last block equal to zero; (b) processing the last block, padded with the number of last bits equal to zero, according to steps (i) and (ii) to determine a new CRC (yCRC); and (c) running the state vector backwards in time to determine a completed CRC. The method can further include appending the completed CRC as a frame check sequence (FCS) to the communication data stream for detection by a receiving device.
The input sequence u(n) can be defined over the field of integers modulo 2 (GF(2)). Step (i) can further include grouping the elements of the input sequence into blocks of length M; and representing the input sequence in a block-oriented fashion as $u_M(m) = [\,u(mM+M-1)\;\; u(mM+M-2)\;\; \cdots\;\; u(mM+1)\;\; u(mM)\,]^T$, where $m = 0, 1, \ldots, m_{max}$, and where $m_{max}$ equals $(N/M)-1$.
The state vector can be represented by $x(m+1) = A^M x(m) + B_M u_M(m)$, where $m$ is an integer, where A is a K×K matrix containing the coefficients of a CRC generator polynomial, where $x(m)$ is a K-dimensional state vector defined as $[\,x_0(m)\;\; x_1(m)\;\; \cdots\;\; x_{K-1}(m)\,]^T$, where $B_M$ is a K×M matrix whose columns are determined by multiplying $b$ by successively higher powers of A, and where $b$ is a K-dimensional vector containing one or more coefficients of a CRC generator polynomial.
The last block can be represented as $u_M(m_{max})$, where whether the last block of the input sequence is full is found by determining whether $N/M$ is an integer and $r_2 = 0$, where $N$ is the length of the input sequence, where $R$ is the greatest common divisor of M, and where the last $(r_2)(R)$ bits of the last block $u_M(m_{max})$ are zeros.
Step (iv)(a) can include setting $(r_2)(R)$ bits of the last block equal to zero. The new CRC (yCRC) can contain the CRC of the input data sequence augmented with $r_2 R$ zeros at its end, where $R$ is the greatest common divisor of M, and where $r_2$ is defined by the fact that the last $(r_2)(R)$ bits of the last block are zeros.
Finally, step (iv)(c) can include executing $x'(r-1) = A^{-R}\, x'(r)$ for $r_2$ iterations, until $x'(r-1) = x'(N/R)$, which equals the completed CRC, where $R$ is the greatest common divisor of M, where $r_2$ is defined by the fact that the last $(r_2)(R)$ bits of the last block are zeros, where $r$ is initially defined as $r_2 + (N/R)$, and where $x'(r_2 + (N/R))$ equals the new CRC (yCRC).
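The pad-then-rewind idea summarized above can be illustrated with a small software model. This is a sketch under stated assumptions rather than the disclosed circuit: R is treated here as a granularity that divides both M and N (R=8 and M=32 in the example), which is an assumption made for illustration; each M-bit block update and each application of A^-R is modeled as a loop of single-bit steps; and the names `forward_step`, `backward_step`, and `crc_pad_and_rewind` are hypothetical. The backward step relies on g_0 = 1, which holds for the generator polynomials of Eqn. (2).

```python
def forward_step(state, u, g):
    """One step of x(n+1) = A x(n) + b u(n) over GF(2); g includes z^K."""
    K = g.bit_length() - 1
    mask = (1 << K) - 1
    feedback = ((state >> (K - 1)) & 1) ^ u
    state = (state << 1) & mask
    return state ^ (g & mask) if feedback else state


def backward_step(state, g):
    """One application of A^-1, undoing a forward step whose input bit was 0."""
    K = g.bit_length() - 1
    assert g & 1, "A is invertible only when g_0 = 1"
    if state & 1:
        # Bit 0 set means the feedback tap fired: remove the taps, then
        # restore the top stage that produced the feedback.
        return ((state ^ (g & ((1 << K) - 1))) >> 1) | (1 << (K - 1))
    return state >> 1


def crc_pad_and_rewind(bits, g, init, M=32, R=8):
    N = len(bits)
    assert M % R == 0 and N % R == 0  # assumed relationship between R, M, N
    pad = (-N) % M                    # zeros needed to fill the last block
    r2 = pad // R
    padded = bits + [0] * pad         # step (a): last r2*R bits set to zero
    state = init
    for start in range(0, len(padded), M):
        for u in padded[start:start + M]:  # stands in for one M-bit update
            state = forward_step(state, u, g)
    y_crc = state                     # step (b): CRC of the zero-padded data
    for _ in range(r2):               # step (c): r2 applications of A^-R
        for _ in range(R):
            state = backward_step(state, g)
    return state, y_crc, r2


G16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
bits = [(byte >> i) & 1 for byte in b"123456789" for i in range(7, -1, -1)]
completed_crc, y_crc, r2 = crc_pad_and_rewind(bits, G16, init=0xFFFF)
```

For the 72-bit example message, the last 32-bit block holds 8 data bits, so 24 zeros are appended (r2 = 3 with R = 8) and the state is then stepped backwards three times by R bits to recover the CRC of the unpadded data.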