The following references, which are incorporated herein by reference in their entirety, are referenced in the remainder of this patent document:
[1] ISO 3309, Information processing systems - Data communication - High-level data link control procedures - Frame structure, 1984.
[2] W. W. Peterson and D. T. Brown, Cyclic codes for error detection, Proc. IRE, vol. 49, pp. 228-235, January 1961.
[3] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, Mass. : Addison-Wesley, 1983.
[4] IBM Corporation, Synchronous Data Link Control Concepts, GA27-3093-3, June 1986.
[5] A. M. Patel, A multi-channel CRC register, in AFIPS Conference Proceedings, vol. 38, pp. 11-14, Spring 1971.
[6] A. Perez, Byte-wise CRC calculations, IEEE Micro, vol. 3, pp. 40-50, June 1983.
[7] G. Albertengo and R. Sisto, Parallel CRC generation, IEEE Micro, vol. 10, pp. 63-71, October 1990.
[8] T-B. Pei and C. Zukowski, High-speed parallel CRC circuits in VLSI, IEEE Trans. Commun., vol. 40, pp. 653-657, April 1992.
[9] R. J. Glaise and X. Jacquart, Fast CRC calculation, in Proc. 1993 IEEE Intl. Conf. on Computer Design: VLSI in Computers and Processors, Cambridge, Mass., pp. 602-605, October 1993.
[10] S. L. Ng and B. Dewar, Parallel realization of the ATM cell header CRC, Computer Commun., vol. 19, pp. 257-263, March 1996.
[11] M. Braun et al., Parallel CRC computation in FPGAs, in Proc. 6th Intl. Workshop on Field Programmable Logic and Applications, Darmstadt, Germany, pp. 156-165, September 1996.
[12] S. Li and J. A. Pasco-Anderson, Efficient CRC remainder coefficient generation and checking device and method, U.S. Pat. No. 5,619,516, Apr. 8, 1997.
[13] R. J. Glaise, A two-step computation of cyclic redundancy code CRC-32 for ATM networks, IBM J. Res. Devel., vol. 41, pp. 705-709, November 1997.
[14] ITU-T Rec. I.432, B-ISDN User-Network Interface Physical Layer Specifications, pp. 16-20, March 1993.
[15] J. J. D'Azzo and C. H. Houpis, Linear Control System Analysis and Design. New York: McGraw-Hill, 1981.
[16] K. Hoffman and R. Kunze, Linear Algebra. Englewood Cliffs, N.J.: Prentice Hall, 1971.
1. Field of the Invention
The following invention relates generally to error detection in a telecommunications environment and specifically to speeding up cyclic redundancy code (CRC) calculations to speed up error detection.
2. Related Art
The cyclic redundancy code (CRC) of a block of data, which is also called a frame check sequence (FCS), provides a standard technique for error detection in communication networks. The block of data to be protected is represented as a polynomial whose coefficients are all either 0 or 1. The frame check sequence, or CRC, is essentially the remainder resulting from the division of this polynomial by the CRC's generator polynomial.
The use of a cyclic redundancy check, or CRC, is a standard means for error detection in communication networks. A block, or frame, of binary data to be protected is represented as a polynomial over GF(2), the field of integers modulo 2. Computation of the CRC is defined in terms of division of this polynomial by a so-called generator polynomial, with the division carried out using arithmetic in GF(2). The CRC is appended to the data frame before transmission as a frame-check sequence, and is checked at the receiving end using an essentially identical polynomial division process.
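The polynomial division over GF(2) described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a polynomial is stored as a Python integer whose bit i is the coefficient of z^i; the function name gf2_mod is hypothetical and does not appear in any of the references.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor, both polynomials over GF(2).

    Assumed encoding: bit i of each integer is the coefficient of z**i.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor polynomial is zero")
    degree = divisor.bit_length() - 1
    # Cancel the leading term of the dividend until its degree drops
    # below that of the divisor. Subtraction in GF(2) is XOR.
    while dividend.bit_length() - 1 >= degree:
        shift = dividend.bit_length() - 1 - degree
        dividend ^= divisor << shift
    return dividend


# (z^3 + z + 1) divided by (z^2 + 1) leaves remainder 1,
# because z^3 + z + 1 = z * (z^2 + 1) + 1 over GF(2).
remainder = gf2_mod(0b1011, 0b101)
```

Because addition and subtraction coincide in GF(2), each reduction step is a single exclusive-or of a shifted copy of the divisor.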
The reference implementations of CRC calculation are based on a shift register with feedback that processes the input data one bit at a time. With the advent of communication interfaces operating at speeds greater than 100 Mbits/s, it has been found advantageous to design circuitry for parallel CRC computation, i.e., that operates on eight or more bits at a time. Ideally, an implementation that processes M bits at a time can achieve a full M-times speed-up, meaning that its throughput should be M times that of a bit-at-a-time implementation. In fact, all of the reported implementations to date achieve only partial speed-up.
The terms cyclic redundancy check (CRC) and frame check sequence (FCS) can be used synonymously, or can be related to one another. Technically, FCS refers to the CRC (or derivation thereof) when it is appended to the data transmission at the transmitting end to ensure that there has been a reliable transmission at the receiving end.
A formal definition of the CRC employed in data communication applications is given in a number of communication standards. The International Organization for Standardization (ISO) defines the CRC as follows: the K-bit frame-check sequence (where K is the bit length of the CRC) must be the ones complement of the modulo-2 sum of: (a) the remainder of $z^N \cdot U_o(z)$ divided (modulo 2) by the generator polynomial G(z), where N is the number of bits in the input sequence to be protected by the CRC, and $U_o(z)$ is an initialization polynomial of degree less than K; and (b) the remainder of $z^K \cdot U_p(z)$ divided (modulo 2) by the generator polynomial G(z), where $U_p(z)$ is the polynomial representing the input sequence to be protected by the CRC. (Hereinafter, K is used to refer to the bit length of the CRC.)
For purposes of the present invention, the term CRC is used to refer to the sum of the two remainders referred to in the definition above. The FCS that is appended to the data frame is equal to the ones complement of what is referred to as CRC in the present invention. In GF(2), finding the ones complement of a number is equivalent to adding 1 to each of its bits.
The input sequence is a time series u(n) that takes on values 0 or 1, with time index n, starting from 0 (such that u(0) is the first bit to be processed), and the polynomial representation referred to in the CRC definition is:
$$U_p(z) = \sum_{n=0}^{N-1} u(n)\, z^{N-1-n} \tag{1}$$
The generator polynomial G(z) is a polynomial of degree K. The ISO standard generator polynomials for K=16 and K=32 are (see reference [1], for example):
$$G_{16}(z) = z^{16} + z^{12} + z^{5} + 1$$
$$G_{32}(z) = z^{32} + z^{26} + z^{23} + z^{22} + z^{16} + z^{12} + z^{11} + z^{10} + z^{8} + z^{7} + z^{5} + z^{4} + z^{2} + z + 1 \tag{2}$$
The initialization polynomial is generally either zero or the polynomial of degree $K-1$ all of whose coefficients are 1.
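The definition above, i.e., the sum of the two remainders and its ones complement, can be sketched directly in code. The following is a minimal illustration rather than a reference implementation. It assumes that polynomials are stored as integers (bit i is the coefficient of z^i) and that each byte of a message is presented most-significant bit first; the helper names are hypothetical.

```python
def poly_from_exponents(exponents):
    """Integer encoding of a GF(2) polynomial from its nonzero exponents."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value


def gf2_mod(dividend, divisor):
    """Remainder of polynomial division over GF(2)."""
    degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - degree)
    return dividend


def bits_msb_first(data):
    """Assumed bit order: each byte is sent most-significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def crc_by_definition(bits, gen, init=0):
    """Sum of (z^N * U_o mod G) and (z^K * U_p mod G), as in the definition."""
    k = gen.bit_length() - 1
    u_p = 0
    for bit in bits:  # u(0) becomes the coefficient of z^(N-1), per Eqn. (1)
        u_p = (u_p << 1) | bit
    return gf2_mod(init << len(bits), gen) ^ gf2_mod(u_p << k, gen)


def fcs(bits, gen, init=0):
    """Frame check sequence: ones complement of the CRC."""
    k = gen.bit_length() - 1
    return crc_by_definition(bits, gen, init) ^ ((1 << k) - 1)


G16 = poly_from_exponents([16, 12, 5, 0])
G32 = poly_from_exponents([32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0])

message_bits = bits_msb_first(b"123456789")
crc_zero_init = crc_by_definition(message_bits, G16)          # U_o(z) = 0
crc_ones_init = crc_by_definition(message_bits, G16, 0xFFFF)  # U_o(z) all ones
```

Forming the dividend explicitly takes time that grows with the frame length, which is why practical implementations use the shift-register form described next.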
The error detection properties of the CRC depend on the characteristics of polynomials over the field GF(2), are well known (see reference [2], for example), and are not an issue in the present disclosure. Rather, the present invention addresses efficient means for high-speed CRC computation.
The reference implementation for computing the CRC is derived from a circuit for polynomial division that employs a shift register with feedback (see reference [3], for example), although other methods recognized by skilled artisans are also available.
One form of this reference implementation, generalized from reference [4], is shown in FIG. 1. The blocks labeled $z^{-1}$ are unit delay elements that make up the shift register. For the block whose output is $x_k(n)$, for example, the input is equal to $x_k(n+1)$. The scale factors of the gain elements are the coefficients of the divisor polynomial G(z); i.e.,
$$G(z) = \sum_{k=0}^{K} g_k z^k \tag{3}$$
where the coefficients are assumed to be normalized with $g_K = 1$. The input sequence contains the finite-length block of data to be protected, for $n = 0, 1, \ldots, N-1$. After the last element of the input sequence has been processed, i.e., at n=N, the shift register contains the remainder of the division required by the CRC definition. More precisely, let the shift register be initialized so that it contains a representation of the initialization polynomial $U_o(z)$; i.e., if
$$U_o(z) = \sum_{k=0}^{K-1} u_{ok} z^k \tag{4}$$
then set $x_k(0) = u_{ok}$ for $k = 0, 1, \ldots, K-1$. Then, at n=N, the contents of the shift register represent the sum of the remainder of $z^N U_o(z)$ divided by G(z) and the remainder of $z^K U_p(z)$ divided by G(z), where $U_p(z)$ is the polynomial representation of the input data sequence according to Eqn. (1). In other words, if the sum of these two remainders is called $R_T(z)$, with
$$R_T(z) = \sum_{k=0}^{K-1} r_{Tk} z^k \tag{5}$$
then the coefficients of this polynomial, which make up the CRC, satisfy:
$$r_{Tk} = x_k(N); \quad k = 0, 1, \ldots, K-1 \tag{6}$$
When the CRC is computed over GF(2) as in the standard definition, the appropriate arithmetic is employed. Thus the summing blocks in FIG. 1 implement modulo 2 addition, and the negative signs in the figure are irrelevant (because any element in GF(2) is its own additive inverse). In addition, since the coefficients of G(z) are all either 0 or 1, the gain elements shown in the figure would be implemented either as a closed connection (for a 1) or an open circuit (for a 0).
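The bit-at-a-time shift register with feedback can be sketched in software as follows. This is an illustrative model, not the circuit of FIG. 1 itself. It assumes the register is held in an integer with bit k holding $x_k$, that the input bit is combined with the feedback from the last stage, and that poly_low (a hypothetical parameter name) holds the coefficients $g_0$ through $g_{K-1}$.

```python
def crc_bit_serial(bits, poly_low, k, init=0):
    """Bit-at-a-time CRC register over GF(2).

    poly_low: coefficients g_0..g_{K-1} of G(z) as an integer (z^K term omitted).
    init:     register preset, i.e. the coefficients of U_o(z).
    """
    mask = (1 << k) - 1
    reg = init & mask
    for bit in bits:
        # Feedback is the last stage x_{K-1} plus the input bit (modulo 2).
        feedback = ((reg >> (k - 1)) & 1) ^ bit
        # Register shift: x_k takes the previous value of x_{k-1}.
        reg = (reg << 1) & mask
        # Gain elements: a closed connection wherever g_k = 1.
        if feedback:
            reg ^= poly_low
    return reg


# G16(z) = z^16 + z^12 + z^5 + 1, message bytes taken MSB first (an assumption).
bits = [(byte >> (7 - i)) & 1 for byte in b"123456789" for i in range(8)]
crc16 = crc_bit_serial(bits, 0x1021, 16)
```

Each loop iteration corresponds to one clock of the register, so the work per input bit is one shift and a conditional exclusive-or.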
The processing of the input sequence in FIG. 1 can be described by the difference equation:
$$x(n+1) = A\,x(n) + b\,u(n) \tag{7}$$
where the K-dimensional state vector x(n) is
$$x(n) = [\,x_0(n)\;\; x_1(n)\;\; \cdots\;\; x_{K-1}(n)\,]^T \tag{8}$$
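A is a $K \times K$ matrix with the form
$$A = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 & -g_0 \\
1 & 0 & \cdots & 0 & 0 & -g_1 \\
0 & 1 & \cdots & 0 & 0 & -g_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & -g_{K-2} \\
0 & 0 & \cdots & 0 & 1 & -g_{K-1}
\end{bmatrix} \tag{9}$$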
and b is the $K \times 1$ matrix
$$b = [\,(-g_0)\;\; (-g_1)\;\; \cdots\;\; (-g_{K-1})\,]^T \tag{10}$$
where the superscript "T" indicates "transpose." The initial condition for the difference equation (7) is determined by the initialization polynomial; with $U_o(z)$ as in Eqn. (4):
$$x_k(0) = u_{ok}; \quad k = 0, 1, \ldots, K-1 \tag{11}$$
Again, when the CRC is computed over GF(2), the calculation in Eqn. (7) is done using modulo-2 arithmetic, and the negative signs in the A matrix in Eqn. (9) are superfluous. Note also that the shift register contains the CRC. In other words, the state vector of the system described by the state equation (7) is equal to the CRC after the last input element has been processed, at n=N.
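The state equation (7) can be exercised numerically with a short sketch. It is offered only as an illustration under stated assumptions: matrices are plain nested lists over GF(2), the negative signs in A and b are dropped because every element of GF(2) is its own additive inverse, and the function names are hypothetical.

```python
def companion(g):
    """Build A (K x K) and b (length K) of Eqns. (9) and (10) over GF(2).

    g holds the coefficients g_0..g_{K-1} of the generator polynomial.
    """
    k = len(g)
    a = [[0] * k for _ in range(k)]
    for row in range(1, k):
        a[row][row - 1] = 1        # sub-diagonal of ones: the register shift
    for row in range(k):
        a[row][k - 1] = g[row]     # last column: -g_row, equal to g_row in GF(2)
    return a, list(g)


def step(a, b, x, u):
    """One application of x(n+1) = A x(n) + b u(n), modulo 2."""
    k = len(x)
    return [(sum(a[i][j] & x[j] for j in range(k)) + (b[i] & u)) % 2
            for i in range(k)]


def crc_state_space(bits, g, x0=None):
    """Iterate the state equation once per input bit; returns x(N)."""
    a, b = companion(g)
    x = list(x0) if x0 is not None else [0] * len(g)
    for u in bits:
        x = step(a, b, x, u)
    return x


def state_to_int(x):
    """Pack the state vector so that x_k becomes bit k."""
    return sum(bit << k for k, bit in enumerate(x))


# G16(z) = z^16 + z^12 + z^5 + 1, message bytes taken MSB first (an assumption).
g16 = [1 if k in (0, 5, 12) else 0 for k in range(16)]
bits = [(byte >> (7 - i)) & 1 for byte in b"123456789" for i in range(8)]
crc16 = state_to_int(crc_state_space(bits, g16))
```

The final state vector is the CRC itself, which is the property that the known techniques described below all share.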
Finally, we observe that Eqn. (7) is executed once for each element of the input sequence (i.e., for each bit of the input bitstream). Given the structure of the A matrix, the operations that must be performed per input element are two adds and a shift of the register; this is also evident from FIG. 1. For a bitstream to be transmitted to or received from a communication interface, with the CRC computed in GF(2) using modulo-2 arithmetic, the circuitry in FIG. 1 must be able to complete two exclusive-or operations and one register shift within a single bit-time. For an interface operating at 100 Mbits/s, for example, this time is 10 nsec, while the time available is 1 nsec for an interface operating at 1 Gbit/s.
Consider some integrated circuit technology in which the net time through two exclusive-or operations and a register is 2 nsec. Clearly, either an even faster integrated circuit technology or some other approach to CRC computation is needed for interfaces at 1 Gbit/s (for example, the so-called gigabit Ethernet). In fact, there are already communication interfaces operating at speeds in excess of 2 Gbits/s (for example, SONET OC48 at about 2.4 Gbits/s, and 10 Gbits/s Ethernet); for these, a different approach seems mandatory. At the same time, techniques for computing the CRC that can operate at lower clock speeds for a given interface speed can be realized using circuits that sacrifice speed in favor of lower power. In general, then, such techniques are mandatory for some cases and may provide important advantages even when they are not mandatory.
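The timing argument above can be turned into a rough sizing rule. The sketch below is an illustration only. It assumes an ideal M-times speed-up, i.e., that one M-bit block update takes no longer than one bit-at-a-time update, which the known techniques discussed later do not achieve; the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil


def min_block_size(step_delay_ns, line_rate_gbps):
    """Smallest M such that M bit-times cover one register-update delay.

    Assumes an ideal M-times speed-up. Exact rational arithmetic avoids
    floating-point rounding at the boundary cases.
    """
    bit_time_ns = 1 / Fraction(line_rate_gbps)   # 1 Gbit/s -> 1 ns per bit
    return ceil(Fraction(step_delay_ns) / bit_time_ns)


# 2 ns through two exclusive-or operations and a register, as in the text.
sizes = {rate: min_block_size(2, rate) for rate in ("0.1", "1", "2.4", "10")}
```

Under these assumptions, the 2 nsec technology is adequate bit-at-a-time at 100 Mbits/s, but needs at least 2 bits per step at 1 Gbit/s, 5 at about 2.4 Gbits/s, and 20 at 10 Gbits/s.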
Known techniques for high-speed CRC computation have focused on processing some number of bits (i.e., elements of the input sequence u(n)) in parallel. This approach was originally described by Patel [5]. Perez [6] has an early example of its implementation in software. References [7] through [13] provide other examples of hardware implementations of parallel CRC computation.
The basis of all reported techniques for parallel CRC computation can be established by describing formally the block-oriented version of the system state equation (7). Let the elements of the input sequence be grouped into blocks of length M, so that the input to the block-oriented system is now a vector uM(m) with
$$u_M(m) = [\,u(mM+M-1)\;\; u(mM+M-2)\;\; \cdots\;\; u(mM+1)\;\; u(mM)\,]^T; \quad m = 0, 1, \ldots, (N/M)-1 \tag{12}$$
assuming that N is an integral multiple of M. It is well known that the state equation (7) can be rewritten as:
$$x(m+1) = A^M x(m) + B_M u_M(m) \tag{13}$$
where the index m is incremented by one for each block of M input elements. The $K \times K$ matrix $A^M$ in Eqn. (13) is the M-th power of the A matrix in Eqn. (9). The matrix $B_M$ is a $K \times M$ matrix whose columns are found by multiplying the vector b in Eqn. (7) by successively higher powers of A; for $M \le K$, the columns of $B_M$ are the M rightmost columns of $A^M$.
The initial condition for the difference equation (13) is given by Eqn. (11); it is identical to that for the original difference equation. Additionally, the state vector contains the CRC after the last block of M input elements has been processed, assuming that N is an integral multiple of M.
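The block-oriented state equation (13) can be checked numerically with the sketch below. It is an illustration under stated assumptions: matrices are nested lists over GF(2), negative signs are dropped, N is a multiple of M, and $B_M$ is formed as [b Ab ... A^(M-1) b] so that its first column multiplies the newest bit in Eqn. (12). All function names are hypothetical.

```python
def companion(g):
    """A and b of Eqns. (9) and (10) over GF(2), from g_0..g_{K-1}."""
    k = len(g)
    a = [[int(i == j + 1) for j in range(k)] for i in range(k)]
    for i in range(k):
        a[i][k - 1] = g[i]
    return a, list(g)


def mat_mul(p, q):
    """Matrix product modulo 2."""
    return [[sum(p[i][t] & q[t][j] for t in range(len(q))) % 2
             for j in range(len(q[0]))] for i in range(len(p))]


def mat_vec(p, v):
    """Matrix-vector product modulo 2."""
    return [sum(row[j] & v[j] for j in range(len(v))) % 2 for row in p]


def block_matrices(g, m):
    """Return (A^M, B_M) for block length m."""
    a, b = companion(g)
    k = len(g)
    cols = [b]
    for _ in range(m - 1):
        cols.append(mat_vec(a, cols[-1]))        # b, Ab, A^2 b, ...
    b_m = [[cols[j][i] for j in range(m)] for i in range(k)]
    a_m = [[int(i == j) for j in range(k)] for i in range(k)]
    for _ in range(m):
        a_m = mat_mul(a_m, a)
    return a_m, b_m


def crc_blockwise(bits, g, m, x0=None):
    """Iterate x(m+1) = A^M x(m) + B_M u_M(m), one block of m bits per step."""
    if len(bits) % m:
        raise ValueError("N must be an integral multiple of M")
    a_m, b_m = block_matrices(g, m)
    x = list(x0) if x0 is not None else [0] * len(g)
    for start in range(0, len(bits), m):
        u_m = bits[start:start + m][::-1]        # newest bit first, Eqn. (12)
        x = [p ^ q for p, q in zip(mat_vec(a_m, x), mat_vec(b_m, u_m))]
    return x


# G16(z) = z^16 + z^12 + z^5 + 1, M = 8, message bytes MSB first (an assumption).
g16 = [1 if k in (0, 5, 12) else 0 for k in range(16)]
bits = [(byte >> (7 - i)) & 1 for byte in b"123456789" for i in range(8)]
crc16 = sum(bit << k for k, bit in enumerate(crc_blockwise(bits, g16, 8)))
```

Each block step now involves a dense $K \times K$ product rather than a shift, which is the added complexity that limits the speed-up of techniques working directly with Eqn. (13).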
In the case of block-oriented, or parallel, CRC computation, the computation of x(m+1) from x(m) following Eqn. (13) must complete within the time associated with transmission or reception of M input elements, i.e., within M bit-times. If the net computation associated with Eqn. (13) is no more complex than the net computation associated with Eqn. (7), then we would say that the block-oriented system represented by Eqn. (13) provides a speed-up by a factor of M; in other words, for the same speed circuitry, the throughput of the system represented by Eqn. (13) is M times that of the system represented by Eqn. (7).
With the exception of [13], all of the referenced known techniques work directly with the block-oriented state equation (13). They take advantage of one particular characteristic of the structure of the matrices in this equation, namely that for $M \le K$, the columns of $B_M$ are the M rightmost columns of $A^M$, as noted above, and then employ either a table-lookup approach or an optimized exclusive-or array to carry out the computation indicated in (13). Pei and Zukowski [8], for example, investigate the trade-offs between achievable speed-up and the size and complexity of the exclusive-or array for different CRC polynomials; the maximum speed-up they are able to achieve using their approach is a factor of about M/2. Ng and Dewar [10] find a similar limit for the special case of the 8-bit CRC used to protect ATM cell headers in ATM networks [14].
In [13], Glaise outlines an approach to high-speed CRC computation that is different from the other referenced techniques in several respects but still has one key property in common with them. In Glaise's approach, the polynomial representing the input data sequence is divided not by G(z), but rather by a polynomial of much higher degree that has G(z) as a factor. The remainder polynomial from this first division is then divided by G(z), and the remainder of the latter division is the desired CRC. Both divisions are carried out using block-oriented processes that can be described using a form of the block state equation (13); the first division is carried out by processing 8 input bits at a time (i.e., with M=8), while the second is carried out in one step with M equal to the degree of the first divisor polynomial. This approach provides speed-up when the first divisor polynomial is selected to have certain properties, as described in [13]; the polynomial used in [13] has degree 123 with 8 non-zero coefficients. It is claimed in [13] that the time to process each group of 8 input bits for the first division can be as low as 10 nsec; unfortunately, there is no point of reference for bit-at-a-time processing or per-gate delays, so the speed-up factor achieved cannot be estimated. Even if a full speed-up factor of 8 were achievable here, the technique suffers from a major disadvantage, namely that given G(z), a good divisor polynomial for the first division can be found only by exhaustive search (as stated in [13]); in other words, there is no general method here. Note also that the technique in [13] shares one key property with all the other known referenced material, namely: the remainder of each division, and thus the CRC as well, is identical to the state vector x in the block state equations (13).
Finally, it is important to keep in mind that in most practical data-communication systems the length N of the data sequences for which CRCs are computed is an integral multiple of 8, so that M=8 is a common value for block CRC computation. Since N is in general not guaranteed to be an integral multiple of any larger number (16, for example), use of a value of M larger than 8 requires some postprocessing to complete the computation of the CRC. This postprocessing is required for all known techniques and also for the invention described here. While several references have presented results for M greater than 8 (see [8], for example), not one has discussed the postprocessing required for these cases. The present disclosure also ignores the postprocessing required for these cases, essentially assuming that N is an integral multiple of M. A companion patent document, the U.S. Patent Application entitled "Computing The CRC M Bits At A Time For Data Whose Length In Bits Is Not A Multiple Of M," (listed under the references section above, incorporated herein by reference in its entirety, and referenced hereinafter as "companion patent-document"), describes a novel postprocessing technique for cases with N not an integral multiple of M.
To summarize the key properties of all techniques for fast CRC calculation that are known to the author: (1) All techniques have the CRC as the state vector in a state representation of the division process, using Eqns. (7) or (13); (2) No known technique (with the possible exception of that in [13]) achieves a speed-up factor greater than about M/2, processing blocks of M input bits at a time; (3) No known technique (including that in [13]) includes a method that guarantees a full speed-up factor of M, processing blocks of M input bits at a time, for arbitrary generator polynomials G(z). These problems are solved by the present invention.
The present invention is directed to a method and a system for computing a cyclic redundancy code (CRC) of a communication data stream taking a number of bits M at a time to achieve a throughput equaling M times that of a bit-at-a-time CRC computation operating at the same circuit clock speed. The method includes (i) representing a frame of the data stream to be protected as a polynomial input sequence; (ii) determining one or more matrices and vectors relating the polynomial input sequence to a state vector; and (iii) applying a linear transform matrix for the polynomial input sequence to obtain a transformed version of the state vector. The method can further include (iv) applying a linear transform matrix to the transformed version of the state vector to determine a CRC for the polynomial input sequence, if the communication data stream is received by a network device. The method can further include (v) appending the CRC as a frame check sequence (FCS) to the communication data stream for detection by a receiving device.
The polynomial input sequence u(n) can be defined in the field of integers modulo 2 (GF(2)), and the steps (i)-(v) can be performed in GF(2). Step (i) can include grouping the elements of the input sequence into blocks of length M; and representing the input sequence in a block-oriented fashion as $u_M(m) = [\,u(mM+M-1)\;\; u(mM+M-2)\;\; \cdots\;\; u(mM+1)\;\; u(mM)\,]^T$, where $m = 0, 1, \ldots, m_{max}$, and where $m_{max}$ equals $(N/M)-1$.
In step (ii), the state vector can be represented by $x(m+1) = A^M x(m) + B_M u_M(m)$, where A is a $K \times K$ matrix containing the coefficients of a CRC generator polynomial G(z), where K is the degree of G(z), where b is a K-dimensional vector containing one or more coefficients of a CRC generator polynomial, where $B_M$ is a $K \times M$ matrix whose columns are determined by multiplying b by successively higher powers of A, and where $u_M(m)$ is a block-oriented version of the input sequence.
Step (ii) can further include (a) determining a matrix A and a vector b from a CRC generator polynomial G(z) of degree K; and (b) determining a matrix $A^M$ and a matrix $B_M$ respectively from the matrix A and the vector b, where A is a $K \times K$ matrix containing the coefficients of the CRC generator polynomial, where K is the degree of G(z), where b is a K-dimensional vector containing one or more coefficients of the CRC generator polynomial, and where $B_M$ is a $K \times M$ matrix whose columns are determined by multiplying b by successively higher powers of A.
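The matrix A, for example, can equal the matrix
$$A = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 & -g_0 \\
1 & 0 & \cdots & 0 & 0 & -g_1 \\
0 & 1 & \cdots & 0 & 0 & -g_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & -g_{K-2} \\
0 & 0 & \cdots & 0 & 1 & -g_{K-1}
\end{bmatrix}$$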
where $-g_0, -g_1, -g_2, \ldots, -g_{K-1}$ are coefficients of the CRC generator polynomial. The vector b can equal $[\,g_0\;\; g_1\;\; g_2\;\; \cdots\;\; g_{K-1}\,]^T$. The matrix $B_M$ can equal $[\,b\;\; Ab\;\; A^2 b\;\; \cdots\;\; A^{M-1} b\,]$.
Step (iii) can include (a) determining a characteristic polynomial H(z) of $A^M$ over the field of integers modulo 2 (GF(2)); (b) determining a companion-form matrix $A_{Mt}$ whose characteristic polynomial is H(z); and (c) calculating the linear transform matrix T for the input sequence from $A^M$ and $A_{Mt}$. The method can include: (a) if M is a power of 2, then determining $A^M$ to have the same characteristic polynomial as A, such that H(z) equals G(z); (b) if M is not a power of 2, then calculating H(z) for $A^M$ by applying ordinary arithmetic and determining H(z) to have the coefficients of an interim characteristic polynomial evaluated modulo 2. The method can also include (1) calculating the linear transform matrix T as the product of a matrix $W_2$ and a second matrix $W_1^{-1}$, where $W_1 = [\,b_1\;\; A_{Mt} b_1\;\; A_{Mt}^2 b_1\;\; \cdots\;\; A_{Mt}^{K-1} b_1\,]$, where $W_2 = [\,b_2\;\; A^M b_2\;\; A^{2M} b_2\;\; \cdots\;\; A^{(K-1)M} b_2\,]$, (1)(i) where if H(z) is irreducible, then $b_1$ can equal any vector other than the all-zero vector, (1)(ii) where if H(z) is not irreducible, then $b_1$ is derived by the product of $(z+1)$, where z is a transform variable, and an irreducible polynomial $H_1(z)$ of degree $K-1$, where the matrix products $H_1(A_{Mt})\,b_1$ and $(A_{Mt} + I_K)\,b_1$ are both non-zero in value, where $I_K$ is the $K \times K$ identity matrix, (1)(iii) where if H(z) is irreducible, then $b_2$ can equal any vector other than the all-zero vector, and (1)(iv) where if H(z) is not irreducible, then $b_2$ is derived by the product of $(z+1)$ and an irreducible polynomial $H_2(z)$ of degree $K-1$, where the matrix products $H_2(A^M)\,b_2$ and $(A^M + I_K)\,b_2$ are both non-zero in value; and (2) computing $W_1^{-1}$ as the inverse of $W_1$ over GF(2).
Finally, the transformed version of the state vector can be defined by the equations $x_t(m+1) = A_{Mt}\, x_t(m) + B_{Mt}\, u_M(m)$ and $y(m) = C_{Mt}\, x_t(m)$, where m is an integer; and where step (iv) above includes determining the CRC by applying the equations $A_{Mt} = T^{-1} A^M T$, $B_{Mt} = T^{-1} B_M$, $C_{Mt} = T$, and the initial condition $x_t(0) = T^{-1} x(0)$ to the transformed version of the state vector until y(m) equals y(N/M), where N is the length of the input sequence, and where $T^{-1}$ is the inverse of the linear transform matrix T.
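Steps (iii) and (iv) can be illustrated with a small numerical sketch. It is not the claimed implementation. It assumes, for illustration only, the generator polynomial $z^4 + z + 1$ (irreducible over GF(2)), a block length M that is a power of 2 so that H(z) equals G(z) and the companion-form matrix coincides with A, and arbitrarily chosen non-zero vectors b1 and b2. All function names are hypothetical.

```python
def companion(g):
    """A and b of the bit-at-a-time state equation over GF(2)."""
    k = len(g)
    a = [[int(i == j + 1) for j in range(k)] for i in range(k)]
    for i in range(k):
        a[i][k - 1] = g[i]
    return a, list(g)

def mat_mul(p, q):
    return [[sum(p[i][t] & q[t][j] for t in range(len(q))) % 2
             for j in range(len(q[0]))] for i in range(len(p))]

def mat_vec(p, v):
    return [sum(row[j] & v[j] for j in range(len(v))) % 2 for row in p]

def mat_pow(a, e):
    out = [[int(i == j) for j in range(len(a))] for i in range(len(a))]
    for _ in range(e):
        out = mat_mul(out, a)
    return out

def mat_inv(a):
    """Gauss-Jordan inverse over GF(2)."""
    k = len(a)
    aug = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def krylov(a, v, n):
    """Matrix with columns v, a v, ..., a^(n-1) v."""
    cols = [list(v)]
    for _ in range(n - 1):
        cols.append(mat_vec(a, cols[-1]))
    return [[cols[j][i] for j in range(n)] for i in range(len(v))]

def build_transform(g, m, b1, b2):
    """Return (A_Mt, B_Mt, T, T^-1), assuming m is a power of 2."""
    k = len(g)
    a, b = companion(g)
    a_m, b_m = mat_pow(a, m), krylov(a, b, m)
    a_mt = a                              # H(z) = G(z) under the assumption
    t = mat_mul(krylov(a_m, b2, k), mat_inv(krylov(a_mt, b1, k)))  # W2 W1^-1
    t_inv = mat_inv(t)
    assert mat_mul(mat_mul(t_inv, a_m), t) == a_mt
    return a_mt, mat_mul(t_inv, b_m), t, t_inv

def crc_transformed(bits, g, m, x0, b1, b2):
    """Run the transformed state equation, then map back with y = T x_t."""
    a_mt, b_mt, t, t_inv = build_transform(g, m, b1, b2)
    xt = mat_vec(t_inv, x0)
    for start in range(0, len(bits), m):
        u_m = bits[start:start + m][::-1]   # newest bit first
        xt = [p ^ q for p, q in zip(mat_vec(a_mt, xt), mat_vec(b_mt, u_m))]
    return mat_vec(t, xt)

def crc_bit_serial(bits, g, x0):
    """Reference: one application of x(n+1) = A x(n) + b u(n) per bit."""
    a, b = companion(g)
    x = list(x0)
    for u in bits:
        x = [p ^ (q & u) for p, q in zip(mat_vec(a, x), b)]
    return x

G = [1, 1, 0, 0]                            # g_0..g_3 of z^4 + z + 1
BITS = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
crc = crc_transformed(BITS, G, 4, [1, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0])
```

In this sketch the feedback matrix of the transformed recursion is in companion form, so each block step needs only a shift and a conditional exclusive-or in the feedback path, while the denser matrices $B_{Mt}$ and T act outside it.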