1. Technical Field
The present invention relates generally to the field of error correction in digital communication systems and, in particular, to a parallel system and method for cyclic redundancy checking (CRC) generation. The bulk of the data (in a digital data stream) is processed using the system/method of the present invention to generate a partial CRC. This partial CRC, along with the last few data bits of the digital data stream, may then be used by a conventional CRC generating algorithm to compute the CRC of the entire digital data stream.
2. Background Description
Digital communications between computers form a vital part of the Internet (as well as other networks). Unfortunately, in many situations, the transmitted data is corrupted by the time it is received by a receiver. Thus, the detection of transmission errors is essential in such communications.
Cyclic Redundancy Checking (CRC) is a commonly used error-detection scheme. It is particularly useful in areas involving digital data storage and transmission.
CRC is a type of checksum transmitted with data. The CRC is computed as the remainder when a xe2x80x9cdataxe2x80x9d number is divided by a xe2x80x9cstandardxe2x80x9d number. The CRC is then appended to the data. At the receiver, the CRC of the whole sequence, the data followed by its CRC, is computed again. For an errorless transmission, the computed CRC at the receiver should be zero. Non-zero CRC at the receiver implies a transmission error with a high probability.
Since the data size is usually quite large, performing an integer division is not an easy task. An alternate way of looking at this division is using polynomials. The digits of the numbers form the coefficients of various powers in the polynomials. The degree of such a polynomial is the number of digits in the corresponding code standard encode number minus one.
The xe2x80x9cstandardxe2x80x9d number is known as a generating polynomial. Several generating polynomials have been standardized. Some of these standardized generating polynomials are described by R. N. Williams, in xe2x80x9cA Painless Guide to CRC Error Detection Algorithmsxe2x80x9d, ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt, Version 3, Aug. 19, 1993. It is to be noted that polynomial division can be carried out in binary, modulo-2 arithmetic. Further, polynomial division can also be broken down into a series of XOR operations.
A review of current CRC generation software algorithms is provided in the above referenced article by R. N. Williams. Nonetheless, a brief review of current CRC generation algorithms will now be given.
The basic CRC generation algorithm is the bit-wise algorithm. FIG. 1 is a block diagram illustrating the bit-wise algorithm 100 according to the prior art. The throughput of this algorithm is one input data bit per cycle. The input data bits are appended by M zero bits, wherein M is the number of bits in the CRC. The bit-wise algorithm is described in further detail by T. B. Pei and C. Zukowski, in xe2x80x9cHigh-speed Parallel CRC Circuits in VLSIxe2x80x9d, IEEE Trans. Communications, Vol. 40, No. 4, pp. 653-57, April 1992.
Software implementation of the bit-wise algorithm becomes efficient if a number of bits are grouped together, usually as an 8-bit byte as described by R. N. Williams in the above-referenced article. These algorithms are referred as table-lookup algorithms, and are discussed in the following articles: G. Griffiths and G. C. Stones, xe2x80x9cThe Tea-leaf Reader Algorithm: An Efficient Implementation of CRC-16 and CRC-32xe2x80x9d, Communications of the ACM, Vol. 30, No. 7, pp. 617-20, July 1987; A. Perez, xe2x80x9cByte-wise CRC Calculationsxe2x80x9d, IEEE Micro, Vol. 3, No. 3, pp. 40-50, June 1983; T. V. Ramaabadran and S. V. Gaitonde, xe2x80x9cA Tutorial on CRC Computationsxe2x80x9d, IEEE Micro, Vol. 8, No. 4, pp. 62-75, August 1988; D. V. Sarwate, xe2x80x9cComputation of Cyclic Redundancy Checks via Table Look-upxe2x80x9d, Communications of the ACM, Vol. 31, No. 8, pp. 1008-13, August 1988; and R. N. Williams, xe2x80x9cA Painless Guide to CRC Error Detection Algorithmsxe2x80x9d, ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt. Version 3, Aug. 19, 1993.
Another approach in software implementation uses a shift-and-add method described by D. C. Feldmeier, in xe2x80x9cFast Software Implementation of Error Correcting Codesxe2x80x9d, IEEE/ACM Transactions on Networking, Vol. 3, No. 6, pp. 640-51 December 1995.
U.S. Ser. No. 09/021/516, entitled xe2x80x9cEfficient CRC Generation Utilizing Parallel Table Lookup Operationsxe2x80x9d, filed on Feb. 10, 1998, assigned to the assignee herein, the disclosure of which is incorporated herein by reference, describes a software algorithm for CRC generation using parallel table lookup operations, which can be done very efficiently using SIMD-style vector units.
It is to be noted that hardware implementations are popular for computing the CRC. Using such implementations, the basic bit-wise algorithm can be accelerated. For example, the bit-wise algorithm was treated as an M-tap finite-impulse-response filter, by G. Albertengo and R. Sisto, in xe2x80x9cParallel CRC Generationxe2x80x9d, IEEE Micro, Vol, 10, No. 5, pp. 63-71, October 1990. Then, the operation was converted into M parallel operations via the Z-transform method from digital signal processing. The Z-transform method is described by A. V. Oppenheim and R. W. Schafer, in Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, N.J., USA, 1989.
A number of shift-and-subtract operations were merged in processing M bits by T. B. Pei and C. Zukowski, in xe2x80x9cHigh-speed Parallel CRC Circuits in VLSIxe2x80x9d, IEEE Trans. Communications, Vol. 40, No. 4, pp. 653-57, April 1992. Different hardware implementations of their algorithm are described by: R. F. Hobson and K. L. Cheung, in xe2x80x9cHigh-performance CMOS 32-bit Parallel CRC Enginexe2x80x9d, IEEE Journal of Solid-State Circuits, Vol. 34, No. 2, pp. 233-35, February 1999; and A. Maniatopoulous, T. Antonakopoulos, and V. Makios, in xe2x80x9cSingle-bit Error-correction Circuit for ATM Interfacesxe2x80x9d, Electronic Letters, Vol. 31, No. 8, pp. 617-18, Apr. 13, 1995.
The table-lookup algorithm was also implemented in hardware as described by: R. J. Glaise and X. Jacquart, in xe2x80x9cFast CRC Calculationxe2x80x9d, Proc. 1993 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Cambridge, Mass., USA, pp. 602-05, Oct. 3-6 1993; and S. M. Sait and W. Hasan, in xe2x80x9cHardware Design and VLSI Implementation of a Byte-wise CRC Generator Chipxe2x80x9d, IEEE Transactions on Consumer Electronics, Vol. 41, No. 1, pp. 195-200, February 1995.
Fast algorithms for CRC computation that are treated as finite state machines are described by: B. Castagnolo and M. Rizzi, in xe2x80x9cHigh-speed Error Correction Circuit Based on Iterative Cellsxe2x80x9d, International Journal of Electronics, Vol. 74, No. 4, pp. 529-40, April 1993; M. C. Nielson, in xe2x80x9cMethod for High-speed CRC Computationxe2x80x9d, IBM Technical Disclosure Bulletin, Vol. 27, No. 6, pp. 3572-76, November 1984; and A. Sobski and A. Albicki, in xe2x80x9cPartitioned and Parallel Cyclic Redundancy Checkingxe2x80x9d, Proc. 36th Midwest Symposium on Circuits and Systems, Detroit, Mich., USA, Vol. 1, pp. 538-41, Aug. 16-18, 1993. A scheme using asynchronous CMOS hardware is described by S. H. Li and C. A. Zukowski, xe2x80x9cSelf-timed Cyclic Redundancy Check (CRC) in VLSIxe2x80x9d, Proc. 40th Midwest Symposium on Circuits and Systems, Sacramento, Calif. USA, Vol. 2, pp. 1021-23, Aug. 3-6, 1997. The speed-ups obtained in all these algorithms were less than or equal to a factor of M (i.e., the number of bits in the CRC).
Accordingly, it would be desirable and highly advantageous to have a CRC generation system and method that can obtain speed-up factors beyond M.
The present invention is directed to a parallel system and method for CRC generation. By processing K bits of data (the number of bits that can be delivered per cycle) in one iteration, the present invention allows for speed-up factors to be obtained well beyond the CRC size. In optimized embodiments of the present invention, the speed-up is increased by a factor of K.
According to one aspect of the invention, there is provided a method for generating a partial Cyclic Redundancy Checking (CRC) value of a first interval of data in a digital data stream. The method includes the step of loading a precomputed CRC value corresponding to the one bit followed by a predetermined number of zeros. The predetermined number of zeros correspond to the number of digits of a polynomial minus one (i.e., the degree of the polynomial). The first interval of data is partitioned into a plurality of bits. The precomputed CRC value corresponding to the one bit followed by the predetermined number of zeros is enabled, for each of the plurality of bits having a value of one. The enabled, precomputed CRC values are combined to generate the partial CRC value of the first interval of data.
According to a second aspect of the invention, the method further includes the step of executing the same number of copies of the method as intervals of data in the digital data stream to generate partial CRC values of the intervals of data in the digital data stream.
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.