1. Field of the Invention
The present invention relates generally to systems for ensuring the integrity of data communications in high bandwidth applications, and more particularly, to a novel apparatus and method for providing data redundancy checks.
2. Discussion of the Prior Art
A typical requirement of any data transfer system, such as a high-speed PCI Express or Infiniband serial bus system, is to provide verification of write data transferred by the system. Thus, typically, write data is encoded in accordance with an error checking algorithm, such as a cyclic redundancy check (CRC) algorithm, and the resultant check data is appended to the write data. The data, including the check data, is then checked at the other side of the PCI bus system by the same algorithm, and, if the data is error free, the remainder of the redundancy calculation is typically an all-zero output.
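The append-and-check behavior described above can be sketched in a few lines. This is only an illustrative model, not the circuit of any particular standard: the generator polynomial, the zero initial value, the absence of a final XOR, and the names `crc_remainder`, `append_check` and `is_error_free` are all assumptions made for the example.

```python
POLY = 0x04C11DB7  # assumed generator polynomial (the common CRC-32 polynomial)


def crc_remainder(data: bytes) -> int:
    """Divide data (MSB first) by POLY and return the 32-bit remainder.

    Assumes a zero initial value and no final XOR, so that an error-free
    codeword (data plus check bits) leaves an all-zero remainder.
    """
    rem = 0
    for byte in data:
        rem ^= byte << 24  # bring the next byte into the top of the register
        for _ in range(8):
            if rem & 0x80000000:
                rem = ((rem << 1) ^ POLY) & 0xFFFFFFFF
            else:
                rem = (rem << 1) & 0xFFFFFFFF
    return rem


def append_check(data: bytes) -> bytes:
    """Sender side: append the 32 check bits to the write data."""
    return data + crc_remainder(data).to_bytes(4, "big")


def is_error_free(codeword: bytes) -> bool:
    """Receiver side: rerun the same algorithm over data plus check bits."""
    return crc_remainder(codeword) == 0
```

Under these assumptions, `is_error_free(append_check(b"write data"))` holds, while flipping any single bit of the codeword makes the remainder nonzero. Standards that seed or invert the CRC would instead leave a fixed nonzero residue at the receiver.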
The majority of current communications standards require the computation of a Cyclic Redundancy Check (CRC) for data packets sent. As successive standards increase the bandwidth of data, the bandwidth for CRC computation will likewise increase. Current CRC circuits that provide CRC redundancy calculations do not scale well as the CRC value increases, nor as the amount of data processed per cycle increases. Sizes of current solutions can scale with the square of the amount of data processed per cycle.
Previous solutions addressed increased bandwidth. For example, commonly-owned, co-pending United States Patent Publication No. U.S. 2005/0268209 (hereinafter “the '209 publication”) assigned to International Business Machines Corp., and incorporated by reference as if fully set forth herein, describes a novel cyclic redundancy check generation circuit that comprises an efficient pipelined solution with built-in recursion for increasing bandwidth. Thus, a fast pipelined CRC circuit that operates on 256 bits of data per cycle is known in the art; however, the total length of the data packet must be a multiple of 256 bits. While this may be acceptable in some highly specific situations, many common industry standards have a much smaller data packet granularity, which prevents the previous solutions from being applied. For example, both the Infiniband and PCI-Express bidirectional serial data bus configurations, which provide very fast serial connections, e.g., 2.5 gigabits per second (Gbit/s) or greater in each direction, utilize packets that are multiples of 32 bits in length. Any CRC circuit that operates on these standards will need to function at a high bandwidth and operate on a 32 bit granularity, while remaining constrained in size.
FIG. 1A illustrates conceptually a current solution 10 requiring the cascading of 32 bit CRC calculators, with each combinatorial CRC32_32 block 15 within combinatorial block 11 representing the circuitry for calculating the CRC signature for each successive 32 bit portion of the data (message) latch 12. It is understood that the byte granularity is configurable depending upon the application, e.g., may be sixteen bytes or eight bytes, etc. For example, the CRC32 block 15a generates the CRC signature for the first 32 bits of the message slice, the next block 15b for the first 64 bits, and so on. Thus, the last block 15n calculates the CRC signature for the 192 bit message. The latch 16 at the output feeds back the data to the first 32 bit calculator 15a, so that the next cycle can begin for the next data portion. Each output 14 represents the CRC remainder computed on a specific multiple of the base granularity of the data message. For example, output 14a represents the CRC signature for the first 32 bits of the message slice, output 14b for the first 64 bits, and so on. Thus output 14n represents the CRC signature for the 192 bit message slice. However, in this solution the critical timing path grows linearly as the size of the data message slice increases, and it becomes too long for today's high frequency operations, for example, operation at 250 MHz or greater.
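A behavioral sketch of such a cascade may help make the intermediate outputs concrete. It is a software model only, assuming the common CRC-32 polynomial, MSB-first bit order and a 192 bit slice of six 32 bit words; the names `crc_bits` and `cascade` are hypothetical and do not come from the figures.

```python
POLY = 0x04C11DB7  # assumed generator polynomial
MASK = 0xFFFFFFFF


def crc_bits(rem: int, value: int, nbits: int) -> int:
    """Bit-serial reference: fold the low nbits of value, MSB first, into rem."""
    for i in reversed(range(nbits)):
        feedback = ((rem >> 31) ^ (value >> i)) & 1
        rem = (rem << 1) & MASK
        if feedback:
            rem ^= POLY
    return rem


def cascade(rem_in: int, words: list[int]) -> list[int]:
    """Model of cascaded 32 bit CRC blocks.

    Each block takes the remainder produced by the previous block plus the
    next 32 bit word. The k-th entry of the result plays the role of the
    k-th intermediate output; the last entry is what would be latched and
    fed back for the next cycle.
    """
    outputs = []
    rem = rem_in
    for word in words:
        rem = crc_bits(rem, word, 32)  # one block in the chain
        outputs.append(rem)
    return outputs
```

For a six-word slice, `cascade(0, words)[0]` is the remainder over the first 32 bits and `cascade(0, words)[5]` the remainder over all 192 bits. Because each block waits on the one before it, the chain in the loop mirrors the serial dependency that lengthens the critical path.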
FIG. 1B further illustrates conceptually each CRC calculator block 15. The portion of the data message from data latch 12 is connected to a 32 bit input, 32 bit output data XOR tree 150. The XOR logic in data XOR tree 150 is understood to be constructed to implement the data-related specific type of CRC calculation desired for CRC calculator block 15. The CRC remainder input to CRC calculator block 15 is connected to a 32 bit input, 32 bit output remainder XOR tree 151. The XOR logic in remainder XOR tree 151 is understood to be constructed to implement the remainder-related specific type of CRC calculation desired for CRC calculator block 15. The outputs of XOR trees 150 and 151 are connected to a 32×2 input XOR function block 152.
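The split into a data XOR tree and a remainder XOR tree works because the CRC update is linear over GF(2). The sketch below illustrates that property in software under assumed parameters (the common CRC-32 polynomial, MSB-first bit order); the names `tree_columns`, `apply_tree` and `block_out` are hypothetical, and the derived column masks are not meant to match any actual gate-level netlist.

```python
POLY = 0x04C11DB7  # assumed generator polynomial
MASK = 0xFFFFFFFF


def crc_bits(rem: int, value: int, nbits: int) -> int:
    """Bit-serial reference: fold the low nbits of value, MSB first, into rem."""
    for i in reversed(range(nbits)):
        feedback = ((rem >> 31) ^ (value >> i)) & 1
        rem = (rem << 1) & MASK
        if feedback:
            rem ^= POLY
    return rem


def tree_columns(fn) -> list[int]:
    """Derive a 32-input XOR tree by probing fn with each single input bit."""
    return [fn(1 << i) for i in range(32)]


def apply_tree(columns: list[int], x: int) -> int:
    """Evaluate the tree: XOR together the columns of every set input bit."""
    out = 0
    for i in range(32):
        if (x >> i) & 1:
            out ^= columns[i]
    return out


# Data tree: effect of the 32 data bits with a zero incoming remainder.
DATA_COLUMNS = tree_columns(lambda d: crc_bits(0, d, 32))
# Remainder tree: effect of the incoming remainder with zero data.
REM_COLUMNS = tree_columns(lambda r: crc_bits(r, 0, 32))


def block_out(rem: int, data: int) -> int:
    """Final 32 by 2-input XOR stage combining both tree outputs."""
    return apply_tree(DATA_COLUMNS, data) ^ apply_tree(REM_COLUMNS, rem)
```

For any 32 bit `rem` and `data`, `block_out(rem, data)` equals `crc_bits(rem, data, 32)`, which is why the two trees can be evaluated independently and merged by a single XOR stage.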
FIG. 2 illustrates a CRC calculator solution 18 as described in the exemplary related art of the '209 publication, which includes a first partition comprising a set of XOR subtrees and latches 215 for processing the data bits and a second partition comprising a set of XOR subtrees and latches 210 for processing the remainder bits of the CRC. Both partitions are multi-level partitions, each level comprised of multiple XOR subtrees and latches. The outputs of XOR subtrees and latches 210 and 215 are connected to a 32 by 2-input XOR gate 220. The output of XOR gate 220 is connected to a current CRC remainder latch 205. The output of latch 205 is connected to remainder partition XOR subtrees and latches 210. Preferably, each XOR subtree of the data partition is no slower than the slowest XOR subtree in the remainder partition. Each level of XOR subtrees performs a portion of the CRC calculation, and each XOR subtree belonging to a particular level performs a part of that level's portion. The size of the largest remainder subtree is chosen so that all the XOR calculations it performs can be completed in one clock cycle at the desired frequency. Since all the XOR subtrees of the data partition and the remainder partition are no slower than the slowest remainder XOR subtree, each data partition level's portion of the CRC is preferably performed in one clock cycle or less.
With reference to FIG. 2, the prior art apparatus as described in the '209 publication is still fixed to m-bit wide data portions, and messages are typically not multiples of “m” bits. On average, m could be 192 bits, i.e., a multiple of 32 bits; however, messaging generally implements packets that are not necessarily multiples of m, and thus there may be leftover bits. Consequently, there needs to be a mechanism for calculating the CRC signature for the leftover bits, 8 or 16, or a like multiple of the base granularity (e.g., 32 bits). That is, a mechanism is needed to obtain the CRC signature of only the last message portion (i.e., the leftover information).
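The leftover problem can be illustrated with a small software model. It is a sketch under assumptions, not the mechanism of any cited circuit: the slice width of 192 bits and base granularity of 32 bits follow the example figures above, while the polynomial, bit order and the names `crc_bits` and `crc_packet` are hypothetical.

```python
POLY = 0x04C11DB7  # assumed generator polynomial
MASK = 0xFFFFFFFF
SLICE_BITS = 192    # assumed slice width m
GRANULE_BITS = 32   # assumed base granularity


def crc_bits(rem: int, value: int, nbits: int) -> int:
    """Bit-serial reference: fold the low nbits of value, MSB first, into rem."""
    for i in reversed(range(nbits)):
        feedback = ((rem >> 31) ^ (value >> i)) & 1
        rem = (rem << 1) & MASK
        if feedback:
            rem ^= POLY
    return rem


def crc_packet(words: list[int]) -> tuple[int, int]:
    """Return (crc, leftover_words) for a packet of 32 bit words.

    Full slices are consumed SLICE_BITS at a time, as a fixed-width engine
    would. Whatever remains is shorter than one slice and must be handled
    at the base granularity instead.
    """
    per_slice = SLICE_BITS // GRANULE_BITS
    n_full = len(words) // per_slice
    rem = 0
    for s in range(n_full):
        slice_value = 0
        for w in words[s * per_slice:(s + 1) * per_slice]:
            slice_value = (slice_value << GRANULE_BITS) | w
        rem = crc_bits(rem, slice_value, SLICE_BITS)  # full-width step
    tail = words[n_full * per_slice:]
    for w in tail:
        rem = crc_bits(rem, w, GRANULE_BITS)  # leftover portion
    return rem, len(tail)
```

An eight-word (256 bit) packet, for instance, yields one full slice plus two leftover words. Padding those two words with zeros to fill a second slice would in general change the remainder, so the tail cannot simply be pushed through the full-width path.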
It would thus be highly desirable to provide a CRC circuit, system and method that is pipelined to run at high frequencies and that operates on these standards, i.e., is capable of processing at a high bandwidth and operating on a 32 bit packet granularity, as well as operating on an arbitrary multiple of the base granularity of the data packet.
It would further be highly desirable to provide a CRC circuit, system and method that is pipelined to run at high frequencies, that additionally operates on an arbitrary multiple of the base granularity of the data packet, and that provides the same multiple of outputs, each supplying an intermediary output remainder value.