1. Field of the Invention
The present invention relates generally to the implementation of packet-based cyclic redundancy checks in communications systems, and more particularly to an iterative circuit for performing and time optimizing a cyclic redundancy check calculation in a communications system.
2. Description of the Prior Art
Many packet-based communications protocols use code words appended to the packet transmission to check for the presence of errors introduced in the communications channel. One commonly used scheme for generating such code words is Cyclic Redundancy Check (CRC). The transmitter appends a CRC code word to the end of the packet, while the receiver recalculates the CRC for the entire packet, including the code word. Several CRC schemes are in common use; the various schemes use different polynomials for the calculation, and differ in the resulting code word length.
For a packet transmitted over a serial data stream, the logic circuitry required to calculate the CRC code word in the transmitter or the receiver is well-known and very efficient. A Linear Feedback Shift Register, with exclusive-OR gates as needed to implement the target polynomial, is a sufficient implementation. Each state of the shift register is calculated based on the current serial bit and the previous state of the shift register. So for a serial data stream, n latches (where n is the order of the polynomial) and a few exclusive-OR gates are the extent of the circuitry required.
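By way of illustration only, the bit-serial calculation described above may be modeled in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular circuit: the names `bytes_to_bits` and `crc_serial` are hypothetical, bits are assumed to arrive most significant bit first, the register is assumed to start at all zeros, and the 16-bit polynomial 0x1021 is chosen only as an example.

```python
def bytes_to_bits(data):
    """Yield the bits of each byte, most significant bit first (assumed bit order)."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def crc_serial(bits, poly, n):
    """Bit-serial CRC: one shift-register update per input bit.

    `poly` holds the low n coefficients of the polynomial; the x^n term is implicit.
    """
    mask = (1 << n) - 1
    state = 0  # assumed all-zero initial register contents
    for bit in bits:
        # Feedback is the bit leaving the register XORed with the incoming bit.
        feedback = ((state >> (n - 1)) & 1) ^ bit
        state = (state << 1) & mask
        if feedback:
            state ^= poly  # models the exclusive-OR gates at the polynomial taps
    return state


message = b"123456789"
code_word = crc_serial(bytes_to_bits(message), poly=0x1021, n=16)

# Receiver side: recalculating over the packet plus its appended code word.
received = message + code_word.to_bytes(2, "big")
residue = crc_serial(bytes_to_bits(received), poly=0x1021, n=16)
```

Under these assumptions, a receiver that runs the same register over the packet followed by its code word ends with an all-zero register when no errors were introduced. The next state depends only on the incoming bit and the n-bit previous state, which is why so little circuitry is needed in the serial case.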
However, high-speed serial data interfaces (e.g., 10 Gbps, 40 Gbps or above interfaces) often require more expensive technologies (such as SiGe (Silicon Germanium)) to implement data signals at serial baud rates. Such interfaces use high-speed analog circuits to implement the high-speed interfaces, and typically multiplex/demultiplex data to/from the serial interface into slower parallel data paths for processing within CMOS chips. Therefore, the CRC calculation circuit more typically operates on a parallel data bus. If the data bus is “w”-bytes wide, then the CRC calculation must simultaneously process w-bytes to determine the next state of the CRC calculation. Furthermore, since the next state of the CRC calculation is based on the previous state of the calculation, the calculation does not lend itself to pipelining.
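The parallel next-state calculation can be sketched in software by exploiting the fact that the shift-register update is linear over GF(2): each next-state bit is an exclusive-OR of selected previous-state bits and selected data bits. The following is an illustrative sketch only. The function names are hypothetical, the bit order (most significant bit first), the zero initial state, and the polynomial used in the usage lines are assumptions, and the table of XOR contributions stands in for the combinatorial logic of one w-byte block.

```python
def serial_step(state, bit, poly, n):
    """One bit-serial shift-register update (MSB-first, assumed convention)."""
    feedback = ((state >> (n - 1)) & 1) ^ bit
    state = (state << 1) & ((1 << n) - 1)
    return state ^ poly if feedback else state


def advance(state, bits, poly, n):
    """Apply the serial update once per bit."""
    for bit in bits:
        state = serial_step(state, bit, poly, n)
    return state


def build_parallel_block(w_bytes, poly, n):
    """Derive the XOR contribution of every state bit and data bit for a w-byte step."""
    m = 8 * w_bytes
    # Contribution of previous-state bit i, with all data bits held at zero.
    state_cols = [advance(1 << i, [0] * m, poly, n) for i in range(n)]
    # Contribution of data bit j, with the previous state held at zero.
    data_cols = []
    for j in range(m):
        unit = [0] * m
        unit[j] = 1
        data_cols.append(advance(0, unit, poly, n))
    return state_cols, data_cols


def parallel_step(state, chunk, block, n):
    """Next state for one bus word: XOR together the contributions of all set bits."""
    state_cols, data_cols = block
    nxt = 0
    for i in range(n):
        if (state >> i) & 1:
            nxt ^= state_cols[i]
    for j, col in enumerate(data_cols):
        if (chunk[j // 8] >> (7 - j % 8)) & 1:
            nxt ^= col
    return nxt


def crc_parallel(data, w_bytes, poly, n):
    """CRC over data whose length is assumed to be a multiple of w_bytes."""
    block = build_parallel_block(w_bytes, poly, n)
    state = 0
    for start in range(0, len(data), w_bytes):
        # Each bus word needs the state produced by the previous bus word.
        state = parallel_step(state, data[start:start + w_bytes], block, n)
    return state


crc_w3 = crc_parallel(b"123456789", w_bytes=3, poly=0x1021, n=16)
```

The loop in `crc_parallel` makes the data dependency visible: no bus word can be processed until the state from the preceding word is available, which is the reason the calculation resists pipelining. The number of XOR terms per next-state bit also grows with w, consistent with wider blocks consuming more logic.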
A further complexity is introduced when the packet data is not guaranteed to be an integral number of w-bytes, and/or is not guaranteed to start and stop at aligned locations on the parallel data bus. For example, given a 32-byte wide data bus, a CRC calculation circuit must be capable of handling any of the possible resulting calculation widths: w=1, 2, 3, 4, . . . , 31, 32 bytes. This makes the next state decode for the CRC calculation significantly more complex. The resulting logic circuit may require a significant amount of chip area. Furthermore, since this chip area is primarily consumed by combinatorial logic with large fanout connections, wirability and timing issues may result.
In order to meet system requirements, the CRC calculation logic must typically consist of multiple CRC calculation blocks of various widths, with data steering to select the data fed into each block on any given cycle. One prior art implementation, for a w-byte wide data bus, uses “w” CRC calculation blocks of sizes 1 byte, 2 bytes, 3 bytes, etc., up to w bytes, to implement the function. In this configuration, data is fed into all of these blocks in parallel. On any given clock cycle, only one of the CRC calculation block outputs is used. That is, in this parallel approach, one and only one CRC calculation block is selected during each cycle, so the combinatorial propagation delay will always be equivalent to the delay of one CRC calculation block.
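The parallel prior art arrangement described above may be modeled as a bank of blocks whose outputs feed a selector. The sketch below is illustrative only and rests on several assumptions: the names `crc_block`, `bus_cycle` and `crc_over_bus` are hypothetical, valid bytes are assumed to occupy the lowest-numbered lanes of the bus (so the data steering logic is omitted), and the bit order, zero initial state and polynomial are example choices.

```python
def crc_block(state, data, poly, n):
    """Stand-in for one fixed-width block: next CRC state after len(data) bytes."""
    mask = (1 << n) - 1
    for byte in data:
        for shift in range(7, -1, -1):
            feedback = ((state >> (n - 1)) & 1) ^ ((byte >> shift) & 1)
            state = (state << 1) & mask
            if feedback:
                state ^= poly
    return state


def bus_cycle(state, lanes, valid_bytes, poly, n):
    """One clock cycle: every block sees the bus, and exactly one output is selected."""
    w = len(lanes)
    # Blocks of width 1, 2, ..., w bytes; in hardware these evaluate concurrently.
    outputs = [crc_block(state, lanes[:k], poly, n) for k in range(1, w + 1)]
    if valid_bytes == 0:
        return state  # no packet data on the bus this cycle
    return outputs[valid_bytes - 1]  # selector picks the block matching the width


def crc_over_bus(packet, w, poly, n):
    """Feed a packet across a w-byte bus; the last cycle may be only partly filled."""
    state = 0  # assumed all-zero initial state
    for start in range(0, len(packet), w):
        valid = min(w, len(packet) - start)
        lanes = packet[start:start + valid].ljust(w, b"\x00")  # pad unused lanes
        state = bus_cycle(state, lanes, valid, poly, n)
    return state


crc_bus4 = crc_over_bus(b"123456789", w=4, poly=0x1021, n=16)
```

With a 9-byte packet on a 4-byte bus, the cycles carry 4, 4 and 1 valid bytes, so the selector uses the 4-byte block twice and the 1-byte block once. Every cycle nevertheless computes all w outputs, which mirrors the hardware cost of this arrangement: the delay is that of a single block, but all w blocks must be present.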
It would be highly desirable to provide a structured, iterative approach to the CRC calculation circuitry whereby the CRC calculation may be subdivided into blocks with selectable bus widths, which blocks can be cascaded to provide calculation for a bus width of any arbitrary number of bytes.
It would further be highly desirable to provide a structured, iterative approach to the CRC calculation circuitry that maximizes the circuit area reduction for a given target propagation delay.