Embodiments of the present invention relate to data processing, and more particularly to determining checksums such as cyclic redundancy checks (CRCs).
In data processing systems, it is desirable that data transmitted between a first location and a second location be received accurately, so that additional processing performed on that data at the second location also can be accurate. Further, to enable detection of errors in data transmission, oftentimes a data packet will be transmitted with a checksum attached. For example, a CRC sum can be generated by a transmitting source and appended to data to be transmitted. This checksum, which may be calculated according to one of many different algorithms, can then be compared to a corresponding checksum generated at the receiving end from the received data. If the two checksums are identical, the transmitted data is presumed to be correct. If however the generated checksum varies from the transmitted checksum, an error is indicated. Such checksums are used throughout networking technologies to detect transmission errors.
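The append-and-compare flow described above can be sketched as follows. This is a minimal illustration only, assuming the common CRC-32 algorithm as provided by Python's standard `zlib.crc32` and a 4-byte little-endian trailer; the function names `attach_crc` and `check_crc` and the frame layout are hypothetical and are not taken from any particular protocol.

```python
import struct
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Transmitting side: append a CRC-32 of the payload as a 4-byte trailer.

    The little-endian trailer layout is an assumption made for this sketch.
    """
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(frame: bytes) -> bool:
    """Receiving side: recompute the CRC over the received payload and
    compare it with the checksum that was transmitted in the trailer."""
    if len(frame) < 4:
        return False  # too short to contain a checksum at all
    payload, trailer = frame[:-4], frame[-4:]
    (received,) = struct.unpack("<I", trailer)
    return zlib.crc32(payload) == received


frame = attach_crc(b"example payload")
intact = check_crc(frame)  # checksums match, no error indicated

# Flip one bit of the first byte to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
damaged = check_crc(corrupted)  # checksums differ, error indicated
```

A matching checksum indicates only that no error was detected; a mismatch indicates that the payload or the trailer was altered in transit.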
In different applications, different manners of implementing CRC calculations exist. For example, CRC calculations can be performed in either hardware or software. To implement a CRC calculation in hardware, typically a dedicated hardware engine is provided within a system to perform the CRC calculation. Accordingly, data to be subjected to such a CRC calculation is sent to the hardware engine for calculation of the CRC, which is then appended to the data, e.g., for transmission from the system. Various drawbacks exist to using such an offload engine, including the overhead of sending data to the engine. Furthermore, it is difficult to perform a stateless hardware offload. That is, typically additional state-based overhead data also needs to be transmitted, increasing complexity and slowing the progress of useful work.
Because many systems lack such an offload engine, CRC calculations are often performed in software. To implement CRC calculations in software, typically lookup table schemes are used. However, such software calculations of CRC values are notoriously slow, compute-intensive operations. Further, the memory footprint of the lookup table can be large, impacting performance. Accordingly, these slow calculations can degrade network performance and consume processing resources. As an example, a CRC calculation can take between 5 and 15 cycles per byte of data. As a result, software CRC performance is too low for general use in high-speed networks.
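For illustration, a byte-at-a-time lookup table scheme of the kind referred to above might look like the following sketch. It assumes the widely used reflected CRC-32 polynomial 0xEDB88320 and a 256-entry table; the names `make_table`, `TABLE`, and `crc32_table` are hypothetical, and other table-driven variants (different polynomials or larger tables) are equally possible.

```python
def make_table(poly: int = 0xEDB88320) -> list:
    """Precompute the CRC of every possible byte value (256 entries).

    poly is the reflected CRC-32 polynomial, an assumption of this sketch.
    """
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; XOR in the polynomial when that bit is set.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32: one table lookup, one shift, and two XORs per byte.

    Passing a previous result as crc continues the calculation over
    additional data.
    """
    crc ^= 0xFFFFFFFF  # apply the initial inversion (or undo the final one)
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF  # final inversion


check = crc32_table(b"123456789")  # 0xCBF43926, the standard CRC-32 check value
```

Each input byte depends on the CRC state left by the previous byte, so the loop is inherently serial, and every iteration performs a memory access into the table. Larger tables that process several bytes per step reduce the iteration count at the cost of a bigger memory footprint.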