Embodiments of the present invention relate to data processing, and more particularly to determining checksums such as cyclic redundancy checks (CRCs).
In data processing systems, data transmitted between a first location and a second location should be received accurately, so that additional processing performed on that data at the second location can also be accurate. To enable detection of errors in data transmission, data validation is often performed. One example of data validation is the use of a checksum attached to a data packet to be transmitted. For example, a CRC checksum can be generated by a transmitting source and appended to the data to be transmitted. This checksum, which may be calculated according to one of many different algorithms, can then be compared to a corresponding checksum generated at the receiving end from the received data. If the two checksums are identical, the receiving system may have high confidence that the transmitted data is uncorrupted. If, however, the generated checksum differs from the transmitted checksum, an error is indicated. Such checksums are used throughout networking technologies to detect transmission errors. Other uses include database integrity, application-level data integrity checks, and the like.
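By way of illustration only, the append-and-compare scheme described above may be sketched as follows. The sketch assumes the CRC-32 algorithm provided by Python's standard `zlib` module and a 4-byte big-endian trailer; neither choice is specified in this document, and the function names `append_checksum` and `verify_checksum` are hypothetical.

```python
import struct
import zlib

CRC_SIZE = 4  # assumed trailer size: one 32-bit CRC


def append_checksum(payload: bytes) -> bytes:
    """Transmitting side: compute a CRC over the payload and append it."""
    crc = zlib.crc32(payload)  # unsigned 32-bit value in Python 3
    return payload + struct.pack(">I", crc)  # assumed big-endian trailer


def verify_checksum(packet: bytes) -> bool:
    """Receiving side: recompute the CRC and compare with the received one."""
    if len(packet) < CRC_SIZE:
        return False  # too short to contain a checksum
    payload, trailer = packet[:-CRC_SIZE], packet[-CRC_SIZE:]
    (received_crc,) = struct.unpack(">I", trailer)
    return zlib.crc32(payload) == received_crc


# Usage: an intact packet verifies; a packet with one flipped bit does not.
packet = append_checksum(b"example payload")
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
```

In this sketch a mismatch only indicates that an error occurred; how the receiving system responds (e.g., by requesting retransmission) is outside its scope.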
Different applications implement CRC calculations in different manners. For example, CRC calculations can be performed in either hardware or software. To implement a CRC calculation in hardware, a dedicated hardware engine is typically provided within a system. Data to be subjected to a CRC calculation is sent to the hardware engine, which calculates the CRC; the CRC is then appended to the data, e.g., for transmission from the system. Various drawbacks exist to using such an offload engine, including the overhead of sending data to the engine. Furthermore, it is difficult to perform a stateless hardware offload, as additional state-based overhead data typically also needs to be transmitted, increasing complexity and slowing the progress of useful work.
Because many systems lack such an offload engine, CRC calculations are often performed in software, typically using lookup table schemes. However, such software calculations of CRC values are notoriously slow, compute-intensive operations. Further, the memory footprint of the lookup table can be large, impacting performance. Accordingly, these slow calculations can degrade network performance and consume processing resources. As an example, a CRC calculation can take between 5 and 15 processor cycles per byte of data. As a result, software CRC performance is too low for general use in high-speed networks.
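For illustration, a conventional byte-at-a-time lookup table scheme of the kind referred to above may be sketched as follows. The sketch assumes the common reflected CRC-32 polynomial 0xEDB88320 with an all-ones initial value and final inversion; this document does not name a particular polynomial, and the names `build_table` and `crc32_table` are hypothetical.

```python
CRC32_POLY = 0xEDB88320  # assumed: reflected CRC-32 polynomial


def build_table(poly: int = CRC32_POLY) -> list:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):  # process one bit per iteration
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
        table.append(crc)
    return table


CRC_TABLE = build_table()


def crc32_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF  # apply (or undo) the all-ones inversion
    for b in data:
        # One table lookup, one shift, and two XORs per input byte.
        crc = CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Usage: the standard check string for this CRC variant.
check = crc32_table(b"123456789")
```

Each input byte requires a dependent table load before the next byte can be processed, which is consistent with the per-byte cost noted above. The table in this sketch holds 256 32-bit entries; variants that process several bytes per step use correspondingly larger tables.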