The present invention relates to a general-purpose processor that includes an execution unit adapted for performing a processor instruction, wherein the execution unit comprises an integrated circuit.
Fast calculation of secure checksums is important in cloud computing and RAS (Remote Access Service) environments. A checksum received together with a message allows the receiver to verify that the message was not changed during transmission. The Secure Hash Standard (SHS), issued by the National Institute of Standards and Technology (NIST), specifies a set of cryptographically secure hash algorithms (SHA). SHA-2, defined in Federal Information Processing Standards (FIPS) Publication 180-3, is one such family of algorithms; it is currently considered secure and is therefore frequently used for calculating checksums. Because encryption and/or decryption are usually performed together with checksum calculation, an efficient checksum calculation method keeps the checksum from becoming the bottleneck when both run simultaneously.
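The verification step described above can be sketched in Python with the standard `hashlib` module. This is a minimal illustration, not the claimed hardware: the helper names `make_checksum` and `verify` are hypothetical, SHA-256 is assumed as the SHA-2 variant, and `hmac.compare_digest` is used only for a constant-time comparison of the two digests. Note that an unkeyed digest like this only reveals a modified message if the checksum itself arrives unaltered.

```python
import hashlib
import hmac


def make_checksum(message: bytes) -> str:
    """Sender side: return the SHA-256 digest of `message` as a hex string."""
    return hashlib.sha256(message).hexdigest()


def verify(message: bytes, checksum: str) -> bool:
    """Receiver side: recompute the digest and compare it with the one received."""
    # compare_digest does not leak timing information about where the strings differ
    return hmac.compare_digest(make_checksum(message), checksum)


if __name__ == "__main__":
    msg = b"transfer 100 EUR to account 42"
    sent = make_checksum(msg)
    print(verify(msg, sent))                                 # True: message unchanged
    print(verify(b"transfer 900 EUR to account 42", sent))  # False: message altered
```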
Several approaches to hardware-accelerated SHA are known from the prior art. All of them implement the whole algorithm in dedicated hardware, which makes such implementations difficult and costly (see, for example, “The Design of a High Speed ASIC Unit for the Hash Function SHA-256 (384, 512)”, Dadda et al., DATE'04).
Since SHA-2 works on eight internal 32-bit state words (SHA-224/SHA-256) or eight internal 64-bit state words (SHA-384/SHA-512), and each round computes its new state words from those produced by the previous round, it is very difficult, if not impossible, to build fully pipelined hardware that performs the whole algorithm. Most in-core execution units, however, are fully pipelined to achieve high throughput. If hashing is instead performed off-core, the data chunks must be large enough to amortize the cycles needed to send the data to the off-core accelerator and to return the result. Moreover, hardware that is not pipelined cannot be used by multiple threads simultaneously.
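To make the round dependency concrete, the following is a minimal software reference model of SHA-256 written from the FIPS 180 description. It is a sketch for illustration only, not the execution unit of the invention, and all names (`compress`, `sha256_hex`, the underscore helpers) are hypothetical. The round constants `K` and initial hash value `H0` are derived in integer arithmetic (fractional bits of cube and square roots of the first primes) rather than tabulated. The inner loop shows the problem described above: every round reads the eight working variables `a` to `h` written by the previous round, and the blocks of a message are chained the same way.

```python
import math
import struct

MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def _first_primes(n):
    """Return the first n primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
# Initial hash value: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]


def _rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Process one 64-byte block; `state` holds the eight 32-bit words H0..H7."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # message schedule W16..W63
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

    a, b, c, d, e, f, g, h = state
    for t in range(64):
        # Every term reads a..h as left by round t-1, so round t cannot
        # start before round t-1 has produced its new a and e.
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_hex(message: bytes) -> str:
    """Pad the message, compress it block by block, and return the hex digest."""
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + (8 * len(message)).to_bytes(8, "big"))
    state = list(H0)
    for i in range(0, len(padded), 64):  # each block needs the previous block's state
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state).hex()
```

The result can be cross-checked against `hashlib.sha256(message).hexdigest()`. In hardware, the same serial chain means that a new round, or a new block of the same message, cannot enter a pipeline until the preceding one has left it.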