SHA stands for Secure Hash Algorithm. It consists of five hash functions designed by the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST). One of them is SHA-2. SHA-2 is a set of secure hash functions including SHA 224, SHA 256, SHA 384, and SHA 512 developed by the NSA intended to provide a higher level of security than the SHA-1 algorithm. SHA 224 and SHA 256 are similar algorithms based on a 32 bit word length producing digests of 224 and 256 bits. SHA 384 and SHA 512 are based on 64 bit words and produce digests of 384 and 512 bits.
The SHA-2 algorithm is computationally more complex than SHA-1, relying on carry propagate additions as well as logical operations and rotates. The critical path for a round of SHA-2 operations consists of four consecutive carry propagate additions, with the adder inputs determined by complex logical and rotation functions. FIG. 1 depicts details of the SHA-2 algorithm. A, B, C, D, E, F, G, and H represent the 8 words of state (32 bits for SHA 224/256 and 64 bits for SHA 384/512). The following operations are performed for each iteration:
    Ch(E,F,G)=(E∧F)⊕(¬E∧G)
    Ma(A,B,C)=(A∧B)⊕(A∧C)⊕(B∧C)
    Σ0(A)=(A>>>2)⊕(A>>>13)⊕(A>>>22)
    Σ1(E)=(E>>>6)⊕(E>>>11)⊕(E>>>25)
where ∧ denotes bitwise AND, ¬ denotes bitwise complement, ⊕ denotes bitwise exclusive-OR, and >>> denotes bitwise right rotation.
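As a minimal sketch, the four logical functions listed above can be modeled in Python for the 32-bit (SHA 224/256) case. The names rotr, ch, maj, big_sigma0, and big_sigma1 are illustrative assumptions and do not come from FIG. 1. Ch is written with the complemented E term, matching the per-iteration SHA-256 definition ch := (e AND f) XOR ((NOT e) AND g).

```python
MASK32 = 0xFFFFFFFF  # SHA 224/256 operate on 32-bit words


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits (the '>>>' operator), 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(e, f, g):
    """Choose: each bit of e selects the matching bit of f (if 1) or g (if 0)."""
    return ((e & f) ^ (~e & g)) & MASK32


def maj(a, b, c):
    """Majority: each result bit is the value held by at least two of the inputs."""
    return (a & b) ^ (a & c) ^ (b & c)


def big_sigma0(a):
    """Sigma0 with the SHA-256 rotation constants 2, 13, 22."""
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    """Sigma1 with the SHA-256 rotation constants 6, 11, 25."""
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
```

For example, ch(0xFFFF0000, f, g) takes the upper 16 bits from f and the lower 16 bits from g, which is why the function is commonly described as a bitwise multiplexer.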
The bitwise rotation uses different constants for SHA-512. In this example, the given numbers are for SHA-256. The addition of constant K and message input Wi can be performed ahead of the round critical path. The message scheduling function for the SHA-2 algorithm is also more complex than that of SHA-1, relying on rotated copies of previous message inputs to form message inputs:
    for i from 16 to 63
        s0 := (w[i−15] ROTR 7) XOR (w[i−15] ROTR 18) XOR (w[i−15] SHR 3)
        s1 := (w[i−2] ROTR 17) XOR (w[i−2] ROTR 19) XOR (w[i−2] SHR 10)
        w[i] := w[i−16] + s0 + w[i−7] + s1
where ROTR (also used as “>>>”) denotes a bitwise right-rotate operator; SHR denotes a bitwise right-shift operator; and XOR denotes a bitwise exclusive-OR operator.
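The message scheduling loop above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name message_schedule is hypothetical, words are parsed from the block in big-endian order as the Secure Hash Standard specifies, and additions are reduced modulo 2^32.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits (ROTR), 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def message_schedule(block):
    """Expand one 64-byte block into the 64 message inputs w[0..63] of SHA-256."""
    if len(block) != 64:
        raise ValueError("SHA-256 processes 512-bit (64-byte) blocks")
    # w[0..15] are the sixteen 32-bit big-endian words of the block itself.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    # w[16..63] are derived from rotated and shifted copies of earlier words.
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK32)
    return w
```

Each derived word depends only on four earlier words, so the schedule can be computed ahead of, or interleaved with, the rounds that consume it.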
For SHA-256, each iteration is performed as follows:
    Σ0 := (a ROTR 2) XOR (a ROTR 13) XOR (a ROTR 22)
    maj := (a AND b) XOR (a AND c) XOR (b AND c)
    t2 := Σ0 + maj
    Σ1 := (e ROTR 6) XOR (e ROTR 11) XOR (e ROTR 25)
    ch := (e AND f) XOR ((NOT e) AND g)
    t1 := h + Σ1 + ch + k[i] + w[i]
    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := t1 + t2
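A single iteration as listed above can be sketched in Python as a function from the old state to the new state. The name sha256_round and the tuple representation of the state (a, b, c, d, e, f, g, h) are assumptions made for illustration, and all additions are taken modulo 2^32.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits (ROTR), 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256_round(state, k_i, w_i):
    """Apply one SHA-256 iteration to state = (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (sigma0 + maj) & MASK32
    sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + sigma1 + ch + k_i + w_i) & MASK32
    # Only a and e receive newly computed values; the other words shift down.
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)
```

Returning a new tuple rather than assigning h, g, f, ... in sequence avoids any dependence on assignment order: six of the eight words are simply renamed, and only the new a and e sit on the addition-heavy path.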
The message inputs w[i] for rounds 1 to 16 are the sixteen 32-bit words (32 bits × 16 = 512 bits) of the data block. W[i] for rounds 17 to 64 must be derived. Because constant K is specified for each round, the W[i]+K[i] value for each round can be calculated ahead of the actual round iteration. Further detailed information concerning the SHA-2 specification can be found in the Secure Hash Standard, Federal Information Processing Standards Publication 180-3 (FIPS PUB 180-3), published October 2008.
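To illustrate how the W[i]+K[i] sums can be formed before the rounds begin, the following Python sketch precomputes them into a list wk and then runs the 64 iterations using only wk[i]. Several parts are assumptions beyond the text above and follow the Secure Hash Standard instead: the round constants and initial state are derived here from the fractional parts of the cube roots and square roots of the first primes, and the padding and the final addition of the working variables back into the state are included so the result can be compared with Python's hashlib. All function names are hypothetical.

```python
import hashlib
import math

MASK32 = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def first_primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


def icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots (K) and square roots (H0) of primes.
K = [icbrt(p << 96) & MASK32 for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]


def schedule(block):
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK32)
    return w


def compress(state, block):
    # W[i] + K[i] is formed up front, off the per-round critical path.
    wk = [(w_i + k_i) & MASK32 for w_i, k_i in zip(schedule(block), K)]
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        t1 = (h + sigma1 + ((e & f) ^ (~e & g)) + wk[i]) & MASK32
        sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        t2 = (sigma0 + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g
    return [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg):
    bit_len = len(msg) * 8
    msg = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + bit_len.to_bytes(8, "big")
    state = H0
    for off in range(0, len(msg), 64):
        state = compress(state, msg[off:off + 64])
    return b"".join(s.to_bytes(4, "big") for s in state)
```

With wk precomputed, each round's t1 needs one fewer dependent addition than the form h + Σ1 + ch + k[i] + w[i] suggests, which is the point of moving the sum out of the round. A quick sanity check is that sha256(b"abc") equals hashlib.sha256(b"abc").digest().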
Conventional software solutions using standard instructions require a separate instruction for each of the addition and logical shift/rotate operations needed to implement the round and scheduling functions of a SHA-2 algorithm such as SHA256. Current industry benchmark data for SHA256 is in the range of 15 cycles per byte. The limit for a standard instruction implementation of SHA256 potentially approaches the range of 9 cycles per byte. There has been a lack of efficient ways to perform the above operations.