SHA stands for Secure Hash Algorithm. It consists of five hash functions designed by the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST). Of these, SHA-1 is the most popular. SHA-1 produces a 160-bit message digest for a message having a length of less than 2^64 bits. A message digest is a fixed-length output computed from a message. The message digest is then input to a digital signature algorithm (DSA), which generates the signature for the message. Signing the message digest instead of the message offers improved performance because the message digest is much smaller than the message. The recipient of the message then uses the same hash algorithm to verify the signature. Any change that occurs during transit results in a different message digest and, thus, the signature will not verify. Once the signature is verified as true, the recipient is able to unlock the message. This method prevents unauthorized users from viewing messages that are not intended for them.
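The fixed digest length and the sensitivity to changes described above can be illustrated with a minimal sketch. It assumes Python's standard `hashlib` module as the SHA-1 implementation; the sample message and variable names are hypothetical and are not taken from the standard.

```python
import hashlib

# Hypothetical sample message, used only for illustration.
message = b"The quick brown fox jumps over the lazy dog"

# SHA-1 digest of the original message (20 bytes = 160 bits).
digest = hashlib.sha1(message).digest()

# Digest of the same message after a one-byte change in transit.
tampered_digest = hashlib.sha1(message + b".").digest()

print(len(digest) * 8)            # digest length in bits
print(digest != tampered_digest)  # the altered message yields a different digest
```

Because the digest is always 160 bits regardless of the message size, a signature algorithm only has to operate on 20 bytes of input.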
When computing a message digest, SHA-1 processes blocks of 512 bits. The total length of the padded message will be a multiple of 512 bits. FIG. 1 is a block diagram illustrating a typical iteration of the SHA-1 operations. FIGS. 2A and 2B show functions and constants, respectively, used during rounds of SHA-1 operations. Processing a 512-bit (64-byte) block of data with the SHA-1 hash algorithm consists of performing 80 rounds (repetitions) of the round algorithm. Each round requires a 32-bit message input. The 512 bits of the block being hashed are used directly as the message inputs for the first 16 rounds, and the message inputs for rounds 17 to 80 are derived by combining previous message inputs according to a “message scheduling” function specified by the SHA-1 standard.
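The block structure described above can be sketched as follows. The padding rule used here (a single 1 bit, then 0 bits, then the 64-bit message length) is the one defined in FIPS PUB 180-1 rather than spelled out in this section; the function names `pad_message` and `split_blocks` are hypothetical.

```python
import struct

def pad_message(message: bytes) -> bytes:
    """Pad a message so its length is a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                      # a single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)   # zero-fill up to 448 bits mod 512
    return padded + struct.pack(">Q", bit_length)   # 64-bit big-endian bit length

def split_blocks(padded: bytes) -> list:
    """Split a padded message into 512-bit blocks of sixteen 32-bit words each."""
    return [list(struct.unpack(">16I", padded[i:i + 64]))
            for i in range(0, len(padded), 64)]

blocks = split_blocks(pad_message(b"abc"))
print(len(blocks), len(blocks[0]))  # 1 16
print(hex(blocks[0][0]))            # 0x61626380
print(blocks[0][15])                # 24
```

The sixteen words of each block are the message inputs for the first 16 rounds; the remaining 64 inputs are derived from them by the message schedule.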
Specifically, according to the SHA-1 standard, a message digest is computed using the padded message. The computation uses two buffers, each consisting of five 32-bit words, and a sequence of eighty 32-bit words. The words of the first 5-word buffer are labeled A, B, C, D, and E. The words of the second 5-word buffer are labeled H0, H1, H2, H3, and H4. The words of the 80-word sequence are labeled W0, W1, . . . , W79. A single-word buffer TEMP is also employed. To generate the message digest, the 16-word blocks M1, M2, . . . , Mn defined in the standard are processed in order. The processing of each Mi involves 80 steps. Before processing any blocks, the {Hi} are initialized as follows: H0=0x67452301; H1=0xEFCDAB89; H2=0x98BADCFE; H3=0x10325476; and H4=0xC3D2E1F0.
M1, M2, . . . , Mn are then processed. In the steps below, Sn(X) denotes a circular left shift of the 32-bit word X by n bits, and all additions are performed modulo 2^32. To process Mi, the following operations are performed:
        a). Divide Mi into 16 words W0, W1, . . . , W15, where W0 is the left-most word.
        b). For t=16 to 79 let Wt=S1(Wt-3 XOR Wt-8 XOR Wt-14 XOR Wt-16).
        c). Let A=H0, B=H1, C=H2, D=H3, E=H4.
        d). For t=0 to 79 do
                TEMP=S5(A)+ft(B,C,D)+E+Wt+Kt;
                E=D; D=C; C=S30(B); B=A; A=TEMP;
        e). Let H0=H0+A, H1=H1+B, H2=H2+C, H3=H3+D, H4=H4+E.
After processing Mn, the message digest is the 160-bit string represented by the 5 words H0, H1, H2, H3, and H4.
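Steps (a) through (e) can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not an optimized implementation: the round functions ft and constants Kt, which this section only references through FIGS. 2A and 2B, are taken from FIPS PUB 180-1, and the helper names (`rotl`, `f`, `pad`, `sha1`) are hypothetical.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # Kt, one per group of 20 rounds

def rotl(x, n):
    """Sn(X): circular left shift of a 32-bit word by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def f(t, b, c, d):
    """Round function ft(B,C,D) as defined in FIPS PUB 180-1."""
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d  # rounds 20-39 and 60-79

def pad(message):
    """Pad to a multiple of 64 bytes: 0x80, zeros, 64-bit bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def sha1(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = pad(message)
    for off in range(0, len(padded), 64):
        # a) divide the block into 16 big-endian words W0..W15
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        # b) message schedule for t = 16..79
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        # c) load the working buffer
        a, b, c, d, e = h
        # d) 80 rounds
        for t in range(80):
            temp = (rotl(a, 5) + f(t, b, c, d) + e + w[t] + K[t // 20]) & MASK32
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        # e) add the working buffer back into H0..H4 (mod 2^32)
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

# Cross-check against the standard library implementation.
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())
```

The comparison against `hashlib.sha1` is included only as a sanity check; it exercises a single one-block message.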
According to the SHA-1 standard, the above method assumes that the sequence W0, . . . , W79 is implemented as an array of eighty 32-bit words. This is efficient from the standpoint of minimizing execution time, since the addresses of Wt-3, . . . , Wt-16 in step (b) are easily computed. If space is at a premium, an alternative is to regard {Wt} as a circular queue, which may be implemented using an array of sixteen 32-bit words W[0], . . . , W[15]. In this case, let MASK=0x0000000F; the processing of Mi is then as follows:
        a). Divide Mi into 16 words W[0], . . . , W[15], where W[0] is the left-most word.
        b). Let A=H0, B=H1, C=H2, D=H3, E=H4.
        c). For t=0 to 79 do
                s=t AND MASK;
                if (t>=16) W[s]=S1(W[(s+13) AND MASK] XOR W[(s+8) AND MASK] XOR W[(s+2) AND MASK] XOR W[s]);
                TEMP=S5(A)+ft(B,C,D)+E+W[s]+Kt;
                E=D; D=C; C=S30(B); B=A; A=TEMP;
        d). Let H0=H0+A, H1=H1+B, H2=H2+C, H3=H3+D, H4=H4+E.
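The circular-queue variant can be sketched in the same way. The sketch below keeps only sixteen words of message schedule state and indexes them with `t & MASK`. As before, ft and Kt are taken from FIPS PUB 180-1, and the helper names (`rotl`, `f`, `pad`, `sha1_circular`) are hypothetical.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
MASK = 0x0000000F  # index mask for the 16-word circular queue
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)

def rotl(x, n):
    """Sn(X): circular left shift of a 32-bit word by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def f(t, b, c, d):
    """Round function ft(B,C,D) as defined in FIPS PUB 180-1."""
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d

def pad(message):
    """Pad to a multiple of 64 bytes: 0x80, zeros, 64-bit bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def sha1_circular(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = pad(message)
    for off in range(0, len(padded), 64):
        # a) only sixteen words W[0..15] are stored per block
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        # b) load the working buffer
        a, b, c, d, e = h
        # c) 80 rounds, overwriting W[s] in place once t >= 16
        for t in range(80):
            s = t & MASK
            if t >= 16:
                w[s] = rotl(w[(s + 13) & MASK] ^ w[(s + 8) & MASK]
                            ^ w[(s + 2) & MASK] ^ w[s], 1)
            temp = (rotl(a, 5) + f(t, b, c, d) + e + w[s] + K[t // 20]) & MASK32
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        # d) add the working buffer back into H0..H4 (mod 2^32)
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

# Cross-check against the standard library implementation.
print(sha1_circular(b"abc") == hashlib.sha1(b"abc").digest())
```

The offsets 13, 8, and 2 are the values of -3, -8, and -14 reduced modulo 16, so each lookup lands on the same word the 80-word array method would read, at the cost of extra index arithmetic per round.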
Further detailed information concerning the SHA-1 specification can be found in the Secure Hash Standard, Federal Information Processing Standards Publication 180-1 (FIPS PUB 180-1), Apr. 17, 1995.
Conventional software solutions utilize standard 32-bit instructions and 32-bit register/memory storage. The round calculation requires four 32-bit additions, two 32-bit rotates, logic functions, and moves. Each message input for rounds 17 to 80 requires one rotate and three exclusive ORs (XORs). With four 32-bit additions, three rotates, and several logical functions for each of the 80 round/message passes, several cycles are required to process a round, even on a processor with multiple execution units. There has been a lack of efficient ways to perform the above operations.