The present invention relates to implementation of the MD5 Message Digest (“MD5”) algorithm.
The MD5 algorithm is intended for digital signature applications, where a data message must be signed in a secure manner before being sent. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest.
The MD5 algorithm is optimized for 32-bit processors. In addition, the MD5 algorithm does not require any large substitution tables; the algorithm can be coded quite compactly.
Refer to FIG. 1, which illustrates a flowchart of the algorithm. It is assumed that the input is a b-bit message and that the objective is to obtain the message digest of the input. Here, b is an arbitrary non-negative integer; b may even be zero. The bits of the message are written as follows:
m0, m1 . . . m(b−1) 
In the first step 102, the message is padded (i.e. extended) so that its length in bits is congruent to 448, modulo 512. In other words, the message is extended so that it is just 64 bits short of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512. Padding is performed as follows: a single “1” bit is appended to the message, and then “0” bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.
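As a quick check of the padding rule, the following Python sketch computes how many bits step 102 appends to a b-bit message. The function name `padding_bits` is hypothetical and used only for illustration.

```python
def padding_bits(b):
    """Bits appended in step 102 to a b-bit message (hypothetical helper).

    One '1' bit is always appended, followed by just enough '0' bits to make
    the total length congruent to 448 modulo 512.
    """
    zeros = (448 - (b + 1)) % 512  # Python's % is non-negative here
    return 1 + zeros


# A message that is already 448 bits long still receives 512 bits of padding.
print(padding_bits(448))  # 512
print(padding_bits(0))    # 448
```

The two printed cases are the extremes named in the text: at most 512 bits (when the length is already congruent to 448) and 448 bits for an empty message.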
In the second step 104, a 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is 2^64 or greater, only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words, low-order word first, in accordance with the previous conventions.) At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 . . . N−1] denote the words of the resulting message, where N is a multiple of 16.
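Steps 102 and 104 together can be sketched in Python as below. This sketch assumes the message length is a whole number of bytes (so b is a multiple of 8), which the text itself does not require; the helper names `md5_pad` and `to_words` are hypothetical.

```python
import struct


def md5_pad(message):
    """Steps 102 and 104 for a byte-aligned message (hypothetical helper)."""
    b = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF    # keep only the low-order 64 bits of b
    padded = message + b"\x80"                     # the single '1' bit, then seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # reach 448 bits modulo 512
    # '<Q' writes b little-endian: low-order 32-bit word first,
    # and each word low-order byte first.
    padded += struct.pack("<Q", b)
    return padded


def to_words(padded):
    """Split the padded message into the 32-bit words M[0 .. N-1]."""
    return list(struct.unpack("<%dI" % (len(padded) // 4), padded))


M = to_words(md5_pad(b"abc"))
print(len(M), hex(M[0]), M[14])  # 16 0x80636261 24
```

For the three-byte message "abc", N is 16, M[0] holds the three message bytes plus the 0x80 padding byte, and M[14] holds b = 24.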
A four-word buffer [A, B, C, D] is used to compute the message digest, where each of A, B, C, D is a 32-bit register. In step 106 these registers are initialized to the following values in hexadecimal, low-order bytes first:
word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10
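Because the bytes above are listed low-order first, the corresponding 32-bit register values are obtained by reading each group little-endian. A short Python sketch (the names `A0` through `D0` are illustrative):

```python
import struct

# Each word is listed low-order byte first, so decode the 16 bytes little-endian.
A0, B0, C0, D0 = struct.unpack(
    "<4I", bytes.fromhex("0123456789abcdeffedcba9876543210"))
print(hex(A0), hex(B0), hex(C0), hex(D0))
# 0x67452301 0xefcdab89 0x98badcfe 0x10325476
```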
In step 108, the message is processed in 16-word blocks. The following notation is used: “+” denotes addition of words (i.e. addition modulo 2^32); X<<<s denotes the 32-bit value obtained by circularly shifting (rotating) X left by s bit positions; not(X) denotes the bit-wise complement of X; X v Y denotes the bit-wise OR of X and Y; X xor Y denotes the bit-wise XOR of X and Y; and XY denotes the bit-wise AND of X and Y.
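The word operations can be expressed in Python as follows. Python integers are unbounded, so this sketch masks every result to 32 bits; the helper names `add32`, `rotl32` and `not32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keeps Python's unbounded integers within 32 bits


def add32(*words):
    """'+' in the text: addition of words modulo 2**32."""
    return sum(words) & MASK32


def rotl32(x, s):
    """X <<< s: rotate the 32-bit value x left by s positions (0 <= s < 32)."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def not32(x):
    """not(X): bit-wise complement, restricted to 32 bits."""
    return ~x & MASK32


print(hex(add32(0xFFFFFFFF, 2)))       # 0x1 (the carry out of bit 31 is discarded)
print(hex(rotl32(0x12345678, 8)))      # 0x34567812
```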
Four auxiliary functions are defined, each of which takes as input three 32-bit words and produces as output one 32-bit word:
F(X,Y,Z) = XY v not(X) Z
G(X,Y,Z) = XZ v Y not(Z)
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X v not(Z))
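A direct Python transcription of the four auxiliary functions is sketched below, assuming the inputs are non-negative integers below 2^32. Only I needs an explicit mask: there, not(Z) is OR-ed rather than AND-ed, so an unmasked Python complement would yield a negative number.

```python
MASK32 = 0xFFFFFFFF


def F(x, y, z):
    return (x & y) | (~x & z)          # XY v not(X) Z


def G(x, y, z):
    return (x & z) | (y & ~z)          # XZ v Y not(Z)


def H(x, y, z):
    return x ^ y ^ z                   # X xor Y xor Z


def I(x, y, z):
    return y ^ (x | (~z & MASK32))     # Y xor (X v not(Z))
```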
In each bit position F acts as a conditional: if X then Y else Z. The function F could have been defined using + instead of v, since XY and not(X) Z never have 1s in the same bit position. If the bits of X, Y, and Z are independent and unbiased, then each bit of F(X,Y,Z) will be independent and unbiased.
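Both claims about F can be checked empirically. The sketch below (function names `f_or` and `f_plus` are hypothetical) shows the bit-wise conditional on a hand-picked input and then confirms, on random 32-bit words, that the two terms of F never overlap, so OR and addition give the same result.

```python
import random

MASK32 = 0xFFFFFFFF


def f_or(x, y, z):
    return (x & y) | (~x & z)


def f_plus(x, y, z):
    # '+' in place of 'v': safe because the two terms never share a 1 bit
    return ((x & y) + (~x & z)) & MASK32


# Bit-wise "if X then Y else Z": where X is 1 take Y's bit, elsewhere Z's bit.
print(hex(f_or(0xFFFF0000, 0xAAAAAAAA, 0x55555555)))  # 0xaaaa5555

rng = random.Random(1)
for _ in range(10000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ((x & y) & (~x & z)) == 0      # no common 1 bits
    assert f_or(x, y, z) == f_plus(x, y, z)
```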
The functions G, H and I are similar to the function F in that these functions act in “bitwise parallel” to produce the output from the bits of X, Y and Z in such a manner that if the corresponding bits of X,Y and Z are independent and unbiased, then each bit of G(X,Y, Z), H(X,Y, Z) and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise “xor” or “parity” of its inputs.
Step 108 uses a 64-element table T[1 . . . 64] constructed from the sine function. T[i] denotes the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. Pseudo code for step 108 is presented here:
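The table can be generated directly from this definition. The Python sketch below assumes IEEE double-precision `math.sin`, which is accurate enough for these 64 entries on common platforms; an 8-bit micro-controller would more likely store the 64 precomputed words in ROM.

```python
import math

# T[0] is a placeholder so that T[1] .. T[64] match the 1-based indexing in the text.
T = [0] + [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

print(hex(T[1]), hex(T[64]))  # 0xd76aa478 0xeb86d391
```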
/* Process each 16-word block. */
For i = 0 to N/16-1 do

   /* Copy block i into X. */
   For j = 0 to 15 do
      Set X[j] to M[i*16+j].
   end /* of loop on j */

   /* Save A as AA, B as BB, C as CC, and D as DD. */
   AA = A
   BB = B
   CC = C
   DD = D

   /* Round 1. */
   /* Let FF[abcd k s i] denote the operation
        a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
   /* Do the following 16 operations. */
   FF[ABCD  0  7  1]  FF[DABC  1 12  2]  FF[CDAB  2 17  3]  FF[BCDA  3 22  4]
   FF[ABCD  4  7  5]  FF[DABC  5 12  6]  FF[CDAB  6 17  7]  FF[BCDA  7 22  8]
   FF[ABCD  8  7  9]  FF[DABC  9 12 10]  FF[CDAB 10 17 11]  FF[BCDA 11 22 12]
   FF[ABCD 12  7 13]  FF[DABC 13 12 14]  FF[CDAB 14 17 15]  FF[BCDA 15 22 16]

   /* Round 2. */
   /* Let GG[abcd k s i] denote the operation
        a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
   /* Do the following 16 operations. */
   GG[ABCD  1  5 17]  GG[DABC  6  9 18]  GG[CDAB 11 14 19]  GG[BCDA  0 20 20]
   GG[ABCD  5  5 21]  GG[DABC 10  9 22]  GG[CDAB 15 14 23]  GG[BCDA  4 20 24]
   GG[ABCD  9  5 25]  GG[DABC 14  9 26]  GG[CDAB  3 14 27]  GG[BCDA  8 20 28]
   GG[ABCD 13  5 29]  GG[DABC  2  9 30]  GG[CDAB  7 14 31]  GG[BCDA 12 20 32]

   /* Round 3. */
   /* Let HH[abcd k s i] denote the operation
        a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
   /* Do the following 16 operations. */
   HH[ABCD  5  4 33]  HH[DABC  8 11 34]  HH[CDAB 11 16 35]  HH[BCDA 14 23 36]
   HH[ABCD  1  4 37]  HH[DABC  4 11 38]  HH[CDAB  7 16 39]  HH[BCDA 10 23 40]
   HH[ABCD 13  4 41]  HH[DABC  0 11 42]  HH[CDAB  3 16 43]  HH[BCDA  6 23 44]
   HH[ABCD  9  4 45]  HH[DABC 12 11 46]  HH[CDAB 15 16 47]  HH[BCDA  2 23 48]

   /* Round 4. */
   /* Let II[abcd k s i] denote the operation
        a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
   /* Do the following 16 operations. */
   II[ABCD  0  6 49]  II[DABC  7 10 50]  II[CDAB 14 15 51]  II[BCDA  5 21 52]
   II[ABCD 12  6 53]  II[DABC  3 10 54]  II[CDAB 10 15 55]  II[BCDA  1 21 56]
   II[ABCD  8  6 57]  II[DABC 15 10 58]  II[CDAB  6 15 59]  II[BCDA 13 21 60]
   II[ABCD  4  6 61]  II[DABC 11 10 62]  II[CDAB  2 15 63]  II[BCDA  9 21 64]

   /* Then perform the following additions. (That is, increment each of the
      four registers by the value it had before this block was started.) */
   A = A + AA
   B = B + BB
   C = C + CC
   D = D + DD

end /* of loop on i */
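The block loop of step 108, together with steps 102 through 110, can be sketched end to end in Python. This is a minimal, unoptimized sketch for a host machine with arbitrary-precision integers, not an 8-bit micro-controller implementation. It assumes a byte-aligned message, and the helper names (`aux`, `word_index`, `process_block`, `md5`) are hypothetical. Instead of spelling out all 64 operations, it generates each round's k index with a formula that reproduces the listed sequences, and it rotates the register roles after every operation to obtain the ABCD, DABC, CDAB, BCDA pattern.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF
T = [0] + [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]
# Shift amounts s; within each round they repeat every four operations.
SHIFTS = ((7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21))


def rotl32(x, s):
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def aux(rnd, x, y, z):
    """F, G, H, I for rounds 1..4 (rnd = 0..3)."""
    if rnd == 0:
        return (x & y) | (~x & z)
    if rnd == 1:
        return (x & z) | (y & ~z)
    if rnd == 2:
        return x ^ y ^ z
    return y ^ (x | (~z & MASK32))


def word_index(rnd, j):
    """Index k of X[k] for the j-th operation of a round (matches the listing)."""
    return (j, (1 + 5 * j) % 16, (5 + 3 * j) % 16, (7 * j) % 16)[rnd]


def process_block(state, X):
    """One iteration of the loop on i: 64 operations, then add AA, BB, CC, DD."""
    a, b, c, d = state
    for rnd in range(4):
        for j in range(16):
            i = 16 * rnd + j + 1
            t = rotl32(a + aux(rnd, b, c, d) + X[word_index(rnd, j)] + T[i],
                       SHIFTS[rnd][j % 4])
            # New a = b + (... <<< s); rotating the tuple makes the next
            # operation update D, then C, then B, as in the listing.
            a, b, c, d = d, (b + t) & MASK32, b, c
    # After 64 operations (a multiple of four) the roles are back in order.
    return tuple((old + new) & MASK32 for old, new in zip(state, (a, b, c, d)))


def md5(message):
    """Steps 102-110 for a byte-aligned message."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
    for off in range(0, len(padded), 64):
        state = process_block(state, struct.unpack("<16I", padded[off:off + 64]))
    return struct.pack("<4I", *state)


print(md5(b"abc").hex())  # 900150983cd24fb0d6963f7d28e17f72
print(md5(b"abc") == hashlib.md5(b"abc").digest())  # True
```

Cross-checking against Python's `hashlib.md5` exercises every entry of T, every shift amount and every k index at once.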
To understand better the scope of each operation, analyzed below is the first operation of round 1: FF[ABCD 0 7 1].
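The arithmetic performed by this one operation can be traced in Python. The sketch assumes the empty message, so X[0] is the word holding the appended “1” bit (0x00000080). The function `ff` is a hypothetical helper taking the same seven inputs that the call sequence pushes.

```python
import math

MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def ff(a, b, c, d, x_k, s, t_i):
    """FF[abcd k s i]: new a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s)."""
    f = (b & c) | (~b & d)
    return (b + rotl32(a + f + x_k + t_i, s)) & MASK32


A, B, C, D = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476  # step 106 values
X0 = 0x00000080                          # X[0] of the padded empty message
T1 = int(4294967296 * abs(math.sin(1)))  # 0xd76aa478
A = ff(A, B, C, D, X0, 7, T1)            # FF[ABCD 0 7 1]
print(hex(A))  # 0xa5202774
```

Only A changes; B, C and D are read but not written, which is why each operation needs all four registers as inputs.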
The call procedure for the operation involves copying the seven elements from read-only memory (ROM) and/or random-access memory (RAM) to a RAM stack and/or registers, and then calling the operation. An example of pseudo code that pushes the elements to a RAM stack is listed below:
push A
push B
push C
push D
push X[0]
push 7
push T[1]
call operation FF
pop all
Pushing an element involves copying the element from ROM or RAM to a RAM stack. (The term “loading” is generally used to denote copying the element from ROM or RAM to a register.) Each of the variables A, B, C, D, and X[0] occupies four bytes, the constant ‘7’ occupies one byte, and the constant T[1] occupies four bytes. Therefore a total of 25 bytes is copied to the RAM stack or registers for each operation (recall that the four rounds comprise a total of 64 operations).
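The byte count can be tallied as follows. The per-block figure assumes every one of the 64 operations uses the same seven-element push sequence.

```latex
\underbrace{4 \times 4}_{A,\,B,\,C,\,D} + \underbrace{4}_{X[k]} + \underbrace{1}_{s} + \underbrace{4}_{T[i]}
  = 25 \text{ bytes per operation},
\qquad
64 \times 25 = 1600 \text{ bytes per 16-word block}.
```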
In step 110 the message digest produced as output is A, B, C, D. That is, the digest begins with the low-order byte of A and ends with the high-order byte of D.
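The output ordering of step 110 amounts to writing each register little-endian and concatenating them. In the Python sketch below, `digest_bytes` is a hypothetical helper. The register values are not computed here: they were obtained by reading the well-known MD5 digest of the empty string back into little-endian words, so the example only illustrates the byte order.

```python
import struct


def digest_bytes(a, b, c, d):
    """Step 110: A, B, C, D in sequence, each written low-order byte first."""
    return struct.pack("<4I", a, b, c, d)


# Final register values for the empty message (derived from its known digest).
print(digest_bytes(0xD98C1DD4, 0x04B2008F, 0x980980E9, 0x7E42F8EC).hex())
# d41d8cd98f00b204e9800998ecf8427e
```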
What is needed in the art is a system and method to efficiently implement the MD5 algorithm using an 8 bit micro-controller.