The present invention relates to the hardware implementation of a procedure known as "the interleaved Montgomery multiprecision modular multiplication method", often used in software-oriented encryption systems. An original method is provided to accelerate modular exponentiation, and proofs are used to simplify the architecture and to extend the use of the device to large number calculations in the normal field of numbers.
The basic process is one of the three published related methods for performing modular multiplication with Montgomery's methodology (P. L. Montgomery, "Modular multiplication without trial division", Mathematics of Computation, vol. 44, pp. 519-521, 1985, hereinafter referred to as "Montgomery"; S. R. Dusse and B. S. Kaliski Jr., "A cryptographic library for the Motorola DSP 56000", Proc Eurocrypt '90, Springer-Verlag, Berlin, 1990, hereinafter referred to as "Dusse").
In this hardware implementation, security mechanisms and "on the fly" additions, subtractions, and moves have been added, and processes whose total output might be irrelevant have been removed. The design is relatively easy to implement in silicon and is integrated so that it can be appended to the internal data/address bus as a slave to virtually any 8, 16 or 32 bit Central Processing Unit (CPU).
Because of the simple synchronized shift design, the multiplying/squaring machine can run at clock speeds several times faster than those presently attainable with CPUs that support on-board non-volatile memory devices. This method demands no design changes in the memory architecture of the CPU, such as those required by implementations using parallel multipliers and dual ported memories for fast modular multiplication of large numbers, as in the Philips circuit (Philips Components, "83C852, secured 8-bit microcontroller for conditional access applications", Eindhoven, August 1990, hereinafter referred to as "Philips").
The essential architecture is that of a machine that can be integrated into any microcontroller design and mapped into memory, while working in parallel with the controller, which must constantly load commands and operands, then unload and transmit the final answer.
The unique solution uses only two serial/parallel multipliers, and a complete serial pipelined approach that saves silicon area. Using present popular technologies, it enables the integration of the complete solution including a microcontroller with memories onto a 4 by 4.5 by 0.2 mm microelectronic circuit that can meet the ISO 7816 standards. International Organization for Standardization, "Identification cards-integrated circuit cards", ISO 7816:
Part 1-ISO 7816-1, "Physical characteristics", 1987.
Part 2-ISO 7816-2, "Dimensions of locations of contacts", 1988.
Part 3-ISO/IEC 7816-3, "Electronic signals & transmission protocols", 1989; hereinafter referred to as "ISO 7816".

1) $X = A \cdot B$
2) $Y = (X \cdot J) \bmod 2^n$ (only the n LS bits are necessary)
3) $Z = X + Y \cdot N$
4) $S = Z / 2^n$ (the requirement on J is that it forces Z to be divisible by $2^n$)
5) $P \equiv S \bmod N$ (N is to be subtracted from S, if $S \ge N$)

(1) $X = S(i-1) + A_{i-1} \cdot B$ ($A_{i-1}$ is the (i-1)th character of A; $S(i-1)$ is the value of S at the outset of the ith iteration.)
(2) $Y_0 = X_0 \cdot J_0 \bmod 2^k$ (the LS k bits of the product $X_0 \cdot J_0$)
(3) $Z = X + Y_0 \cdot N$
(4) $S(i) = Z / 2^k$ (The k LS bits of Z are always 0, therefore Z is always divisible by $2^k$. This division is tantamount to a k bit right shift, as the LS k bits of Z are all zeros; or, as will be seen in the circuit, the LS k bits of Z are simply disregarded.)
(5) $S(i) = S(i) \bmod N$ (N is to be subtracted from those S(i)'s which are larger than N.)

n = 24; k = 8; t = 0a f5 9b; q = 2b 13; and
$R = I^{-1} = 2^{24} \bmod q$ = 141d.

$\Sigma = 2^{q-1} + E \bmod 2^{q-1}$
and
q is the number of relevant bits in E (disregard any leading zeros).
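The two step lists above (the full-width reduction and its interleaved, character-by-character form) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the circuit itself: the function names `mont_reduce_product` and `mont_mul_interleaved` are hypothetical, J is assumed to be $-N^{-1} \bmod 2^n$ (the usual choice that forces Z to be divisible by $2^n$), N is assumed odd, and both operands are assumed smaller than N. The code indexes characters from 0, and it requires Python 3.8 or later for `pow(x, -1, m)`.

```python
def mont_reduce_product(a, b, n_mod, n_bits):
    """Steps 1) to 5): returns A*B*2^(-n) mod N for odd N, with A, B < N."""
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r      # assumption: J = -N^(-1) mod 2^n
    x = a * b                         # 1) X = A*B
    y = (x * j) % r                   # 2) only the n LS bits are needed
    z = x + y * n_mod                 # 3) Z is divisible by 2^n
    s = z >> n_bits                   # 4) S = Z / 2^n
    return s - n_mod if s >= n_mod else s   # 5) subtract N if S >= N


def mont_mul_interleaved(a, b, n_mod, k, m):
    """Steps (1) to (5), once per k-bit character of A (m characters)."""
    mask = (1 << k) - 1
    j0 = (-pow(n_mod, -1, 1 << k)) % (1 << k)   # LS character of J
    s = 0
    for i in range(m):
        a_i = (a >> (k * i)) & mask    # next character of A, LS first
        x = s + a_i * b                # (1)
        y0 = ((x & mask) * j0) & mask  # (2) LS k bits of X0*J0
        z = x + y0 * n_mod             # (3) the k LS bits of Z are zero
        s = z >> k                     # (4) k bit right shift
        if s >= n_mod:                 # (5)
            s -= n_mod
    return s


# Parameters listed above: n = 24, k = 8, modulus q = 2b 13 (hex).
q = 0x2B13
print(hex(pow(2, 24, q)))   # 0x141d, the value given for R
a, b = 0x1234, 0x0567       # arbitrary illustrative operands, both < q
print(mont_mul_interleaved(a, b, q, 8, 3) == mont_reduce_product(a, b, q, 24))
```

Both functions return the product scaled by $I = 2^{-n} \bmod N$; the interleaved form only ever needs the least significant character of J.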
$B = (B \cdot A) = A^{10} \cdot I^{-1} \cdot A \cdot I = A^{11}$
$C = B$

1) precalculating a parameter H and at least the least significant character $J_0$ of another parameter J, as hereinafter defined, and loading $J_0$ into a k bit register;
2) loading the multiplier B and the modulus N into respective registers of n bit length, wherein $n = m \cdot k$;
3) setting an n-bit long register S to zero; and
4) carrying out an i-iteration m times, wherein i is from zero to m-1, each ith iteration comprising the following operations:
5) at the last (mth) iteration, ignoring the least significant character of $Z/2^k$ and entering the remaining characters into the B register, as the value of $C \equiv (A \cdot B)N$;
6) repeating the steps 3) to 4), wherein C or C-N, if C is greater than N, is substituted for B and H is substituted for A, whereby to calculate $P = (C \cdot H) \bmod N$; and
7) assuming the value of P obtained from the last iteration as the result of the operation $A \cdot B \bmod N$.
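The two-pass multiplication in steps 1) to 7) can be illustrated with a short Python sketch. It is not the register-level procedure: `mont_mul` and `mod_mul` are hypothetical names, the Montgomery product is computed at full width rather than character by character, and H is assumed to be $2^{2n} \bmod N$, the standard constant that cancels the two factors of $I = 2^{-n}$ (this section only says H is "as hereinafter defined"). N is assumed odd and the operands smaller than N.

```python
def mont_mul(a, b, n_mod, n_bits):
    """Montgomery product A*B*2^(-n) mod N (odd N, A and B below N)."""
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r          # assumption: J = -N^(-1) mod 2^n
    x = a * b
    s = (x + ((x * j) % r) * n_mod) >> n_bits
    return s - n_mod if s >= n_mod else s  # "C or C-N"


def mod_mul(a, b, n_mod, n_bits):
    """A*B mod N through two Montgomery products, as in steps 5) to 7)."""
    h = pow(2, 2 * n_bits, n_mod)         # assumed definition of H
    c = mont_mul(a, b, n_mod, n_bits)     # C = A*B*I mod N
    return mont_mul(h, c, n_mod, n_bits)  # P = C*H*I = A*B mod N


# Arbitrary illustrative values: a 24-bit odd modulus and two operands.
N = 0xC5A3F1
A, B = 0x123456, 0x0ABCDE
print(mod_mul(A, B, N, 24) == (A * B) % N)
```

The second pass substitutes C for B and H for A, so the same multiplier hardware serves both passes.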
1) loading the modulus number into the aforesaid register N;
2) setting the aforesaid register S to zero;
3) loading the base A to be exponentiated into the aforesaid register B;
4) storing the exponent E in a computer register;
5) shifting said exponent E left;
6) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 7 to 9:
7) for every one of said bits, regardless of its being 0 or 1, squaring the content of register B by the multiplication method hereinbefore set forth, wherein the successive characters of the base are loaded into register $A_i$ from register B;
8) if and only if the current bit of the exponent E is 1, multiplying, after performing operation 7), the content of register B by the base A;
9) after each Montgomery square or Montgomery multiply operation, performing a Montgomery $C \cdot H$ multiplication $(C \cdot H)N$; and
10) after performing steps 6-9 for all bits of E, storing the result of the last operation as $D \equiv A^E \bmod N$ in register B.
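Steps 1) to 10) amount to a left-to-right square-and-multiply in which every Montgomery product is immediately corrected by a second product with H. A minimal Python sketch follows, assuming $H = 2^{2n} \bmod N$, an odd N, a base smaller than N, and an exponent of at least 1; the names `mont_mul` and `mod_exp_with_h` are hypothetical.

```python
def mont_mul(a, b, n_mod, n_bits):
    """Montgomery product A*B*2^(-n) mod N (odd N, A and B below N)."""
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r          # assumption: J = -N^(-1) mod 2^n
    x = a * b
    s = (x + ((x * j) % r) * n_mod) >> n_bits
    return s - n_mod if s >= n_mod else s


def mod_exp_with_h(a, e, n_mod, n_bits):
    """A^E mod N with an H correction after every square or multiply."""
    h = pow(2, 2 * n_bits, n_mod)         # assumed definition of H
    b = a                                 # step 3: base loaded into B
    for bit in bin(e)[3:]:                # step 6: skip the leading 1 bit
        # step 7 followed by step 9: square, then multiply by H
        b = mont_mul(mont_mul(b, b, n_mod, n_bits), h, n_mod, n_bits)
        if bit == "1":
            # step 8 followed by step 9: multiply by A, then by H
            b = mont_mul(mont_mul(b, a, n_mod, n_bits), h, n_mod, n_bits)
    return b


N = 0xC5A3F1                              # arbitrary odd 24-bit modulus
print(mod_exp_with_h(0x123456, 0b1011, N, 24) == pow(0x123456, 0b1011, N))
```

Every square or multiply costs two Montgomery products in this form, which is the overhead the later variants set out to reduce.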
1) loading the modulus number into the aforesaid register N;
2) setting the aforesaid register S to zero;
3) loading the base A to be exponentiated into the aforesaid register B;
4) storing the exponent E in a computer register, and a precalculated parameter T in the CPU memory;
5) shifting said exponent E left;
6) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 7 to 8:
7) for every one of said bits, regardless of its being 0 or 1, carrying out operations 4 and 5 of the multiplication method hereinbefore set forth, wherein both the multiplicand and the multiplier are the base A, and wherein the successive characters of the base are loaded into register $A_i$ from register B;
8) if and only if the current bit of the exponent E is 1, carrying out, after performing operation 7), operations 4 and 5 of the multiplication method hereinbefore set forth, wherein the multiplicand is the content of register B and the multiplier is the base A; and
9) after performing steps 7 and 8 for all bits of E, performing an additional Montgomery multiplication of register B by the parameter T, $(B \cdot T)N$, and then storing the result of the last operation as $D \equiv A^E \bmod N$ in register B. Parameter T is defined as $T = (2^n)^S \bmod N$, wherein $S = 2^{q-1} + E \bmod 2^{q-1}$, as explained in detail in the parent application.

1) storing the exponent E in a computer register;
2) loading the modulus number into the aforesaid register N;
3) setting the aforesaid register S to zero;
4) performing a multiplication operation $A^* = (A \cdot H)N$, where A is the operand to be exponentiated and H is a precalculated parameter as defined before;
5) loading A* into the base register B;
6) performing a squaring operation of the contents of register B;
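The exponentiation that uses the precalculated parameter T (the first list above, steps 1 to 9) can be sketched as follows. Each uncorrected Montgomery product leaves one extra factor of $I = 2^{-n}$, so after the loop the register holds $A^E \cdot I^{E-1}$, and one final product with $T = (2^n)^S$ removes all of them. This is a hedged sketch, not the claimed procedure: the names `mont_mul` and `mod_exp_with_t` are hypothetical, `s_exp` stands for the exponent written S in the text (not the register S), N is assumed odd, and T is computed here with Python's `pow` rather than loaded from CPU memory.

```python
def mont_mul(a, b, n_mod, n_bits):
    """Montgomery product A*B*2^(-n) mod N (odd N, A and B below N)."""
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r          # assumption: J = -N^(-1) mod 2^n
    x = a * b
    s = (x + ((x * j) % r) * n_mod) >> n_bits
    return s - n_mod if s >= n_mod else s


def mod_exp_with_t(a, e, n_mod, n_bits):
    """A^E mod N with a single correction by T at the end (E >= 1)."""
    q = e.bit_length()                    # number of relevant bits in E
    s_exp = (1 << (q - 1)) + (e % (1 << (q - 1)))   # S as defined above
    t = pow(pow(2, n_bits, n_mod), s_exp, n_mod)    # T = (2^n)^S mod N
    b = a
    for bit in bin(e)[3:]:                # skip the leading 1 bit
        b = mont_mul(b, b, n_mod, n_bits)        # step 7: square
        if bit == "1":
            b = mont_mul(b, a, n_mod, n_bits)    # step 8: multiply by A
    return mont_mul(b, t, n_mod, n_bits)  # step 9: (B*T)N


N = 0xC5A3F1                              # arbitrary odd 24-bit modulus
print(mod_exp_with_t(0x123456, 0b1011, N, 24) == pow(0x123456, 0b1011, N))
```

With q taken as the bit length of E, the expression for S evaluates to E itself, so T depends on the exponent and must be recomputed whenever E changes.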
7) shifting said exponent E left;
8) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 9 to 10:
9) for every one of said E bits, regardless of its being 0 or 1, carrying out operations 4 and 5 of the squaring method hereinbefore set forth, wherein both the multiplicand and the multiplier originate from the B register, and wherein the successive characters of the Montgomery multiplier are loaded into register $A_i$ from register B;
10) if and only if the current bit of the exponent E is 1, carrying out, after performing operation 9, operations 4 and 5 of the multiplication method hereinbefore set forth, wherein the multiplicand is the content of register B and the multiplier is the base A*; and
11) after performing steps 8-10 for all bits of E, performing an additional Montgomery multiplication of register B by the original base A and then storing the result of the last operation as $D \equiv A^E \bmod N$ in register B if the exponent is odd; if the exponent is even, performing an additional Montgomery multiplication of D times 1: $B \equiv (D \cdot 1)N \equiv D \cdot I$.

It is seen that the exponentiation method of this invention eliminates the need for the computation of the parameter T, hereinbefore mentioned.

an n bit shift register B for the multiplier;
an n bit shift register N for the modulus;
an n bit shift register for the value S as herein defined;
a k bit register $A_i$ for the multiplicand;
k bit register means for the values $J_0$ and $Y_0$ as herein defined;
multiplier means for multiplying the content of the B register by that of the $A_i$ register;
additional n-bit multiplier means; and
adding, subtracting, multiplexing and delay means.
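The exponentiation sequence that avoids the parameter T (the steps ending with step 11 above) can be sketched in Python. The sketch follows one reading of the final step: the base is first converted to $A^* = (A \cdot H)N$, the loop runs entirely on Montgomery-form values, and the last exponent bit is handled by a product with the original base A (odd exponent) or with 1 (even exponent), either of which removes the remaining factor of $2^n$. The names `mont_mul` and `mod_exp_without_t` are hypothetical, $H = 2^{2n} \bmod N$ is an assumption, and N is assumed odd.

```python
def mont_mul(a, b, n_mod, n_bits):
    """Montgomery product A*B*2^(-n) mod N (odd N, A and B below N)."""
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r          # assumption: J = -N^(-1) mod 2^n
    x = a * b
    s = (x + ((x * j) % r) * n_mod) >> n_bits
    return s - n_mod if s >= n_mod else s


def mod_exp_without_t(a, e, n_mod, n_bits):
    """A^E mod N using A* = (A*H)N and a final product with A or 1."""
    h = pow(2, 2 * n_bits, n_mod)         # assumed definition of H
    a_star = mont_mul(a, h, n_mod, n_bits)    # A* = A * 2^n mod N
    b = a_star
    bits = bin(e)[3:]                     # bits after the leading 1
    if not bits:                          # E == 1: only leave Montgomery form
        return mont_mul(b, 1, n_mod, n_bits)
    for bit in bits[:-1]:
        b = mont_mul(b, b, n_mod, n_bits)           # square
        if bit == "1":
            b = mont_mul(b, a_star, n_mod, n_bits)  # multiply by A*
    b = mont_mul(b, b, n_mod, n_bits)     # square for the last bit
    if bits[-1] == "1":
        return mont_mul(b, a, n_mod, n_bits)  # odd E: multiply by original A
    return mont_mul(b, 1, n_mod, n_bits)      # even E: multiply by 1


N = 0xC5A3F1                              # arbitrary odd 24-bit modulus
print(mod_exp_without_t(0x123456, 0b1011, N, 24) == pow(0x123456, 0b1011, N))
```

Only H, which depends on N alone, has to be precalculated, so a change of exponent requires no new parameter.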
The invention is directed to the architecture of this solution, which is based on mathematical innovations published by Montgomery, with several modifications and improvements. Non-obvious methods are provided for reducing the time necessary for modular exponentiation to little more than half the time required using known processing and the Montgomery method.