The invention relates to a device and method for the implementation of an elementary modular operation according to the Montgomery method. This method can be used to perform modular computations in a finite field (or Galois field) without performing divisions.
Modular operations in finite fields are used in cryptography for applications such as message authentication, user identification and key exchange. Applications of this kind are described, for example, in the French patent application FR-A-2,679,054.
There are commercially available integrated circuits dedicated to such applications, such as the product referenced ST16CF54, which is manufactured by STMicroelectronics S.A., the current assignee of the present invention. This product is built around a central processing unit and an arithmetic coprocessor for implementing modular computations. The coprocessor enables the processing of the modular multiplications by using the Montgomery method, which is the object of the European patent application EP-A-601,907.
The basic operation, called a Pfield operation, uses three binary data elements A (multiplicand), B (multiplier, smaller than N) and N (modulo), each encoded on an integer number n of bits, to produce a binary data element referenced P(A, B)N encoded on n bits such that P(A, B)N=A*B*I mod N, with I=2^(-n) mod N. For this purpose, it is assumed that the data elements are encoded on m words of k bits with m*k=n, and the words A and B are given to a multiplication circuit having a series input, a parallel input and a series output.
For the coprocessor described in the above referenced European patent application EP-A-601,907, we have k=32 and m=8 or 16. FIG. 1 shows the modular arithmetic coprocessor disclosed by the referenced patent application. This coprocessor includes the following elements:
three shift registers 10, 11 and 12, with m*k bits, designed to receive respectively the multiplier B, the result S and the modulo N;
multiplexers 13 to 15 that are respectively connected to the inputs of the registers 10 to 12;
three k-bit shift registers 16, 17 and 18 having one series input and one parallel output, designed to receive respectively k bits of the multiplicand A, a computation parameter referenced J0, an intermediate result referenced Y0;
two multiplication circuits 19 and 20 each having one series input, one parallel k-bit input and one series output;
two k-bit parallel latches 21 and 22 used as a buffer for the multiplication circuits 19 and 20;
a multiplexer 23 used to connect the latch 22 either to the register 17 or to the register 18;
three multiplexers 24, 25 and 26 used to route the data elements to the inputs of the multiplication circuits 19 and 20;
three subtraction circuits 27, 28 and 29 each comprising two series inputs and one series output;
two addition circuits 30 and 31, each having two series inputs and one series output;
three delay cells 32, 33 and 34 that are actually k-bit shift registers and are used to delay the data elements by k clock cycles to mask the computation time of the multiplication circuits 19 and 20;
a comparison circuit 35;
two multiplexers 36 and 37 used to control the subtraction circuits 27 and 28;
a multiplexer 38; and
a demultiplexer 39.
For further details on the making of these elements, reference may be made to the above referenced European patent application EP-A-601,907.
To perform an elementary operation called a Pfield operation of the Pfield(A, B)N=A*B*I mod N type, A and B are encoded on a number of m k-bit words and I is an error equal to 2^(-m*k), and the iteration of the next loop is performed m times with i being an index varying from 0 to m-1:
X=S(i-1)+Ai*B;
Y0=(X*J0) mod 2^k;
Z=X+(N*Y0);
S(i)=Z/2^k (an integer division);
if S(i) is greater than N, then N is subtracted from S(i) before the next iteration;
with S(-1)=0, where Ai is the k-bit word of A with place value i, and J0 is a k-bit word defined by the equation ((N*J0)+1) mod 2^k=0.
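The loop above can be sketched in software as follows. This is only an arithmetic illustration under stated assumptions, not a model of the coprocessor: the function names `montgomery_j0` and `pfield` are hypothetical, N is assumed to be odd, B is assumed to be smaller than N, and the final comparison uses "greater than or equal to" so that the returned value is fully reduced.

```python
def montgomery_j0(n: int, k: int) -> int:
    """Return the k-bit word J0 such that (N*J0 + 1) mod 2^k == 0.

    Assumes N is odd, so that N is invertible modulo 2^k.
    """
    modulus = 1 << k
    return (-pow(n, -1, modulus)) % modulus


def pfield(a: int, b: int, n: int, m: int, k: int) -> int:
    """Word-serial sketch of Pfield(A, B)N = A*B*2^(-m*k) mod N.

    A is consumed one k-bit word per iteration, least significant first.
    """
    mask = (1 << k) - 1
    j0 = montgomery_j0(n, k)
    s = 0  # S(-1) = 0
    for i in range(m):
        a_i = (a >> (i * k)) & mask  # k-bit word of A with place value i
        x = s + a_i * b              # X = S(i-1) + Ai*B
        y0 = (x * j0) & mask         # Y0 = (X*J0) mod 2^k
        z = x + n * y0               # Z = X + N*Y0, low k bits are zero
        s = z >> k                   # S(i) = Z / 2^k (exact division)
        if s >= n:                   # assumption: >= keeps S(i) below N
            s -= n
    return s
```

Because the k least significant bits of Z are zero by construction of Y0, the division by 2^k reduces to a right shift, which is why no true division is needed.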
The coprocessor of FIG. 1 enables the performance of a full iteration by a simultaneous shift, by m*k bits, of the registers 10 to 12 respectively containing B, S(i-1) and N, followed by a shift, by 2*k bits, of the register 11 to store S(i). The word Ai is loaded into the latch 21 and the word J0 is loaded into the register 17. To perform the full computation of Pfield(A, B)N, it is enough to perform the iteration m times, changing the word Ai contained in the latch 21 at each iteration.
The operation X=S(i-1)+Ai*B is done by the multiplication circuit 19 and the addition circuit 30. The operation Y0=(X*J0) mod 2^k is done, during the first k shifts, in the multiplication circuit 20 while storing J0 in the latch 22 and storing the result Y0 in the register 18. The operation Z=X+(N*Y0), with N and X having been delayed by k bits in the delay cells 32 and 34 and with Y0 having been put into the latch 22, is performed by the multiplication circuit 20 and the addition circuit 31. The operation S(i)=Z/2^k is done by a k-bit shift. The comparison of S(i) with N is done by the subtraction of N from S(i) in the subtraction circuit 29. N is delayed by k bits in the delay cell 33, and a possible overflow is detected and stored in the comparison circuit 35 to find out the result of the comparison. The subtraction of N from S(i) is done during the next iteration in the subtraction circuit 28.
Many improvements have been made in this circuit. The improvements are aimed at obtaining higher speeds, reducing the size of the circuit, reducing the consumption of the circuit, and/or providing additional functions without considerably increasing the size of the circuit. Those skilled in the art may refer to the publications of the European patent applications EP-712,070, EP-712,071, EP-712,072, EP-778,518, EP-784,262, EP-785,502, EP-785,503, EP-793,165, EP-853,275, and also to the publication of the international patent application WO/97-25,668.
There is also another circuit, known from the publication of the European patent application EP-566,498, enabling the computation of the elementary operation P(A, B)N=A*B*I mod N, with I=2^(-n), where n is the size of A, B or N. This circuit uses a single parallel/series multiplication circuit, in the form of a parallel adder coupled with a shift register.
The circuit does not exactly implement the Montgomery algorithm and uses an intermediate data element equal to (N-1)/2+1. The circuit uses a multiplication circuit having a parallel input with n bits and is limited to computation operands with a permanently fixed size. Furthermore, the size of the circuit disclosed in the European patent application EP-566,498 is proportional to the size of the operands used. Consequently, the surface area thus occupied by the circuit is considerable.
The present invention is aimed at improving the prior art by providing a coprocessor that uses a single multiplication circuit coupled to a computation circuit dedicated to the computation of Y0, with Y0=(X*J0) mod 2^k and J0 being defined by the equation ((N*J0)+1) mod 2^k=0. The invention also provides a method for the computation of a modular operation using the circuit for the computation of Y0.
An object of the invention is to provide an integrated circuit comprising a modular arithmetic coprocessor comprising:
storage means to store and provide, in series, first and second operands A and B, a modulo N and a result S with A as an integer encoded on a*k bits, a is a non-zero integer at most equal to m, and B, N and S are integers encoded on at most m*k bits, m and k are integers greater than 1;
computation means to perform modular operations according to the Montgomery method, wherein the computation means comprises a first k-bit latch to store a k-bit word Ai of A, and a second k-bit latch to store either the least significant word of N or an intermediate data element Y0 encoded on k bits such that Y0=((S(i-1)+(Ai*B))*J0) mod 2^k, with i as a loop index varying from 0 to a-1, S(i-1) as an updated result of S during the (i-1)th iteration, S(-1) is equal to zero, Ai is the ith k-bit word of A, and J0 is a k-bit word for the equation ((J0*N)+1) mod 2^k=0;
an addition means to add up the contents of the first and second latches;
a selection means coupled to the outputs of the first and second latches and to the addition means in order to give, at a parallel output, either the word contained in the first latch or the word contained in the second latch, or the sum of the words contained in the first and second latches, or the word zero, first as a function of a bit of B, and second as a function of a bit of N;
an accumulator circuit that adds up, shifts by one bit and stores the words given successively by the selection means with one bit of an updated result S(i), the bit output from the accumulator circuit becoming a new updated result; and
a circuit to compute an intermediate data element Y0 connected, first, to the output of the second latch to receive the least significant k-bit word of N and, second, to the output of the accumulator to receive a data element X=S(i-1)+Ai*B.
Preferably, the circuit to compute the data element Y0 comprises a (k-1)-bit shift storage register that stores the data output from the computation circuit; a multiplication circuit to multiply the contents of the storage register by the contents of the second latch apart from the least significant bit contained in the second latch and provide a result bit in series; and a subtraction circuit for the bit-by-bit subtraction of the result output from the multiplication circuit from the result output from the accumulator.
Another object of the invention is to provide a method for performing a modular operation according to the Montgomery method by the series shifting of the first and second operands A and B, a modulo N and an updated result through computation means, with A as an integer encoded on a*k bits. The variable a is a non-zero integer at most equal to m, B, N and S are integers encoded on at most m*k bits, and m and k are integers greater than 1. An intermediate data element Y0 is computed such that Y0=((S(i-1)+(Ai*B))*J0) mod 2^k in an iterative loop indexed by i, with i varying from 0 to a-1 and with S(i-1) corresponding to the (i-1)th updated result. S(-1) is equal to 0, Ai is the ith k-bit word of A, and J0 is a k-bit word resolving the equation ((J0*N)+1) mod 2^k=0, wherein Y0 is computed in a computation circuit that gives Y0 bit by bit, first, from a word N0 of k least significant bits of N and, second, from an intermediate data element X=S(i-1)+Ai*B.
Preferably, the computation of Y0 comprises the following steps: loading into a shift register the least significant bit of X, this bit being equal to the least significant bit of Y0; multiplying, in a multiplication circuit, the k-1 most significant bits of N0 by the k-1 least significant bits of Y0 by shifting Y0 in the shift register; and subtracting bit by bit, in a subtraction circuit, the result output from the multiplication circuit from the k-1 most significant bits of the least significant word of the data element X, with the output result bit being a bit of Y0 that is stored in the shift register.
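The idea of obtaining Y0 bit by bit from N0 and X alone, without ever forming J0, can be illustrated by the following sketch. It is an arithmetic illustration only and does not reproduce the multiplication and subtraction circuits described above: the hypothetical function `y0_bit_serial` clears the low bits of X one at a time by adding shifted copies of N0, which yields the same Y0 as (X*J0) mod 2^k under the assumption that N0 is odd.

```python
def y0_bit_serial(x: int, n0: int, k: int) -> int:
    """Build Y0 one bit per step so that (X + N0*Y0) mod 2^k == 0.

    x  : intermediate data element X (only its low k bits matter)
    n0 : least significant k-bit word of N, assumed odd
    k  : word size in bits
    """
    if n0 & 1 == 0:
        raise ValueError("N0 must be odd")
    mask = (1 << k) - 1
    t = x & mask  # running low k bits of X + N0 * (partial Y0)
    y0 = 0
    for j in range(k):
        # Bits below position j of t are already zero at this point.
        if (t >> j) & 1:
            y0 |= 1 << j                  # bit j of Y0 is 1
            t = (t + (n0 << j)) & mask    # odd N0 clears bit j of t
    return y0
```

In the first step the selected bit is simply the least significant bit of X, which is consistent with the statement that the least significant bits of X and Y0 are equal.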
According to one embodiment, the method furthermore comprises the repetition of the following steps, with i being an index varying from 0 to a-1:
storing a k-bit word Ai corresponding to a word i of A in a first k-bit latch;
computing an intermediate data element Y0 such that Y0=((S(i-1)+(Ai*B))*J0) mod 2^k, with S(i-1) corresponding to the (i-1)th updated result, S(-1) is equal to 0 and J0 is a k-bit word resolving the equation ((J0*N)+1) mod 2^k=0;
storing the least significant k-bit word of N and then Y0 in a second k-bit latch;
adding in a parallel addition circuit the words contained in the first and second latches;
selecting and supplying either the word contained in the first latch or the word contained in the second latch, or the sum of the words contained in the first and second latches, or zero as a function, first, of a bit of B and, second, either of a bit of Y0 or of a bit of N; and
successively adding, in an accumulator circuit, the words given by the selection device for each pair of bits of B and N, with the result of each addition being added to a bit of the previous updated result S(i-1), and then shifting by one bit and storing between each addition, with the bit output from the accumulator during the shift corresponding to a bit of a new updated result S(i).
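The selection and accumulation steps above can be sketched for one iteration as follows. This is a behavioural illustration under assumptions, not a description of the actual circuit: the function name `serial_iteration` is hypothetical, Y0 is assumed to have been computed beforehand, and the number of shifts (m*k+k+2) is chosen here simply to be large enough to flush the accumulator.

```python
def serial_iteration(a_i: int, y0: int, b: int, n: int,
                     s_prev: int, m: int, k: int) -> int:
    """One iteration computed one bit per shift.

    For each bit position j, a word is selected among 0, Ai, Y0 and Ai+Y0
    according to bit j of B and bit j of N, added to the accumulator along
    with bit j of S(i-1), and the accumulator is shifted right by one bit.
    The bits shifted out form Z = S(i-1) + Ai*B + Y0*N; S(i) is Z >> k.
    """
    total_bits = m * k + k + 2  # assumption: enough shifts to empty acc
    acc = 0
    z = 0
    for j in range(total_bits):
        b_j = (b >> j) & 1
        n_j = (n >> j) & 1
        s_j = (s_prev >> j) & 1
        if b_j and n_j:
            word = a_i + y0   # sum of both latches
        elif b_j:
            word = a_i        # first latch only
        elif n_j:
            word = y0         # second latch only
        else:
            word = 0
        acc += word + s_j
        z |= (acc & 1) << j   # bit leaving the accumulator
        acc >>= 1
    return z >> k             # drop the k zero bits: S(i) before reduction
```

Selecting among four precomputed words lets a single accumulator interleave the two products Ai*B and Y0*N, which is the role a second multiplication circuit plays in the coprocessor of FIG. 1.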