1. Field of the Invention
The present invention relates to a method and apparatus for producing a computation parameter used in the implementation of modular operations according to the Montgomery method. This method enables modular computations to be performed in a finite field denoted GF(2.sup.n), namely a Galois field with 2.sup.n elements, without carrying out any division.
2. Discussion of the Related Art
Conventionally, modular operations on GF(2.sup.n) are used in cryptography for applications such as the authentication of messages, the identification of a user and the exchange of keys. Examples of such applications are described in the French patent application published under No. 2 679 054.
There are commercially available integrated circuits dedicated to such applications. These include, for example, the product referenced ST16CF54 manufactured by SGS-THOMSON MICROELECTRONICS S.A., built around a combination of devices including a central processing unit and an arithmetic coprocessor dedicated to the performance of modular computations. The coprocessor enables the processing of modular multiplication operations by using the Montgomery method. The coprocessor is the subject of a European patent application filed under the reference No. 0 601 907 A2, and is illustrated in FIG. 2 of the present application (FIG. 2 of the present application corresponds to FIG. 2 of the aforementioned European patent application).
The Montgomery method enables the computation of a basic operation, called a P.sub.field operation. This basic operation consists of the production, on the basis of three binary data elements, A (multiplicand), B (multiplier) and N (modulus), each encoded on a whole number of bits n, of a binary data element denoted P.sub.field (A, B).sub.N encoded on n bits, such that P.sub.field (A, B).sub.N =A*B*I mod N, with I=2.sup.-n mod N.
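As a point of reference, the definition above can be checked numerically with ordinary integer arithmetic. The following is a minimal sketch only, not the coprocessor's method: it assumes that N is odd (so that 2.sup.n is invertible modulo N), and the function name p_field_reference is hypothetical. It obtains I through an explicit modular inverse, which is precisely the kind of step that the Montgomery method is designed to avoid.

```python
def p_field_reference(a: int, b: int, n_mod: int, n_bits: int) -> int:
    """Return A*B*I mod N with I = 2^(-n) mod N (N assumed odd)."""
    if n_mod % 2 == 0:
        raise ValueError("N must be odd for 2^n to be invertible mod N")
    # pow() with a negative exponent and a modulus returns a modular
    # inverse power (available from Python 3.8 onward).
    i_const = pow(2, -n_bits, n_mod)
    return (a * b * i_const) % n_mod


if __name__ == "__main__":
    # Toy sizes chosen for readability: n = 4 bits, N = 13.
    p = p_field_reference(5, 7, 13, 4)
    # Multiplying the result back by 2^n recovers A*B mod N.
    print(p, (p * 2**4) % 13 == (5 * 7) % 13)
```

Multiplying the returned value by 2.sup.n modulo N gives back A*B mod N, which is a convenient way to sanity-check any faster implementation against this definition.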
Conventionally, n is equal to 256 or 512, and in the very near future it will be equal to 1024. To compute the operation P.sub.field, a working base is used, namely some of the computations needed are done on Bt bits instead of n bits, for example Bt=32. It is noted that there is an integer k such that k*Bt.gtoreq.n>(k-1)*Bt. The use of a working base makes it possible to reduce the surface area of the computation circuits, especially the multiplier circuits, while at the same time maintaining a good processing speed.
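The relation between n, Bt and k can be illustrated with a short sketch. The helper names word_count and split_words are hypothetical and are introduced here only for illustration; the sketch assumes that words are numbered from the least significant end, in the same way that N.sub.0 denotes the Bt least significant bits of N.

```python
def word_count(n_bits: int, bt: int) -> int:
    """Smallest integer k such that k*Bt >= n > (k-1)*Bt."""
    return -(-n_bits // bt)  # ceiling division on integers


def split_words(value: int, n_bits: int, bt: int) -> list[int]:
    """Split value into k words of Bt bits, least significant word first."""
    k = word_count(n_bits, bt)
    mask = (1 << bt) - 1
    return [(value >> (i * bt)) & mask for i in range(k)]


if __name__ == "__main__":
    print(word_count(512, 32))   # n is a multiple of Bt
    print(word_count(500, 32))   # n is not a multiple of Bt
    print([hex(w) for w in split_words(0x123456789ABCDEF0, 64, 32)])
```

With n=512 and Bt=32 the operand is handled as 16 words, so a multiplier circuit only needs to accept one Bt-bit word at a time rather than a full n-bit operand.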
To compute the operation P.sub.field (A, B).sub.N =S, a widely used algorithm uses the following variables:

A as an integer encoded on n bits and split up into k words of Bt bits denoted A.sub.0 to A.sub.k-1,
B as an integer encoded on n bits and represented on one word of (k*Bt) bits,
N as an odd integer encoded on n bits and represented on one word of (k*Bt) bits. The reference N.sub.0 denotes the word of Bt bits corresponding to the Bt least significant bits of N,
S(i) as an integer encoded on (k*Bt)+1 bits representing an updated value of the result S for one iteration i,
X and Z as two integers encoded on ((k+1)*Bt)+1 bits,
Y.sub.0 as an integer encoded on Bt bits,
J.sub.0 as an integer encoded on Bt bits and such that ((J.sub.0 *N.sub.0)+1) mod 2.sup.Bt =0.

The mathematical algorithm, which can be transposed both to an integrated circuit and to a software processing operation, is the following:

zero-setting of S(0),
the implementation of a loop indexed i, with i varying from 1 to k:
B1: computation of X=S(i-1)+(B*A.sub.i-1),
B2: computation of Y.sub.0 =(X*J.sub.0) mod 2.sup.Bt, where the operation mod 2.sup.Bt corresponds only to a truncation,
B3: computation of Z=X+(Y.sub.0 *N),
B4: computation of S(i)=Z.backslash.2.sup.Bt, where the symbol .backslash. represents an integer division,
B5: if S(i) is greater than N then N is subtracted from S(i),
and it is possible to recover the result S=S(k)=P.sub.field (A, B).sub.N.

E1: the loading of an information element N.sub.0 encoded on Bt bits, the least significant bit of N.sub.0 being equal to 1, in a first and third register of Bt bits,
E2: the loading of a 1 into the most significant bit of a second register of Bt bits,
E3: the implementation of a loop, indexed by i, i varying from 1 to Bt-1, each iteration comprising the following steps:

E1: the loading of the number 1 represented on Bt bits into a first register of Bt bits,
E2: the loading of an information element N.sub.0 encoded on Bt bits, the least significant bit of N.sub.0 being equal to 1, in a third register of Bt bits,
E3: the implementation of a loop, indexed by i, i varying from 1 to Bt, each iteration comprising the following steps:
E4: the second register contains the desired parameter J.sub.0.
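Steps B1 to B5 of the mathematical algorithm can be sketched in software as follows. This is an illustrative sketch under stated assumptions, not a description of the coprocessor: the function name p_field is hypothetical, A and B are assumed to be smaller than N, J.sub.0 is obtained here with a modular inverse purely for convenience, and the comparison in step B5 is written as "greater than or equal" so that the result always lies between 0 and N-1. The loop removes a factor 2.sup.Bt per iteration, so the overall factor is 2.sup.-(k*Bt), which coincides with 2.sup.-n when n is a multiple of Bt.

```python
def p_field(a: int, b: int, n_mod: int, n_bits: int, bt: int = 32) -> int:
    """Word-serial Montgomery product following steps B1 to B5.

    Returns A*B*2^(-k*Bt) mod N, assuming N odd and A, B < N.
    """
    if n_mod % 2 == 0:
        raise ValueError("N must be odd")
    k = -(-n_bits // bt)            # k*Bt >= n > (k-1)*Bt
    base = 1 << bt
    mask = base - 1
    n0 = n_mod & mask               # Bt least significant bits of N
    # J0 satisfies (J0*N0 + 1) mod 2^Bt == 0. Assumption: computed here
    # with pow() (Python 3.8+) rather than by a dedicated circuit.
    j0 = (-pow(n0, -1, base)) % base

    s = 0                           # zero-setting of S(0)
    for i in range(1, k + 1):
        a_word = (a >> ((i - 1) * bt)) & mask   # word A_(i-1)
        x = s + b * a_word          # B1
        y0 = (x * j0) & mask        # B2: mod 2^Bt is a truncation
        z = x + y0 * n_mod          # B3: low Bt bits of Z are now zero
        s = z >> bt                 # B4: exact division by 2^Bt
        if s >= n_mod:              # B5 (">=" is an assumption, see above)
            s -= n_mod
    return s


if __name__ == "__main__":
    n_mod = 0xFFFFFFFFFFFFFFC5      # arbitrary odd 64-bit modulus
    a, b = 0x0123456789ABCDEF, 0x0FEDCBA987654321
    expected = (a * b * pow(2, -64, n_mod)) % n_mod
    print(p_field(a, b, n_mod, 64, 32) == expected)
```

The only operations inside the loop are multiplications, additions, a truncation and a shift; the single inverse that remains is the one needed to obtain J.sub.0 before the loop starts.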
In cryptography, during the encoding (or decoding) of a message, this type of operation is used many times. It is therefore highly worthwhile to perform this algorithm at the highest possible speed. Furthermore, the time taken to compute the parameter J.sub.0 contributes to the overall computation time. At present, there are several known theoretical methods for computing this parameter. These include an extended Euclidean algorithm and an extended Stein algorithm, as well as a method using modular exponentiation and another method using a modular division. These methods are not directly realizable on integrated circuits, and the computation of the parameter J.sub.0 is conventionally done by software means using an arithmetic coprocessor.
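By way of illustration, one of the known software approaches mentioned above, the extended Euclidean algorithm, can be sketched as follows. This is a minimal sketch and not the register-based method of the invention; the function name j0_extended_euclid is hypothetical. It relies only on the defining relation ((J.sub.0 *N.sub.0)+1) mod 2.sup.Bt =0, which means that J.sub.0 is the negative of the inverse of N.sub.0 modulo 2.sup.Bt, and it assumes that N.sub.0 is odd.

```python
def j0_extended_euclid(n0: int, bt: int) -> int:
    """Return J0 such that (J0*N0 + 1) mod 2^Bt == 0, for odd N0."""
    if n0 % 2 == 0:
        raise ValueError("N0 must be odd")
    base = 1 << bt
    # Extended Euclid on (N0, 2^Bt). Invariants, modulo 2^Bt:
    #   old_t * N0 == old_r   and   t * N0 == r
    old_r, r = n0 % base, base
    old_t, t = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_t, t = t, old_t - q * t
    # N0 is odd, so gcd(N0, 2^Bt) == old_r == 1 and old_t is N0^-1.
    return (-old_t) % base


if __name__ == "__main__":
    j0 = j0_extended_euclid(0xFFFFFFC5, 32)
    print(hex(j0), (j0 * 0xFFFFFFC5 + 1) % 2**32 == 0)
```

The loop contains an integer division at every iteration, which helps to explain why such a method is more naturally run in software than transposed directly to a small dedicated circuit.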