Modular computations according to the Montgomery method are performed in a finite field, or Galois field, denoted as GF(2.sup.n). Conventionally, modular operations on GF(2.sup.n) are used in cryptography for applications such as the authentication of messages, the identification of a user, and the exchange of cryptographic keys. Such applications are described, for example, in French Patent Application FR-A 2,679,054.
There are commercially available integrated circuits dedicated to such applications. These include, for example, the product referenced as ST16CF54, which is manufactured by SGS-THOMSON MICROELECTRONICS S.A. This product is built around a central processing unit and an arithmetic coprocessor, and is dedicated to performing modular computations. The coprocessor enables processing of modular multiplication operations using the Montgomery method, which is disclosed in U.S. Pat. No. 5,513,133.
The basic operation, called a P.sub.field operation, generates a binary data element denoted as P(A, B).sub.N and encoded on n bits, such that P(A, B).sub.N =A*B*I mod N, with I=2.sup.-n mod N. This data element is computed from three binary data elements A (multiplicand), B (multiplier) and N (modulus), each encoded on an integer number n of bits. For this purpose, it is assumed that the data elements are encoded on m words of k bits, with m*k=n, and the words of A and B are provided to a multiplication circuit having a series input, a parallel input, and a series output.
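The definition of the P.sub.field operation given above can be checked numerically. The following is a minimal reference sketch in Python (3.8 or later), not the coprocessor's method: it evaluates A*B*I mod N directly with big-integer arithmetic. The function name p_field_reference and the small example values are assumptions chosen for illustration, and N is assumed to be odd so that 2.sup.n has an inverse modulo N.

```python
def p_field_reference(a, b, n_mod, n_bits):
    """Return A*B*I mod N, with I = 2^(-n_bits) mod N.

    Hypothetical reference helper: n_mod must be odd so that the
    inverse of 2^n_bits modulo n_mod exists.
    """
    # pow with a negative exponent and a modulus computes a modular inverse.
    i_factor = pow(2, -n_bits, n_mod)
    return (a * b * i_factor) % n_mod


if __name__ == "__main__":
    # Illustrative values only: an 8-bit odd modulus.
    N, n = 241, 8
    p = p_field_reference(100, 200, N, n)
    # Multiplying the result back by 2^n recovers the plain product mod N.
    assert (p << n) % N == (100 * 200) % N
    print(p)
```

The check in the usage lines reflects the role of I: the result differs from the ordinary modular product A*B mod N only by the factor 2.sup.-n.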
For the coprocessor described in the referenced U.S. patent, k=32 and m=8 or 16. FIG. 1 shows the modular arithmetic coprocessor disclosed in the referenced U.S. patent. This coprocessor has the following elements. Three m*k bit shift registers 10, 11 and 12 each have one series input and one series output. These shift registers 10-12 receive respectively the multiplier B, the result S and the modulus N. A multiplexer 13 with three series inputs includes one series output connected to the input of the register 10. A first input is connected to a first input terminal, and a second input is connected to the output of the register 10. A multiplexer 14 with two series inputs has one series output connected to the input of the register 11. A first input is connected to a logic 0.
The coprocessor further includes a multiplexer 15 having three series inputs and one series output connected to the input of the register 12. A first input is connected to a second input terminal, and a second input is connected to the output of the register 12. Three k-bit shift registers 16, 17 and 18 have one series input and one parallel output. These registers 16-18 receive respectively k bits of the multiplicand A, a computation parameter referenced J.sub.0, and an intermediate result referenced Y.sub.0. The input of the register 17 is connected to a third input terminal. Two multiplication circuits 19 and 20 each have a series input, a k-bit parallel input and a series output. Two k-bit registers 21 and 22 have a parallel input and a parallel output. The input of the register 21 is connected to the output of the register 16. The output of the register 21 is connected to the input of the multiplication circuit 19. The output of the register 22 is connected to the input of the multiplication circuit 20.
Furthermore, the coprocessor includes a multiplexer 23 with two parallel inputs and one parallel output. A first input of the multiplexer 23 is connected to the output of the register 17. A second input of the multiplexer 23 is connected to the output of the register 18. The output of the multiplexer 23 is connected to the input of the register 22. Two multiplexers 24, 25 each have two series inputs and one series output. The output of the multiplexer 24 is connected to the input of the register 16. A first input of the multiplexer 24 is connected to a fourth input terminal. The output of the multiplexer 25 is connected to the series input of the multiplication circuit 19. A first input of the multiplexer 25 is connected to a logic 0.
A multiplexer 26 has three series inputs and one output. The output is connected to the series input of the multiplication circuit 20, and a first input is connected to a logic 0. Three subtraction circuits 27, 28 and 29 each include two series inputs and one series output. The first input of the circuit 27 is connected to the output of the register 10. The output of the circuit 27 is connected to each of the second inputs of the multiplexers 24 and 25 and also to an output terminal. The first input of the circuit 28 is connected to the output of the register 11. Two addition circuits 30 and 31 each have two series inputs and one series output. The first input of the circuit 30 is connected to the output of the circuit 28. The second input of the circuit 30 is connected to the output of the circuit 19. The output of the circuit 30 is connected to a second input of the multiplexer 26. The output of the circuit 31 is connected to a first input of the circuit 29, to a second input of the multiplexer 14, and to each of the third inputs of the multiplexers 13 and 15.
Three delay cells 32, 33 and 34, which are actually k-bit shift registers, have a series input and a series output. The output of the cell 32 is connected firstly to a third input of the multiplexer 26 and secondly to the input of the cell 33. The output of the cell 33 is connected to a second input of the circuit 29. The input of the cell 34 is connected to the output of the circuit 30. The output of the cell 34 is connected to a first input of the circuit 31. A comparison circuit 35 has two series inputs and two outputs. A first input is connected to the output of the circuit 31. A second input is connected to the output of the circuit 29.
Two multiplexers 36 and 37 each have two series inputs, one selection input, and one output. Each of the first series inputs is connected to a logic 0. Each of the selection inputs is connected to one of the outputs of the circuit 35. The output of the multiplexer 36 is connected to a second input of the circuit 27. The output of the multiplexer 37 is connected to a second input of the circuit 28. A multiplexer 38 has two inputs and one output. A first input is connected to a logic 1. A second input is connected to the output of the register 12. The output is connected firstly to the input of the cell 32, and secondly to the second inputs of the multiplexers 36 and 37. A demultiplexer 39 has one input and two outputs. The input is connected to the output of the circuit 20. A first output is connected to the input of the register 18. A second output is connected to a second input of the circuit 31.
For further details on forming certain elements, reference may be made to the previously referenced U.S. patent. To carry out an elementary operation known as a P.sub.field operation of the type P.sub.field (A, B).sub.N =A*B*I mod N, with A and B encoded on m words of k bits and I being an error term equal to 2.sup.-m*k, the following loop is iterated m times, with i as an index varying from 1 to m:
X=S(i-1)+A.sub.i-1 *B,
Y.sub.0 =(X*J.sub.0) mod 2.sup.k,
Z=X+(N*Y.sub.0),
S(i)=Z\2.sup.k, where \ denotes integer division,
if S(i) is greater than N, then N is subtracted from S(i) at the next iteration,
where S(0)=0, A.sub.i is the k-bit word of A with significance i, and J.sub.0 is a k-bit word defined by the equation ((N*J.sub.0)+1) mod 2.sup.k =0.
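The loop described above can be illustrated in software. The following Python sketch (3.8 or later) is a plain big-integer model under stated assumptions, not a description of the coprocessor's serial data path: the names compute_j0 and p_field are hypothetical, N is assumed odd, and A and B are assumed smaller than N. For simplicity the sketch subtracts N immediately at the end of each iteration rather than at the next one, and uses "greater than or equal" in the comparison so that the returned value is fully reduced.

```python
def compute_j0(n_mod, k):
    """Return the k-bit word J0 with (N*J0 + 1) mod 2^k == 0 (N odd)."""
    word = 1 << k
    # J0 is the negated inverse of N modulo 2^k.
    return (-pow(n_mod, -1, word)) % word


def p_field(a, b, n_mod, m, k):
    """Word-serial model of P_field(A, B)_N = A*B*2^(-m*k) mod N."""
    mask = (1 << k) - 1
    j0 = compute_j0(n_mod, k)
    s = 0                                    # S(0) = 0
    for i in range(1, m + 1):
        a_word = (a >> (k * (i - 1))) & mask # A_(i-1), k-bit word of A
        x = s + a_word * b                   # X = S(i-1) + A_(i-1)*B
        y0 = (x * j0) & mask                 # Y0 = (X*J0) mod 2^k
        z = x + n_mod * y0                   # Z = X + N*Y0, multiple of 2^k
        s = z >> k                           # S(i) = Z \ 2^k
        if s >= n_mod:                       # assumption: subtract at once
            s -= n_mod
    return s


if __name__ == "__main__":
    # Illustrative check with k=32, m=8 (256-bit operands).
    N = (1 << 256) - 189
    A, B = N - 2, N - 3
    assert p_field(A, B, N, 8, 32) == (A * B * pow(2, -256, N)) % N
```

The choice of J.sub.0 makes the k low-order bits of Z equal to zero, which is why the integer division by 2.sup.k loses no information.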
The coprocessor of FIG. 1 enables the performance of a full iteration by a simultaneous shift of m*k bits of the registers 10-12 respectively containing B, S(i-1) and N. This is followed by a 2*k bit shift of the register 11 to store S(i). The word A.sub.i-1 is loaded into the register 21 and the word J.sub.0 is loaded into the register 17. To perform the full computation of P.sub.field (A, B).sub.N, it is enough to repeat the iteration m times, changing the word of A contained in the register 21 at each iteration.
The operation X=S(i-1)+A.sub.i-1 *B is performed by the multiplication circuit 19 and the addition circuit 30. The operation Y.sub.0 =(X*J.sub.0) mod 2.sup.k is performed during the first k shifts in the multiplication circuit 20. Care is taken to store J.sub.0 in the register 22. The result Y.sub.0 is stored in the register 18. The operation Z=X+(N*Y.sub.0), with N and X having been delayed by k bits in the delay cells 32 and 34 and Y.sub.0 having been placed in the register 22, is performed by the multiplication circuit 20 and the addition circuit 31. The operation S(i)=Z\2.sup.k, where \ denotes integer division, is performed by a k bit shift. The comparison of S(i) with N is performed by the subtraction of N from S(i) in the subtraction circuit 29. N is delayed by k bits in the cell 33. Any overflow is detected and stored in the comparison circuit 35 to find out the result of the comparison. The subtraction of N from S(i) is done during the following iteration in the subtraction circuit 28.
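The deferred subtraction described above, in which the comparison result is stored and N is subtracted only during the following iteration, can be modeled as follows. This Python sketch (3.8 or later) is an arithmetic illustration only; it does not model the bit-serial timing or the delay cells. The function name p_field_deferred and the flag subtract_pending are hypothetical, N is assumed odd, and A and B are assumed smaller than N. The handling of a subtraction still pending after the last iteration is an assumption of this sketch, as is the use of "greater than or equal" in the comparison.

```python
def p_field_deferred(a, b, n_mod, m, k):
    """Model of the loop with the subtraction of N deferred by one iteration."""
    mask = (1 << k) - 1
    # J0 such that (N*J0 + 1) mod 2^k == 0.
    j0 = (-pow(n_mod, -1, 1 << k)) & mask
    s = 0
    subtract_pending = False           # stored result of the comparison
    for i in range(1, m + 1):
        if subtract_pending:           # role played by subtraction circuit 28
            s -= n_mod
        a_word = (a >> (k * (i - 1))) & mask
        x = s + a_word * b             # X = S(i-1) + A_(i-1)*B
        y0 = (x * j0) & mask           # Y0 = (X*J0) mod 2^k
        z = x + n_mod * y0             # Z = X + N*Y0
        s = z >> k                     # S(i) = Z \ 2^k
        subtract_pending = s >= n_mod  # role of circuits 29 and 35
    if subtract_pending:               # assumption: final correction at the end
        s -= n_mod
    return s


if __name__ == "__main__":
    N = (1 << 256) - 189
    A, B = N - 5, N - 7
    assert p_field_deferred(A, B, N, 8, 32) == (A * B * pow(2, -256, N)) % N
```

Deferring the subtraction does not change the result, because S(i-1) and S(i-1)-N are congruent modulo N and the corrected value is used before it enters the next computation of X.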
The coprocessor described in the referenced U.S. patent has the drawback of carrying out computations in a fixed manner on operands of either 256 bits or 512 bits. A first improvement disclosed in U.S. Pat. No. 5,745,398 seeks to provide greater flexibility in use by enabling the performance of P.sub.field (A, B).sub.N operations with A having a variable size.
A second improvement, disclosed in European Patent Application EP-A 784,262, seeks to reduce the exchanges of data between the coprocessor and circuitry external to the coprocessor during the performance of modular operations. This is achieved by adding a further register of m*k bits to store A in its entirety when A has a size smaller than m*k bits.
Furthermore, there are known m*k bit shift registers organized in banks of m k-bit sub-registers for reducing the use of the registers. The use is reduced because, at most, 2*k storage cells are used simultaneously. For a more detailed disclosure on these registers, those skilled in the art are directed to the French Patent Application filed on Jul. 4, 1997, under number 9,708,516. This French Patent Application corresponds to the U.S. Patent Application filed Jun. 26, 1997, having Ser. No. 09/105,560. One drawback of the registers organized in banks of sub-registers is that they occupy a greater silicon surface area of an integrated circuit than standard registers.