Embodiments of the invention relate to the Montgomery algorithm for executing a modular multiplication, also called Montgomery multiplication, and in particular to an electronic circuit optimized to compute the product.
The Montgomery algorithm is described in “Modular Multiplication Without Trial Division”, Peter Montgomery, Mathematics of Computation, vol. 44, pp. 519-521, 1985. According to the Montgomery algorithm, when it is applied to binary numbers, the following value needs to be calculated:
$$R = X \cdot Y \cdot 2^{-L} \bmod M \qquad (1)$$
where R, X, Y and M are L-bit numbers. M is the modulus; it is odd and greater than X and Y.
This calculation is often used to execute modular exponentiations in encryption, where the numbers are “big”, i.e., the number of bits L is several hundred, for example 512. The algorithm then simplifies the calculations by limiting the number of divisions and by limiting the size of the intermediate multiplication results to the size of the multiplicands.
The calculation may be implemented by bit-to-bit iterations according to the following loop:
R := 0                                  (2)
for i between 0 and L−1 execute:
  m[i] := (R + XY[i]) mod 2
  R := (R + XY[i] + m[i]M) / 2
end for
The index [i] refers to the bit of weight i. In the first step of the iteration, a bit m[i] is determined whose parity is equal to that of the sum R+XY[i]. Thus, in the second step, as M is odd, the sum R+XY[i]+m[i]M is even; its division by 2 (right shift) decreases the size of the result by 1 bit without losing significant bits. Normally, the sum of three L-bit numbers has at most L+2 bits. As the modulus M is never chosen at the maximum value ($2^L - 1$), and X and Y are each lower than M, the sum does not exceed L+1 bits, whereby each iteration supplies an L-bit number R.
The value R sought in relationship (1) is supplied by the last iteration.
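As an illustration, loop (2) may be sketched in software as follows. This is a minimal sketch only: the function name, its signature, and the use of Python integers are assumptions made for the example and are not part of the described circuit. The demonstration checks the result modulo M, since the sketch makes no assumption about any final reduction step.

```python
def montgomery_multiply_bitwise(x, y, m, l):
    """Bit-serial sketch of loop (2): returns R with R * 2**l = x * y (mod m).

    Hypothetical helper for illustration; x and y are assumed lower than
    the odd modulus m, and l is the bit length of the operands.
    """
    if m % 2 == 0:
        raise ValueError("the modulus M must be odd")
    r = 0
    for i in range(l):
        y_i = (y >> i) & 1           # Y[i], the bit of weight i of Y
        partial = r + x * y_i        # R + X*Y[i]
        m_i = partial & 1            # m[i] := (R + X*Y[i]) mod 2
        r = (partial + m_i * m) >> 1 # the sum is even, so the shift is exact
    return r


# Small demonstration with L = 4 and M = 13.
x, y, m, l = 7, 5, 13, 4
r = montgomery_multiply_bitwise(x, y, m, l)
# Relationship (1) is equivalent to R * 2^L = X*Y (mod M).
print(r, (r << l) % m == (x * y) % m)
```

The check is written as R·2^L ≡ XY (mod M) so that no modular inverse has to be computed in the example.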
Various other iterative methods are described in “Analyzing and Comparing Montgomery Multiplication Algorithms”, Çetin Kaya Koç et al., IEEE Micro, 16(3): 26-33, June 1996.
These methods are a starting point to implement the calculations on a microprocessor, possibly assisted by an accelerator dedicated to Montgomery multiplication.
A difficulty encountered in implementing these calculations on a microprocessor is that the numbers are too large (for example 512 bits) relative to the computing capacity of the microprocessor (for example 64 bits). As a result, each elementary operation is executed by accumulating partial results obtained by decomposing the large numbers into slices whose size matches the microprocessor capacity. Such a slice is hereinafter referred to as a “digit”, by analogy with the multiplication of decimal numbers.
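The decomposition into digits may be illustrated by the following sketch, which multiplies two large numbers by accumulating digit-by-digit partial products, as in a schoolbook decimal multiplication. The 64-bit digit size follows the example above; the names DIGIT_BITS, to_digits, from_digits and multiply_by_digits are hypothetical and chosen only for this illustration. It shows a plain (non-modular) product and does not represent any of the cited methods.

```python
DIGIT_BITS = 64                      # assumed digit size, as in the example above
DIGIT_MASK = (1 << DIGIT_BITS) - 1


def to_digits(value, num_digits):
    """Split value into num_digits digits, least significant digit first."""
    return [(value >> (DIGIT_BITS * k)) & DIGIT_MASK for k in range(num_digits)]


def from_digits(digits):
    """Recombine a list of digits (least significant first) into one integer."""
    return sum(d << (DIGIT_BITS * k) for k, d in enumerate(digits))


def multiply_by_digits(x, y, num_digits):
    """Product of x and y accumulated from digit-sized partial products."""
    xd = to_digits(x, num_digits)
    yd = to_digits(y, num_digits)
    acc = [0] * (2 * num_digits)     # the result needs twice as many digits
    for i, yi in enumerate(yd):
        carry = 0
        for j, xj in enumerate(xd):
            # One digit-by-digit product plus the accumulated digit and carry;
            # the total always fits in two digits.
            t = acc[i + j] + xj * yi + carry
            acc[i + j] = t & DIGIT_MASK
            carry = t >> DIGIT_BITS
        acc[i + num_digits] = carry  # this position has not been written yet
    return from_digits(acc)


# A 512-bit operand occupies eight 64-bit digits.
a = (1 << 511) + 12345
b = (1 << 500) + 987654321
print(multiply_by_digits(a, b, 8) == a * b)
```

Each inner step handles only digit-sized operands, which is what allows a 64-bit datapath to process 512-bit numbers at the cost of many elementary operations.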
U.S. Pat. Nos. 6,185,596 and 7,543,011 describe Montgomery multiplication circuits that tend to optimize computing time and current consumption. Indeed, since Montgomery multiplications are used in encryption, in particular in contactless chip cards, there is also a need to minimize the consumption and, correlatively, the size of the circuit.
It is therefore desirable to minimize the product (response time)×(size) of a Montgomery multiplication circuit.