A cryptographic system is a computer system that uses cryptography, typically to secure or authenticate data communication between a pair of computing devices connected to one another through a data communication link in the system. Each computing device has a cryptographic unit with the processing capacity to implement one or more cryptographic protocols used to secure or authenticate the data communication. The cryptographic protocols typically perform arithmetic operations on the bit strings representing parameters in the protocols to produce a bit string representing the output from the protocol.
Computing devices in a cryptographic system are often required to perform arithmetic operations in which modular arithmetic is necessary. For example, a computing device may be required to multiply two integers modulo some n. The classical approach to performing this operation is to first perform the multiplication of the integers and then divide the product by the modulus n. The remainder from the division represents the modular reduction. However, performing the modular reduction of an integer by dividing the integer by the modulus n to obtain the remainder can be relatively computationally expensive. Therefore, other modular reduction techniques have been developed that attempt to increase the computational efficiency of modular reduction.
One such technique is the method of Montgomery modular reduction, referred to as Montgomery reduction for short. Montgomery reduction is known in the art and is discussed in detail, for example, in section 14.3.2 of the Handbook of Applied Cryptography, Menezes et al., CRC Press, 1997. Montgomery reduction benefits from the fact that multiplication and shifting are generally faster than division on most computing machines. It also relies on certain precomputations, whose cost can be spread over many subsequent calculations. Also, as opposed to classical methods of reduction-from-above, such as Euclidean division, Montgomery reduction reduces from below: the method proceeds by clearing the least-significant portions of the unreduced quantity, leaving the remainder in the upper portion. It therefore benefits from excluding carries that may otherwise interfere with the already cleared portion.
In Montgomery reduction, calculations with respect to a modulus $n$ are carried out with the aid of an auxiliary number $R$ called the Montgomery radix or base. $R$ is chosen such that $R > n$ and such that the greatest common divisor of $R$ and $n$ is one, i.e. $\gcd(R, n) = 1$. When the modulus $n$ is an odd (often prime) number, a good choice of $R$ is typically the first convenient power of two larger than $n$; i.e., $R = 2^r$ for a suitably chosen integer $r$. The Montgomery reduction of a number $T$ is the quantity $TR^{-1} \bmod n$. This computation requires the values $T$, $R$, $n$, and $\mu = (-n)^{-1} \bmod 2^w$, where $w$ is an integer, typically representing the bit size of a word (or block) of the value being operated on. The value $\mu$ is used to effect the Montgomery reduction. A summary of Montgomery reduction follows.
A computational engine performing Montgomery reduction receives as an input the modulus $n$, precomputed values $R = 2^r$ and $\mu$, and the integer $T$ on which Montgomery reduction is to be performed. For Montgomery reduction to operate correctly, it must hold that $\gcd(n, R) = 1$ and $T < nR$. The computational engine performs the following computations to obtain the value $TR^{-1} \bmod n$:
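1. $A \leftarrow T$ (Notation: $A = (a_{2d-1} \ldots a_1 a_0)_b$ where $b = 2^w$ and $d$ is the number of words of the modulus $n$; note that $d = r/w$).
2. For $i = 0$ to $d-1$ do the following:
    2.1 $u_i \leftarrow a_i \mu \bmod b$
    2.2 $A \leftarrow A + u_i n b^i$
3. $A \leftarrow A / b^d$
4. If $A \geq n$ then $A \leftarrow A - n$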
5. Return (A).
The value $A$ returned equals $TR^{-1} \bmod n$.
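The reduction steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production routine: the function name `montgomery_reduce` and its parameters `w` (word size in bits) and `d` (number of words) are chosen for the example, Python's arbitrary-precision integers stand in for the word array, and the value μ is computed on the fly rather than supplied as a precomputed input. The modular inverse via `pow` requires Python 3.8 or later.

```python
def montgomery_reduce(T, n, w, d):
    """Return T * R^(-1) mod n for R = 2**(w*d), following the steps above.

    Assumes n is odd (so gcd(n, R) = 1) and 0 <= T < n * R.
    """
    b = 1 << w                      # word base b = 2^w
    R = 1 << (w * d)                # Montgomery radix R = b^d
    if n % 2 == 0 or not (0 <= T < n * R):
        raise ValueError("need odd n and 0 <= T < n*R")

    mu = (-pow(n, -1, b)) % b       # mu = (-n)^(-1) mod 2^w

    A = T                                           # step 1
    for i in range(d):                              # step 2
        a_i = (A >> (w * i)) & (b - 1)              # i-th word of A
        u_i = (a_i * mu) % b                        # step 2.1
        A += (u_i * n) << (w * i)                   # step 2.2: clears word i
    A >>= w * d                                     # step 3: exact division by b^d
    if A >= n:                                      # step 4: final reduction
        A -= n
    return A


# Small check against the direct definition (values chosen arbitrarily).
n, w, d = 239, 4, 2
R = 1 << (w * d)
assert montgomery_reduce(12345, n, w, d) == 12345 * pow(R, -1, n) % n
```

Each pass of the loop makes one more low-order word of A zero, which is why the division in step 3 is an exact right shift.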
It is noted that the final reduction (step 4) in Montgomery reduction is sometimes omitted, for example, to counter side channel attacks if the modulus $n$ is secret. In such a scenario, the value returned is not fully reduced modulo $n$, but it is congruent to $TR^{-1}$ modulo $n$.
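The size of the unreduced output can be bounded directly from the reduction steps. The short derivation below is an added illustration, assuming the input condition $T < nR$ stated earlier; the symbol $A_{\text{loop}}$ is introduced here only to name the value of $A$ at the end of step 2.

```latex
% Each u_i satisfies 0 <= u_i < b, so the accumulated multiplier is below b^d = R.
\sum_{i=0}^{d-1} u_i b^i \le b^d - 1 < R
% Step 2 adds n times that multiplier to T, and T < nR by assumption.
A_{\text{loop}} = T + n \sum_{i=0}^{d-1} u_i b^i < nR + nR = 2nR
% Dividing by R = b^d in step 3 therefore gives a value below 2n.
\frac{A_{\text{loop}}}{R} < 2n
```

Under that assumption the value after step 3 lies in the range from 0 up to, but not including, $2n$. A single conditional subtraction is therefore enough to reduce it fully, and omitting that subtraction leaves at most one excess multiple of $n$.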
The technique of Montgomery multiplication is also known in the art and is described, for example, in section 14.3.2 of the Handbook of Applied Cryptography, Menezes et al., CRC Press, 1997. The Montgomery multiplication of two numbers $a$ and $b$ is the Montgomery reduction of their product, computed as $a \otimes b = abR^{-1} \bmod n$, where $\otimes$ denotes the Montgomery product. Techniques such as Montgomery exponentiation, described in section 14.6.1 of the Handbook of Applied Cryptography, Menezes et al., CRC Press, 1997, utilize Montgomery multiplication to increase computational efficiency. A summary of Montgomery multiplication follows.
A computational engine performing Montgomery multiplication receives as an input the modulus $n$, precomputed values $R = 2^r$ and $\mu$, and the integers $x$ and $y$ on which Montgomery multiplication is to be performed. For Montgomery multiplication to operate correctly, it must be the case that $\gcd(n, R) = 1$. It is usual that $0 \leq x, y < n$. The computational engine performs the following computations to obtain the value $xyR^{-1} \bmod n$:
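1. $A \leftarrow 0$ (Notation: $A = (a_d a_{d-1} \ldots a_1 a_0)_b$ where $b = 2^w$ and $d$ is the number of words of the modulus $n$; note that $d = r/w$).
2. For $i = 0$ to $d-1$ do the following:
    2.1 $u_i \leftarrow (a_0 + x_i y_0)\mu \bmod b$
    2.2 $A \leftarrow (A + x_i y + u_i n)/b$
3. If $A \geq n$ then $A \leftarrow A - n$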
4. Return (A).
The value $A$ returned is $xyR^{-1} \bmod n$.
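The multiplication steps listed above interleave the word-by-word product with the reduction. A minimal Python sketch is given below, with the same caveats as any illustration: the name `montgomery_multiply` and the parameters `w` and `d` are assumptions for the example, big integers stand in for word arrays, μ is computed inline (Python 3.8 or later), and the inputs are assumed to satisfy 0 ≤ x, y < n with n odd and n < 2^(w·d).

```python
def montgomery_multiply(x, y, n, w, d):
    """Return x * y * R^(-1) mod n for R = 2**(w*d), following the steps above.

    Assumes n is odd, n < R, and 0 <= x, y < n.
    """
    b = 1 << w
    mask = b - 1
    if n % 2 == 0 or n >= (1 << (w * d)) or not (0 <= x < n and 0 <= y < n):
        raise ValueError("need odd n < R and 0 <= x, y < n")

    mu = (-pow(n, -1, b)) % b       # mu = (-n)^(-1) mod 2^w
    y0 = y & mask                   # least-significant word of y

    A = 0                                           # step 1
    for i in range(d):                              # step 2
        x_i = (x >> (w * i)) & mask                 # i-th word of x
        a0 = A & mask                               # least-significant word of A
        u_i = ((a0 + x_i * y0) * mu) & mask         # step 2.1
        A = (A + x_i * y + u_i * n) >> w            # step 2.2: exact division by b
    if A >= n:                                      # step 3: final reduction
        A -= n
    return A


# Small check against the direct definition (values chosen arbitrarily).
n, w, d = 239, 4, 2
R = 1 << (w * d)
assert montgomery_multiply(3, 5, n, w, d) == 3 * 5 * pow(R, -1, n) % n
```

Because the running value of A stays below 2n after every iteration, the intermediate results never grow beyond one word more than the modulus, which is the practical appeal of the interleaved form.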
As with Montgomery reduction, the final reduction (step 3) in Montgomery multiplication may be omitted if side channel attacks are a concern. In this case, the output of the Montgomery multiplication is a value congruent to $xyR^{-1}$ modulo $n$.
Typically, calculations using Montgomery reduction are carried out on numbers in their Montgomery form. The Montgomery form of a number $a$ is computed as $\hat{a} = aR \bmod n$. Modular addition or subtraction (modulo $n$) of values in Montgomery form produces results in Montgomery form. Additionally, Montgomery multiplication of values in Montgomery form also produces values in Montgomery form, i.e., $\hat{a} \otimes \hat{b} = aR \cdot bR \cdot R^{-1} \bmod n = abR \bmod n$. Conveniently, conversion to Montgomery form may be carried out via the Montgomery multiplication $\hat{a} = a \otimes R^2 = aR \bmod n$, and conversion from Montgomery form back to regular (non-Montgomery) or canonical form may be carried out by either the Montgomery reduction $\hat{a}R^{-1} \bmod n = a \bmod n$, or the Montgomery multiplication $\hat{a} \otimes 1 = aR \cdot R^{-1} = a \bmod n$.
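The conversions and closure properties described above can be checked with a short Python sketch. For clarity, the helper `mont_mul` below computes the Montgomery product directly from its definition using a modular inverse, which is not how an engine would evaluate it; the helper names `mont_mul`, `to_mont`, and `from_mont` and the sample values are assumptions made for this illustration (Python 3.8 or later).

```python
def mont_mul(x, y, n, R):
    """Reference Montgomery product x * y * R^(-1) mod n (direct definition)."""
    return x * y * pow(R, -1, n) % n


def to_mont(a, n, R):
    """Convert a to Montgomery form via a Montgomery multiplication by R^2 mod n."""
    return mont_mul(a, R * R % n, n, R)


def from_mont(a_hat, n, R):
    """Convert back to canonical form via a Montgomery multiplication by 1."""
    return mont_mul(a_hat, 1, n, R)


n, R = 239, 256                 # odd modulus and R = 2^8 > n
a, b = 17, 200
a_hat, b_hat = to_mont(a, n, R), to_mont(b, n, R)

assert a_hat == a * R % n                                   # Montgomery form is aR mod n
assert from_mont(a_hat, n, R) == a                          # round trip
assert mont_mul(a_hat, b_hat, n, R) == a * b * R % n        # product stays in Montgomery form
assert from_mont((a_hat + b_hat) % n, n, R) == (a + b) % n  # sums stay in Montgomery form
```

The point of working in Montgomery form is that a long chain of multiplications and additions needs only one conversion in and one conversion out.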
A computing device in a cryptographic system will often have a computational engine for calculating the Montgomery product of two numbers. This computational engine is typically referred to as a Montgomery machine or Montgomery engine. The machine may be implemented in a hardware or software module, and is configured to operate on a set of parameters to produce a result. For example, the machine may perform the Montgomery multiplication on two inputs $a$ and $b$ and output the result $a \otimes b$. Such a Montgomery machine can therefore also be used to convert to and from Montgomery form and to perform Montgomery reduction. For converting to Montgomery form, the machine accepts $a$ and $R^2$ as inputs and computes the output $\hat{a} = a \otimes R^2 = aR \bmod n$. Conversely, for converting back to canonical form, the machine accepts $\hat{a}$ and 1 as inputs and computes the output $\hat{a} \otimes 1 = a$. To calculate the Montgomery reduction of a value $a$, the machine accepts $a$ and 1 as inputs and computes $a \otimes 1 = aR^{-1} \bmod n$ as the output.
The Montgomery machine is typically provided with the value of the modulus $n$ and perhaps the value of the Montgomery radix $R$ (or an equivalent value such as $r$). The machine then computes the value $\mu$, which is utilized as a precomputed value in subsequent operations. Alternatively, the Montgomery radix $R$ may instead be computed by the machine and/or the value $\mu$ may instead be provided to the Montgomery machine. The value $R^2 \bmod n$ is then computed from the Montgomery radix $R = 2^r$ and stored for use by the Montgomery machine to convert numbers into their Montgomery form. Note that $R$ is a fixed point of Montgomery multiplication (i.e. $R \otimes R = R \cdot R \cdot R^{-1} = R \bmod n$) and therefore it is not possible to obtain $R^2 \bmod n$ by simply performing the Montgomery multiplication of $R$ with itself. The computation of $R^2 \bmod n$ can instead be performed by utilizing a series of addition and multiplication/squaring operations. For example, one way to perform the computation is as follows: (1) start with the value $2^{r-1}$, i.e. $R/2$; (2) add this value to itself: $(2^{r-1} + 2^{r-1}) \bmod n = 2^r \bmod n$; (3) add the resulting value to itself: $(2^r \bmod n + 2^r \bmod n) \bmod n = 2^{r+1} \bmod n$; (4) square the resulting value using Montgomery multiplication: $(2^{r+1} \bmod n) \otimes (2^{r+1} \bmod n) = 2^{r+2} \bmod n$; and (5) continue squaring the resulting value via Montgomery multiplication until the value $(2^{r+r/2} \bmod n) \otimes (2^{r+r/2} \bmod n) = 2^{r+r} \bmod n = R^2 \bmod n$ is obtained.
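The doubling-and-squaring procedure just described can be sketched in Python. The sketch assumes that r is a power of two, so that repeated squaring lands exactly on the exponent r + r; that assumption, the function name `r_squared_mod_n`, and the reference helper `mont_mul` (which evaluates the Montgomery product from its definition rather than as an engine would) are all choices made for this illustration. Python 3.8 or later is needed for the modular inverse.

```python
def mont_mul(x, y, n, r):
    """Reference Montgomery product x * y * R^(-1) mod n with R = 2^r."""
    return x * y * pow(1 << r, -1, n) % n


def r_squared_mod_n(n, r):
    """Compute R^2 mod n (R = 2^r) by two doublings and repeated Montgomery squaring.

    Assumes n is odd and r is a power of two.
    """
    if n % 2 == 0 or r < 1 or r & (r - 1):
        raise ValueError("need odd n and r a power of two")

    v = 1 << (r - 1)            # (1) start with 2^(r-1)
    v = (v + v) % n             # (2) 2^r mod n
    v = (v + v) % n             # (3) 2^(r+1) mod n

    e = 1                       # invariant: v == 2^(r+e) mod n
    while e < r:
        v = mont_mul(v, v, n, r)    # (4), (5): squaring doubles e
        e *= 2
    return v


n, r = 239, 8
R = 1 << r
assert mont_mul(R % n, R % n, n, r) == R % n      # R is a fixed point
assert r_squared_mod_n(n, r) == R * R % n
```

Each Montgomery squaring maps $2^{r+e}$ to $2^{2(r+e)-r} = 2^{r+2e}$, so the offset e doubles until it reaches r.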
The computation of $R^2 \bmod n$ utilizing a series of addition and multiplication/squaring operations, such as those described above, is known in the art. Many variations are also known, including variations that modify the order in which the adding and multiplying/squaring is performed. For example, one variation of performing the computation $R^2 \bmod n$ is as follows: (1) calculate the two's complement of $n$: $R - n$; (2) add this value to itself to yield $((R-n)+(R-n)) \bmod n = (2R) \bmod n$; and (3) multiply together $r$ copies of $(2R) \bmod n$ using Montgomery multiplication to yield $R^2 \bmod n$: $(2R \bmod n) \otimes (2R \bmod n) \otimes \cdots \otimes (2R \bmod n) = R^2 \bmod n$.
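This variation can also be sketched in Python. The sketch reads step (3) as a Montgomery product of r factors of (2R) mod n, that is, r − 1 Montgomery multiplications, since that is the count that yields $2^r R = R^2$; this reading, the function name `r_squared_mod_n_v2`, and the reference helper `mont_mul` are assumptions made for the illustration (Python 3.8 or later).

```python
def mont_mul(x, y, n, r):
    """Reference Montgomery product x * y * R^(-1) mod n with R = 2^r."""
    return x * y * pow(1 << r, -1, n) % n


def r_squared_mod_n_v2(n, r):
    """Compute R^2 mod n (R = 2^r) from the two's complement of n.

    Assumes n is odd and n < R.
    """
    R = 1 << r
    if n % 2 == 0 or not (0 < n < R):
        raise ValueError("need odd n with 0 < n < R")

    v = R - n                   # (1) two's complement of n, congruent to R mod n
    v = (v + v) % n             # (2) (2R) mod n

    acc = v                     # Montgomery product of one factor
    for _ in range(r - 1):      # (3) bring the total to r factors
        acc = mont_mul(acc, v, n, r)    # each step multiplies by 2
    return acc


n, r = 239, 8
assert r_squared_mod_n_v2(n, r) == (1 << (2 * r)) % n
```

After k multiplications the accumulator holds $2^{k+1} R \bmod n$, so no requirement that r be a power of two arises here, at the cost of a number of multiplications linear in r.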
Many variations for computing $R^2 \bmod n$ are known in the art, two of which are shown above. In all of these variations, a series of addition and multiplication and/or squaring operations are performed. Also, in all of these variations, if the final reduction step is omitted in the Montgomery multiplication operations, the value computed may not be fully reduced (i.e. it may not be $R^2 \bmod n$ per se), but it will be congruent to $R^2$ modulo $n$.
Typically, Montgomery machines are limited only to performing operations with moduli of a fixed bit-length, or multiples of this length. Such machines are referred to as block Montgomery machines. The block-length of a block Montgomery machine is often 32, 64, 128 or 256 bits, with allowable bit-lengths for the moduli consisting of multiples of this block length. Such a structure is disadvantageous when implementing schemes that use moduli having bit lengths not equal to the fixed bit-length of the Montgomery machine (or a multiple thereof).