1. Field of the Invention
The present invention relates to a system and method for performing modular multiplication, and in particular the modular multiplication a*b*2xe2x88x92N modulo n, where a, b and n are N-bit integers with n odd. It will be appreciated that the integers a, b and n may not necessarily require all N-bits to specify the integer, but it is clear that any such numbers can always be specified by N-bits by adding an appropriate number of zeros as the most significant bits of the N-bit number.
2. Description of the Prior Art
There are a number of situations where it is desirable to perform the modular multiplication a*b*2xe2x88x92N modulo n, this particular type of modular multiplication often being referred to as Montgomery multiplication.
One particular implementation which makes use of Montgomery multiplication is RSA encryption/decryption as used in smart cards or other secure devices. RSA requires the rapid modular multiplication of large integers, for example 512 or 1024 bits in length, in order to carry out modular exponentiations. The Montgomery multiplication technique provides an efficient way of producing the modular result by applying corrections to partial products as they are generated and accumulated.
In typical implementations of Montgomery multiplication, the multiplication has to be handled in word lengths of w bits, where w is typically 16 or 32. The Montgomery algorithm requires a (N+w+1) bit storage register xe2x80x9ccxe2x80x9d for holding and accumulating partial products as they are generated, and a temporary w-bit number m, which is a Montgomery correction factor.
In the following discussion of the Montgomery algorithm, the notation b[j] refers to the j-th w-bit word of b, i.e. b[0] is the least significant w-bit word of b, and b[sxe2x88x921] is the most significant w-bit word of b, where s=N/w, i.e. the number of words in an N-bit number.
The Montgomery correction factor m makes use of a constant n0p that is derived from the least significant w-bit word of the modulus n[0]. This constant n0p obeys the following relation:
n[0]*n0p=xe2x88x921 mod 2w
Given the above, the standard Montgomery algorithm can be represented by the following pseudo code:
c:=0
for j=0 to sxe2x88x921 do
c:=c+a*b[j] // calculate and accumulate the partial product
m:=(c[0]*n0p) mod 2w // calculate the montgomery correction factor
c:=c+n*m // apply the montgomery reduction
c:=c greater than  greater than w // drop the bottom word of c which is always zero
endfor
result:=c // the final (N+1) bit result is in c
In practice, the a*b[j] and n*m multiplications in the above pseudo code are broken down into fixed size multiply-accumulate (MAC) operations, for example 32xc3x9716 bit MAC operations if w=16. This size of MAC is a good balance between maximising data throughput on the one hand and minimising die area and easing timing requirements on the other. However, it will be appreciated that for the Montgomery multiplication to be performed in an efficient manner, the multiplication block must still have access to sufficient fast storage to store the input N-bit integers a, b and n, and the partial products generated, for which N+w+1 bits are required. In a hardware implementation of the multiplication block, the fast storage would typically take the form of hardware registers, whilst in a software implementation the fast storage might take the form of a cache or SRAM.
An example of such a prior art approach is the stand alone RSA coprocessor from SIDSA referred to as the sRSAC2048A, which uses a large bank of 256 32-bit words of onboard memory to hold a maximum of four 2048-bit values.
It will be appreciated that for large N-bit integers, there is hence a requirement for a significant amount of fast storage, and the current trend is for these N-bit numbers to increase in size, thereby further increasing the storage requirements of a system for performing Montgomery multiplication.
Viewed from a first aspect the present invention provides a system for performing a modular multiplication a*b*2xe2x88x92N modulo n, where a, b and n are N-bit integers, comprising: a multiplier for multiplying a Y-bit number by a Z-bit number; partitioning logic for partitioning the integer a into a plurality of first sections, each first section being of a size which is a multiple of Y, and for partitioning the integer b into a plurality of second sections, each second section being of a size which is a multiple of Z; a multiplication unit for applying operations to control the multiplier to perform a sequence of multiplications to multiply one of said first sections by one of said second sections in order to generate a number of output operands for use in subsequent operations performed by the multiplication unit; a controller for sequentially inputting one of said first sections and one of said second sections into the multiplication unit along with predetermined ones of said output operands from preceding operations performed by the multiplication unit, until each first section has been multiplied by each second section.
In accordance with the present invention, a multiplier is provided for multiplying a Y-bit number by a Z-bit number, and hence for example may be arranged to perform a 32-bit by 16-bit multiplication. Then, in accordance with the present invention, the N-bit integers a and b are partitioned into a plurality of first and second sections, respectively, where each first section is of a size which is a multiple of Y, and each second section is of a size which is a multiple of Z. In preferred embodiments, the partitioning logic used to perform this partitioning is implemented in software.
Further, a multiplication unit is provided which is able to control the multiplier to perform a sequence of multiplications in order to multiply one of the first sections by one of the second sections, with a number of output operands then being generated for use in subsequent operations performed by the multiplication unit. A controller is provided to sequentially input one of the first sections and one of the second sections into the multiplication unit along with predetermined ones of the output operands from preceding operations performed by the multiplication unit, so as to enable each first section to be multiplied by each second section.
By this approach, a multiplication unit can be provided which is of a fixed size, irrespective of the size of the input integers, a b and n. The integers can then be partitioned into a plurality of sections, each section being of a size which can be handled by the multiplication unit, with a controller then being provided to manage the sequence in which the sections are operated upon by the multiplication unit. This approach alleviates the requirements for increasingly larger fast storage, the size of the fast storage being dependent not on the ultimate size of the N-bit integers, but rather on the predetermined size of the sections into which those integers are partitioned.
For example, in one embodiment, the sections into which the N-bit integers are partitioned may be 256 bits in length and accordingly the fast storage, for example hardware registers if the multiplication unit is implemented in hardware, need only to be able to store 256 bit sections of the integers a, b and n, rather than having to store the entire N-bit integers. The entire Montgomery multiplication on the N-bit integers is then performed by calling the multiplication unit several times in a chained fashion under the control of the controller. Hence, if the multiplicand a and multiplier b are 768-bit numbers, and the multiplication unit is arranged to handle 256-bit sections of these numbers, then the multiplication unit can be used to break down the multiplication into a 3xc3x973 array of blocks which are 256xc3x97256 bits in size. Similarly, if the multiplicand a and multiplier b are 1024-bit numbers, then the multiplication unit can be used to break down the multiply into a 4xc3x974 array of blocks which are 256xc3x97256 bits in size.
Hence, it will be appreciated that not only does the present invention alleviate the increasing requirements for large fast storage, but also enables a multiplication unit of a fixed size to be used irrespective of the size of the N-bit integers. Hence, as the size of these N-bit integers increases, the same multiplication unit can be used, with the controller providing appropriate control to manage the sequential operation of the multiplication unit to perform the required modular multiplication.
It will be appreciated that the first and second sections can be chosen to be any appropriate size suitable for manipulation by the multiplication unit. However, preferably, both the first and second sections are of a size K, where N is a multiple of K.
In preferred embodiments, the output operands include a row accumulator value, such that if a first section V other than the first section representing the least significant bits of integer a is multiplied by a second section W other than the second section representing the most significant bits of integer b, then the row accumulator value generated is output for use in a subsequent operation performed by the multiplication unit to multiply a first section Vxe2x88x921 by a second section W+1.
Further, in preferred embodiments, the output operands include a column carry value, such that if a first section V other than the first section representing the most significant bits of integer a is multiplied by a second section W, then the column carry value generated is output for use in a subsequent operation performed by the multiplication unit to multiply a first section V+1 by the second section W.
Preferably, the partitioning logic is further arranged to partition the integer n into a plurality of third sections, each third section being of a size which is a multiple of Y, and the multiplication unit is arranged to control the multiplier to perform a sequence of multiplications to multiply one of said third sections by a correction value.
Typically, the first sections and third sections are of the same size. In this embodiment, the correction value is calculated to be the same size as the second section.
In preferred embodiments, when the multiplication unit is causing a multiplication to be performed involving the first section representing the least significant bits of the integer a, the multiplication unit is arranged to calculate the correction value to have a value such that the row accumulator value output by the multiplication unit has a value of zero.
Then, preferably, for a particular second section, the same correction value is used when the multiplication unit is causing that second section to be multiplied by any of the first sections.
The multiplication unit may be embodied in hardware, or alternatively may be embodied by software used to control the multiplier. However, in preferred embodiments, the multiplication unit is implemented in hardware, and comprises the multiplier and a number of registers for storing the first and second sections provided by the controller and the output operands generated during the operations performed by the multiplication unit.
Viewed from a second aspect, the present invention provides a method of performing a modular multiplication a*b*2xe2x88x92N modulo n, where a, b and n are N-bit integers, comprising the steps of: (a) providing a multiplier for multiplying a Y-bit number by a Z-bit number; (b) partitioning the integer a into a plurality of first sections, each first section being of a size which is a multiple of Y; (c) partitioning the integer b into a plurality of second sections, each second section being of a size which is a multiple of Z; (d) employing a multiplication unit to apply operations to control the multiplier to perform a sequence of multiplications to multiply one of said first sections by one of said second sections in order to generate a number of output operands for use in subsequent operations performed by the multiplication unit; (e) sequentially inputting one of said first sections and one of said second sections into the multiplication unit along with predetermined ones of said output operands from preceding operations performed by the multiplication unit, and repeating the step (d), until each first section has been multiplied by each second section.
Viewed from a third aspect, the present invention provides a computer program, or a computer program product, for operating a computer system to perform a modular multiplication a*b*2xe2x88x92N modulo n, where a, b and n are N-bit integers, the computer system having a multiplier for multiplying a Y-bit number by a Z-bit number; the computer program comprising: a partitioning routine configured in operation to partition the integer a into a plurality of first sections, each first section being of a size which is a multiple of Y, and to partition the integer b into a plurality of second sections, each second section being of a size which is a multiple of Z; a controller for managing the operation of a multiplication unit arranged to apply operations to control the multiplier to perform a sequence of multiplications to multiply one of said first sections by one of said second sections in order to generate a number of output operands for use in subsequent operations performed by the multiplication unit; the controller being configured in operation to sequentially input one of said first sections and one of said second sections into the multiplication unit along with predetermined ones of said output operands from preceding operations performed by the multiplication unit, until each first section has been multiplied by each second section.
As mentioned previously, the multiplication unit may be embodied in either hardware or software. However, if embodied in software, then in preferred embodiments the computer program further comprises the multiplication unit.