The demand for secure information continues to drive improvements in cryptology. Modular exponentiation, and more particularly the Montgomery algorithms, remains fundamental to the encryption and decryption of confidential, authenticated data used in Internet and electronic commerce. Montgomery modulation generally exploits properties and interrelations of very large numbers to avoid working with the numbers themselves. Accordingly, dedicated programming and hardware for implementing Montgomery processes have been developed to perform the repeated multiplications required for modular exponentiation faster and more efficiently.
Calculation of an error correction parameter associated with Montgomery modulation is vital to the performance of exponentiation hardware and software. The error correction parameter is a constant that equals 2^(2n) mod N, where n is the number of bits in the modulus, N, rounded up to the nearest multiple of the size of the multiplier core used in Montgomery modulation. As such, the parameter equals the remainder of a normal division operation in which a bit string consisting of a most significant bit of one, followed by 2n least significant zeros, is divided by the modulus.
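The definition above can be illustrated with a minimal sketch, assuming Python integers stand in for the hardware registers. The names `correction_parameter` and `core_bits` (the width of the multiplier core) are hypothetical and are introduced here only for illustration.

```python
def correction_parameter(modulus: int, core_bits: int) -> int:
    """Return 2**(2n) mod modulus, where n is the bit length of the
    modulus rounded up to the nearest multiple of core_bits.

    core_bits is an assumed name for the multiplier core width.
    """
    if modulus <= 1 or core_bits <= 0:
        raise ValueError("modulus must exceed 1 and core_bits must be positive")
    bits = modulus.bit_length()
    # Round the bit length up to a multiple of the core width.
    n = -(-bits // core_bits) * core_bits
    # A one followed by 2n zeros, reduced modulo the modulus.
    return pow(2, 2 * n, modulus)
```

For example, with a 4-bit modulus of 13 and an assumed 4-bit core, n is 4 and the parameter is 256 mod 13, which is 9.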
In certain implementations, the error correction parameter is pre-calculated using software configured to run the above-described modulo operation on a computer with adequate processing power. However, the size of the modulus, N, which can be on the order of thousands of bits in length, can burden even large processors. Alternatively, hardware circuit implementations that use dedicated gates to avoid the long delays of software are subject to their own timing issues. For instance, conventional hardware circuits must perform 2n processing loops during a modulo operation to arrive at the error correction parameter. Each iteration of the loop consists of a shift/compare operation, in which an (n+1)-bit accumulator is compared to the modulus. If the accumulator is greater, the modulus is subtracted from it; if the accumulator is less, it is multiplied by two (e.g., by shifting the contents of the accumulator one bit to the left).
Moreover, to reduce the size of the subtraction circuitry, each subtraction operation is often performed as a series of partial subtraction operations that operate on a few bytes at a time (e.g., performing a 1024-bit subtraction using 64-bit subtraction circuitry that performs 16 partial subtraction operations). The tradeoff for the smaller subtraction circuitry is that each loop iteration requires multiple clock cycles to complete its subtraction operation.
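The partial-subtraction scheme can be modeled with a short sketch, assuming a simple borrow chain between words. The function name `subtract_in_words` and its parameters are hypothetical; the count of partial operations it returns stands in for the extra clock cycles described above and is not a model of any particular circuit.

```python
def subtract_in_words(a: int, b: int, total_bits: int = 1024, word_bits: int = 64):
    """Compute (a - b) mod 2**total_bits one word at a time.

    Returns (difference, borrow_out, partial_operations).
    All names here are illustrative assumptions.
    """
    if total_bits % word_bits != 0:
        raise ValueError("total_bits must be a multiple of word_bits")
    mask = (1 << word_bits) - 1
    words = total_bits // word_bits
    result = 0
    borrow = 0
    for i in range(words):
        shift = i * word_bits
        a_word = (a >> shift) & mask
        b_word = (b >> shift) & mask
        # One partial subtraction, including the borrow from the previous word.
        diff = a_word - b_word - borrow
        borrow = 1 if diff < 0 else 0
        result |= (diff & mask) << shift
    return result, borrow, words
```

With the default widths, the loop runs 16 times, matching the 1024-bit example. A final borrow of 1 indicates that `b` was larger than `a`, which is the information a trial subtraction needs.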
For instance, a first iteration of a conventional loop may consist of initially setting a value in a working register to one, left shifting the value by one position, and attempting to subtract the modulus from the left-shifted result to determine whether the value of the working register is larger than the modulus. If so, the subtracted value in the working register is retained. Otherwise, the subtracted value is discarded, and the pre-subtracted value is again shifted and compared to the modulus in a subsequent iteration of the loop. A total of 2n iterations are performed in this manner, with the resulting value in the working register being the desired error correction parameter. Given that the value of 2n can be in the thousands, it will be appreciated that such repetitive iterations represent some of the most time-intensive operations of a Montgomery application.
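The conventional loop described above can be sketched as follows, assuming a Python integer models the working register. The function name `correction_parameter_by_shifting` is hypothetical, and the sketch treats a shifted value equal to the modulus the same as a larger one, which is an assumption made so that the result is always fully reduced.

```python
def correction_parameter_by_shifting(modulus: int, n: int) -> int:
    """Compute 2**(2n) mod modulus using 2n shift / trial-subtract steps.

    n is assumed to be at least the bit length of the modulus.
    """
    if modulus <= 1:
        raise ValueError("modulus must exceed 1")
    if modulus.bit_length() > n:
        raise ValueError("n must be at least the bit length of the modulus")
    working = 1  # initial value of the working register
    for _ in range(2 * n):
        working <<= 1               # left shift by one position
        trial = working - modulus   # attempt the subtraction
        if trial >= 0:
            working = trial         # keep the subtracted value
        # otherwise discard the trial and keep the shifted value
    return working
```

Because the working value stays below the modulus after every step, a single trial subtraction per iteration is enough, and the cost is dominated by the 2n iterations themselves.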
Consequently, and despite the advances in implementing modular multiplication, a continuing need exists for further improvements in the field to reduce the overhead associated with performing modular multiplication operations.