1. Field of the Invention
The present invention relates to an arithmetic device containing multiplier-accumulator circuitry. More particularly, the present invention relates to an arithmetic device that calculates effective lower bits of a parameter ND for Montgomery modular multiplication, where ND satisfies a mathematical condition R×R⁻¹−N×ND=1 for an integer N and a radix R that is coprime to and greater than N.
2. Description of the Related Art
Recent years have seen a rapid growth of the online trade market, also known as electronic commerce (e-commerce), where commercial transactions involving money transfers take place over a network. People are exchanging personal information such as their credit card numbers over a network more frequently than ever before. Such important personal information has to be protected from eavesdropping and tampering attacks by a malicious third party. The use of cryptographic techniques is thus mandatory to ensure the security of information transfer.
Public-key systems, one class of modern cryptographic algorithms, use a pair of cryptographic keys called public and private keys. The sender encrypts his/her message with the receiver's public key, and the receiver decrypts the received message with his/her secret private key. Suppose, for example, that a person is purchasing a product at an online store. The online store sends its public key to the purchaser, allowing him/her to send his/her credit card number and other information in encrypted form. The store can decrypt the received information by using its private key. The advantage of public-key systems is that the public key is made available to the public. That is, public-key cryptosystems permit one to achieve secure communication with anyone who has publicized an encryption key.
One example of a public-key algorithm is RSA, named after its three creators: Ron Rivest, Adi Shamir, and Leonard Adleman. The RSA cryptosystem uses a modular multiplication process to ensure the secrecy of ciphertext, relying on the difficulty of prime factorization of a large integer. That is, when a certain number x and an integer n are given, it is relatively easy for a computer to calculate a power of x modulo n (remainder of division by n). But, because of the difficulty of prime factorization, it is very hard to accomplish the reverse process when n is large, meaning that the original number x cannot be reproduced easily. RSA is grounded on this nature of modular arithmetic.
RSA, however, requires a larger amount of computation for modular multiplication than symmetric cryptosystems such as the Data Encryption Standard (DES), and this fact leads to demands for a faster algorithm. The Montgomery modular multiplication method is one of the solutions for reducing the computational burden. When a radix R coprime to an integer N is selected such that R>N, the Montgomery algorithm computes T×R⁻¹ mod N (i.e., the remainder of division of T×R⁻¹ by integer N) from an input value T satisfying 0≦T<R×N. This algorithm is suitable for “modulo N” computation particularly when the integer N is very large. For details, see: P. L. Montgomery, “Modular Multiplication without Trial Division,” Mathematics of Computation, Vol. 44, No. 170, pp. 519-521, 1985.
To implement the Montgomery algorithm, it is necessary to obtain a parameter ND that satisfies the condition of R×R⁻¹−N×ND=1, where R⁻¹ denotes the multiplicative inverse element of radix R, modulo N. It is known that ND can be obtained by applying the Euclidean algorithm to radix R and integer N, where the process of repeating divisions yields all digits of an ND value. While ND is a multiple-precision data word, Montgomery modular multiplication does not require all bits of ND as its input parameter, but only a limited number of lower bits of ND. In this description, we use the term “effective lower bits” to refer to this limited range of lower bits of ND that are relevant to Montgomery modular multiplication.
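The role of ND can be illustrated with a minimal Python sketch of Montgomery reduction. This is an illustration only, not the circuitry of the invention. It assumes that R is a power of two (R = 2^r_bits) and that N is odd, and the names montgomery_params and montgomery_reduce are hypothetical. The modular inverses are obtained with Python's built-in pow (Python 3.8 or later) rather than with the Euclidean procedure mentioned above.

```python
def montgomery_params(N, r_bits):
    """Return (R, R_inv, ND) with R = 2**r_bits and R*R_inv - N*ND == 1.

    Assumes N is odd and N < R (illustrative helper, not from the source).
    """
    R = 1 << r_bits
    R_inv = pow(R, -1, N)            # inverse of R modulo N
    ND = (-pow(N, -1, R)) % R        # N*ND is congruent to -1 modulo R
    return R, R_inv, ND


def montgomery_reduce(T, N, r_bits, ND):
    """Return T * R^-1 mod N for 0 <= T < R*N without dividing by N."""
    r_mask = (1 << r_bits) - 1
    m = ((T & r_mask) * ND) & r_mask  # m = (T mod R) * ND mod R
    t = (T + m * N) >> r_bits         # T + m*N is an exact multiple of R
    return t - N if t >= N else t     # single conditional subtraction


if __name__ == "__main__":
    N, r_bits = 97, 8
    R, R_inv, ND = montgomery_params(N, r_bits)
    T = 1234
    print(montgomery_reduce(T, N, r_bits, ND) == (T * R_inv) % N)
```

Only the bits of ND below R enter the computation of m. In a word-serial implementation that processes one machine word at a time, correspondingly fewer lower bits of ND would be needed.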
Some algorithms for calculating ND take advantage of the fact that the effective lower bits of N×ND are all ones (e.g., 0xffffffff when lower 32 bits are used in modular multiplication). See, for example, Ç. K. Koç, “High-Speed RSA Implementation,” Technical Report TR 201, RSA Laboratories, Version 2.0, November 1994, pp. 48-49. The following section will describe how this type of algorithm works.
FIG. 7 shows a conventional algorithm for calculating ND. The algorithm is represented in the form of a program code, where s (lowercase letter) denotes the number of lower bits of ND that are relevant to the calculation, and the number or symbol in square brackets following each variable S, N, or ND represents a particular bit position. The bit position is counted in the direction from the least significant bit (LSB) to the most significant bit (MSB). For example, S[0] means the LSB of variable S, and ND[i] means the (i+1)th bit of ND, counted from its LSB.
The algorithm starts with initialization of variable S and then repeats the following process for i=0 to s−1. Specifically, ND[i] will be set to one (ND[i]:=1) and N is added to S, if S[0] is zero. (Note that the addition takes effect only on the effective lower bits, and this holds true in the rest of this section.) The resulting sum is then shifted to the right by one bit, as indicated by the operator “>>1” in FIG. 7, before it is put in place of S. If S[0] is one, ND[i] is set to zero, and S is shifted right by one bit.
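The loop described above can be sketched in Python as follows. FIG. 7 itself is not reproduced here, so this is a reading of the textual description. It assumes that S is initialized to zero and that N is odd, and the function name nd_lower_bits is hypothetical.

```python
def nd_lower_bits(N, s):
    """Return the s effective lower bits of ND for an odd integer N."""
    if N % 2 == 0:
        raise ValueError("N must be odd")
    mask = (1 << s) - 1
    N &= mask              # only the effective lower bits of N take part
    S = 0                  # assumed initial value of S
    ND = 0
    for i in range(s):
        if S & 1 == 0:     # S[0] is zero: set ND[i] and add N to S
            ND |= 1 << i
            S = (S + N) & mask
        # when S[0] is one, ND[i] stays zero and nothing is added
        S >>= 1            # shift right by one bit in either case
    return ND


if __name__ == "__main__":
    print(nd_lower_bits(7, 4))   # 9, since 7 * 9 = 63 ends in binary 1111
```

After the loop, the lower s bits of N×ND are all ones, which is the property this type of algorithm relies on.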
FIG. 8 shows the structure of a conventional arithmetic circuit for calculating ND. As an implementation of the algorithm described in FIG. 7, the illustrated arithmetic circuit 800 is composed of the following elements: a multiplexer 801, a two-input adder 802, a sum register 803, an inverter 804, and an ND register 805. The multiplexer 801 selects either N or zero (i.e., determines whether to add N to S), depending on the current LSB of S. The two-input adder 802 performs addition of the selected value and variable S. The sum register 803 stores the resulting sum S, while the ND register 805 stores ND.
Suppose that the sum register 803 has been initialized to zero. The multiplexer 801 determines whether to add N to S, depending on S[0], the current LSB of variable S. When S[0]==0, meaning that addition should take place, the multiplexer 801 selects and supplies N to the two-input adder 802. Otherwise, it selects and supplies zero to the two-input adder 802. The two-input adder 802 adds the selected value to the variable S read out of the sum register 803, shifts the result to the right by one bit, and stores it back into the sum register 803. The inverter 804 supplies the ND register 805 with an inverted version of the current S[0] as the bit of ND. The ND register 805 is a shift register designed to shift the data rightward each time a new bit arrives at its MSB end. The ND register 805 will thus yield a complete ND value when the predetermined number (s) of iterations is completed.
Carry delay time of the two-input adder 802 can be a critical factor in performance when ND has a large number of effective lower bits. For high-speed applications, carry-save adders (CSA) are thus preferred. Specifically, the carry output is saved in a carry register, separately from the sum output. In each iteration, a carry-save adder sums up three input values (N, S, and the saved carry) in the course of calculating ND.
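A carry-save variant of the same bit-serial loop can be sketched in Python as follows. The source gives no circuit details for this variant, so the sketch rests on assumptions: the sum and carry registers both start at zero, and the LSB of S is taken as the exclusive OR of the LSBs of the two registers. The names csa and nd_lower_bits_csa are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder: return (sum, carry) with a + b + c == sum + carry."""
    total = a ^ b ^ c                               # bitwise sum, no carry chain
    carry = ((a & b) | (a & c) | (b & c)) << 1      # majority bits, weight 2
    return total, carry


def nd_lower_bits_csa(N, s):
    """Return the s effective lower bits of ND, keeping S in carry-save form."""
    if N % 2 == 0:
        raise ValueError("N must be odd")
    mask = (1 << s) - 1
    N &= mask
    sum_reg = 0            # assumed initial contents of the sum register
    carry_reg = 0          # assumed initial contents of the carry register
    ND = 0
    for i in range(s):
        # LSB of the value represented by sum_reg + carry_reg
        lsb = (sum_reg ^ carry_reg) & 1
        addend = N if lsb == 0 else 0
        if lsb == 0:
            ND |= 1 << i
        total, carry = csa(sum_reg, carry_reg, addend)
        # the carry word is always even here, so shifting both words
        # right by one bit halves the represented value exactly
        sum_reg = (total & mask) >> 1
        carry_reg = (carry & mask) >> 1
    return ND


if __name__ == "__main__":
    print(nd_lower_bits_csa(7, 4))   # 9, matching the two-input-adder loop
```

No carry propagates across the word in any iteration, so the per-iteration delay in such a design would not grow with the number of effective lower bits.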