As the use of network systems grows, protection of network communications becomes more important. Protection of the integrity and secrecy of data becomes an issue.
The basic process of coding, transporting, and decoding a message includes taking the message (plaintext), modifying (encrypting) the plaintext into ciphertext, transmitting the ciphertext to a receiver, and de-modifying (decrypting) the ciphertext to recover the original message.
In cryptosystems, an encryption key is used to encrypt the plaintext. The ciphertext is transmitted to a receiver, and the receiver decrypts the ciphertext, using a decryption key, back to the original plaintext. The encryption key and the decryption key are often referred to as key-pairs.
For example, public and private key-pairs can be functions of two or more large prime numbers. Each function, encryption and decryption, relies on the large prime numbers, and each set is referred to as a key-pair. A complete system (encrypt then decrypt) uses two large prime numbers (P, Q). To increase security, the word-lengths of P and Q may be chosen to be equal, so that they cannot be distinguished based on bit length, and then the product M is computed:
$$M = P \cdot Q \qquad (1)$$
An encryption key KE is randomly chosen such that KE and (P−1)(Q−1) are relatively prime. Accordingly, the decryption key KD can be computed using an extended Euclidean algorithm so as to satisfy:
$$\mathit{KD} = \mathit{KE}^{-1} \bmod \big((P-1)(Q-1)\big) \qquad (2)$$
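The computation of KD in equation (2) can be illustrated with a minimal Python sketch of the extended Euclidean algorithm. The function names `extended_gcd` and `decryption_key`, and the toy values of P, Q, and KE, are assumptions chosen for illustration only; the primes are far too small for practical use.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def decryption_key(ke, p, q):
    """Compute KD = KE^-1 mod ((P-1)(Q-1)), following equation (2)."""
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(ke, phi)
    if g != 1:
        raise ValueError("KE and (P-1)(Q-1) must be relatively prime")
    return x % phi  # map the Bezout coefficient into the range [0, phi)


# Toy values (hypothetical, for illustration only).
P, Q, KE = 61, 53, 17
KD = decryption_key(KE, P, Q)
print(KD)  # 2753, since 17 * 2753 = 15 * 3120 + 1
```

The check that the greatest common divisor equals 1 mirrors the requirement that KE and (P−1)(Q−1) be relatively prime; without it, no inverse exists.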
The numbers KD and M may also be relatively prime. The numbers (KE, M) may be the encryption key (or public key) used for data encryption, and the numbers (KD, M) are the decryption key used for decryption. After the keys are generated, the original message is encrypted by performing the computation of:
$$C = T^{\mathit{KE}} \bmod M \qquad (3)$$
where T is the original message (plaintext) and C is the encrypted message (ciphertext). To decrypt the encrypted data, the following computation is performed:
$$T' = C^{\mathit{KD}} \bmod M \qquad (4)$$
where T′ is the decrypted message. T′ should be the same as the original message T. As can be seen, several modular multiplications are performed.
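Equations (1) through (4) can be traced end to end with a short Python sketch. The numeric values of P, Q, KE, and T below are toy assumptions chosen so that the arithmetic can be checked by hand; the helper names `encrypt` and `decrypt` are likewise hypothetical. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
from math import gcd

# Toy parameters (assumptions for illustration; real keys are far longer).
P, Q = 61, 53
M = P * Q                      # equation (1): M = 3233
PHI = (P - 1) * (Q - 1)        # (P-1)(Q-1) = 3120
KE = 17
assert gcd(KE, PHI) == 1       # KE must be relatively prime to (P-1)(Q-1)
KD = pow(KE, -1, PHI)          # equation (2): modular inverse of KE


def encrypt(t, ke, m):
    """Equation (3): C = T^KE mod M."""
    return pow(t, ke, m)


def decrypt(c, kd, m):
    """Equation (4): T' = C^KD mod M."""
    return pow(c, kd, m)


T = 65                         # plaintext, must be smaller than M
C = encrypt(T, KE, M)          # 2790
T_PRIME = decrypt(C, KD, M)    # 65, identical to T
print(C, T_PRIME)
```

Each call to `pow` hides a chain of modular multiplications, which is the cost the remainder of this section is concerned with.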
In some cryptosystems, a long word-length, generally more than 512 bits, is employed to meet security requirements. However, the long word-length limits speed performance and demands ever faster computation, so fast exponential computation becomes increasingly important. There are several methods, such as the H-algorithm and the L-algorithm, which can be used to accelerate the exponential computation. One such method is the Montgomery modular multiplication algorithm, which can be used as a kernel operation in high-performance exponent-computation algorithms. The Montgomery modular multiplication algorithm also improves the efficiency of encryption and decryption operations.
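To illustrate how a modular multiplication serves as the kernel of an exponentiation, the following Python sketch shows a generic square-and-multiply loop that scans the exponent from its most significant bit. It is not claimed to be either of the algorithms named above; the function name `mod_exp` and the pluggable `mod_mul` parameter are assumptions made for illustration.

```python
def mod_exp(base, exponent, modulus, mod_mul=None):
    """Compute base^exponent mod modulus by left-to-right square-and-multiply.

    mod_mul is the kernel modular multiplication; a plain (x * y) % modulus
    is used when no kernel is supplied.
    """
    if mod_mul is None:
        def mod_mul(x, y):
            return (x * y) % modulus

    base %= modulus
    result = 1 % modulus
    for bit in bin(exponent)[2:]:      # exponent bits, most significant first
        result = mod_mul(result, result)       # square on every bit
        if bit == "1":
            result = mod_mul(result, base)     # multiply on every 1 bit
    return result


print(mod_exp(65, 17, 3233))  # 2790
```

For a k-bit exponent the loop performs k squarings and at most k multiplications, so the speed of the kernel multiplication dominates the total running time.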
The Montgomery modular multiplication algorithm is provided to compute the resulting n-bit number:
$$S_N = A \cdot B \cdot R^{-1} \bmod M, \quad \text{where the radix } R = 2^n \qquad (5)$$
required in the modular exponential algorithm, where A, B and M are the multiplicand, multiplier, and modulus, respectively, and each has n bits. An exemplary radix-2 Montgomery iterative modular multiplication algorithm is:
S_0 = 0;
for (I = 0; I < N; I++) {
    q_I = (S_I + b_I*A) mod 2;
    S_{I+1} = (S_I + b_I*A + q_I*M) / 2;
}
if (S_N >= M) S_N = S_N - M;
where b_I*A (= PP_I) is a partial product; q_I*M (= MM_I) is a multiple of the modulus M that makes the least significant bit (LSB) of SPP_I (= S_I + PP_I) zero; n is the bit length of the modulus M; N = n, so that the N divisions by 2 together account for the factor R^{-1} = 2^{-n} in equation (5); S_I is the partial accumulated result of the previous cycle; S_{I+1} is the partial accumulated result of the current cycle, with n bits; and S_N is the final computation result. An exemplary radix-4 Montgomery iterative modular multiplication algorithm is:
S_0 = 0;
for (I = 0; I < N; I++) {
    q_I = (((S_I + b_I*A) mod 4) * M′) mod 4;
    S_{I+1} = (S_I + b_I*A + q_I*M) / 4;
}
if (S_N >= M) S_N = S_N - M;
where N = n/2; (−M*M′) mod 4 = 1; and −M is the 2's complement of M. Both the radix-2 and the radix-4 processes are iterative and produce iterative data, that is, data whose value changes from one iteration to the next within the loop (I = 0; I < N; I++). The speed of the modular operation affects system performance, so system performance is degraded when the bit length is very long. In addition, to compute MM_I (= q_I*M), PP_I (= b_I*A) is computed first, and the computed PP_I and S_I are then added. Power consumption is therefore increased, because the accumulator executes the logical computation twice.
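The two iterative algorithms above can be modeled in software as follows. This Python sketch assumes an odd modulus M smaller than 2^n, operands A and B smaller than M, and an even n for the radix-4 case; the function names and the toy operand values are hypothetical. Both functions are intended to return A*B*2^(-n) mod M as in equation (5), with the radix-4 version using n/2 iterations.

```python
def montgomery_radix2(a, b, m, n):
    """Radix-2 Montgomery multiplication: a*b*2^(-n) mod m, m odd, a, b < m."""
    s = 0                                # S_0 = 0
    for i in range(n):                   # n iterations, one bit of b each
        b_i = (b >> i) & 1
        pp = b_i * a                     # partial product PP_I = b_I * A
        q = (s + pp) & 1                 # q_I = (S_I + b_I*A) mod 2
        s = (s + pp + q * m) >> 1        # LSB is zero, so the shift is exact
    if s >= m:
        s -= m
    return s


def montgomery_radix4(a, b, m, n):
    """Radix-4 Montgomery multiplication: a*b*2^(-n) mod m, m odd, n even."""
    # M' satisfies (-M * M') mod 4 == 1, found by direct search over 0..3.
    m_prime = next(x for x in range(4) if (-m * x) % 4 == 1)
    s = 0                                # S_0 = 0
    for i in range(n // 2):              # n/2 iterations, two bits of b each
        b_i = (b >> (2 * i)) & 3
        pp = b_i * a                     # partial product PP_I = b_I * A
        q = (((s + pp) % 4) * m_prime) % 4
        s = (s + pp + q * m) >> 2        # two LSBs are zero, shift is exact
    if s >= m:
        s -= m
    return s


# Toy operands (assumptions for illustration): 12-bit odd modulus.
M, N_BITS = 3233, 12
A, B = 65, 2790
expected = (A * B * pow(2, -N_BITS, M)) % M   # reference value, Python 3.8+
print(montgomery_radix2(A, B, M, N_BITS) == expected)  # True
print(montgomery_radix4(A, B, M, N_BITS) == expected)  # True
```

In both loops the partial product is formed first and only then added to the running sum to obtain q, which reflects the two-step accumulation described above.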
A hardware implementation of a conventional Montgomery modular multiplication algorithm is shown in FIG. 1, which utilizes two carry propagate adders (hereinafter abbreviated as CPAs) 91 and 92. The first CPA 91 is provided for a multiplication operation, and receives a previous computation result and the result of ai ANDed with B, which is output from an AND logic 93 that receives ai and B. The second CPA 92 is provided for a modular operation, and receives the output of the first CPA 91 and the result of qi ANDed with N from an AND logic 94. The output of the CPA 92 is shifted right by one bit by a shifter 95, so as to divide the output result by 2, thereby generating the computation result for one iteration.
To complete a 512-bit Montgomery modular multiplication, 512 iterations are required, which can be time-consuming. As a result, 512-bit RSA encryption and decryption is still slower than current network transmission bandwidth.
The Montgomery modular multiplication may be time-consuming and may affect the operation of digital appliances, including cryptographic computation devices. To manufacture high-performance digital appliances, it is often necessary to improve the speed of the modular operation.
In addition to speed, power consumption is a concern. Low power consumption is desirable in general and becomes more important in, for example, smart card and mobile products. Smart card and mobile products use cryptographic computation devices to secure data (contents), and improving the efficiency of these devices can improve their power consumption characteristics. Additionally, computational devices consume a lot of power, and the majority of that power is consumed by modular multiplication. In particular, as the bit length increases, more power is required for the modular operation.