This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-310619, filed Nov. 1, 1999, the entire contents of which are incorporated herein by reference.
The present invention relates to a modular arithmetic processing apparatus and method for performing an arithmetic operation of a large integer at a high speed by parallel processing on the basis of a residue number system.
As a method of executing an efficient arithmetic operation of a large integer, a modular arithmetic or residue number system is known. In the residue number system, a set of relatively small integers {a1, a2, . . . , an}, which are relatively prime to each other, is prepared, and a large integer as an expression target is expressed by residues obtained by dividing the integer by these integers. This set of integers will be referred to as the base of the residue number system hereinafter. The number n of elements will be referred to as a base size.
For example, when a base {a1, a2, . . . , an} is given, an integer x is expressed by n residues {x1, x2, . . . , xn} obtained by dividing the integer x by the base elements ai (i=1, 2, . . . , n). If the number x is a positive integer smaller than a product A (=a1a2 . . . an) of the base elements, the number x can be uniquely expressed modulo the product A of the base elements. In other words, the number x and its residue number system expression {x1, x2, . . . , xn} are in a one-to-one correspondence.
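The correspondence described above can be sketched in a few lines of Python. This is only an illustration: the base {7, 11, 13}, the value 500, and the function names to_rns and from_rns are assumptions chosen for the example, and the reconstruction uses the standard Chinese remainder theorem rather than any procedure specific to this document.

```python
from math import gcd, prod

def to_rns(x, base):
    """Return the residues of x for each base element."""
    return [x % a for a in base]

def from_rns(residues, base):
    """Recover x in [0, A) from its residues (Chinese remainder theorem)."""
    A = prod(base)
    x = 0
    for x_i, a_i in zip(residues, base):
        A_i = A // a_i                      # product of the other base elements
        x += x_i * pow(A_i, -1, a_i) * A_i  # pow(v, -1, m) is the inverse of v mod m
    return x % A

base = [7, 11, 13]   # hypothetical base, pairwise relatively prime, A = 1001
assert all(gcd(p, q) == 1 for i, p in enumerate(base) for q in base[i + 1:])

x = 500              # must be smaller than A
residues = to_rns(x, base)
recovered = from_rns(residues, base)
print(residues, recovered)
```

The round trip recovers x only when x is smaller than the product A; larger values alias to x mod A.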
In such residue number system expression, to calculate the product of two integers x and y, the products xiyi are obtained in units of elements, and then residues are obtained by dividing each of these products by the corresponding base element ai. In other words, generally, a product modulo the product A of base elements is obtained by calculating products modulo the corresponding base ai in units of elements. This also applies to addition and subtraction. For elements xi and yi corresponding to the base ai, an addition or subtraction modulo ai is executed.
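As a minimal sketch of the element-wise operations just described, the following Python fragment multiplies, adds, and subtracts two integers in residue form. The base, the operands, and the function names are assumptions for illustration only.

```python
def rns_mul(xs, ys, base):
    """Element-wise product, each reduced modulo its own base element."""
    return [(x_i * y_i) % a_i for x_i, y_i, a_i in zip(xs, ys, base)]

def rns_add(xs, ys, base):
    """Element-wise sum modulo each base element."""
    return [(x_i + y_i) % a_i for x_i, y_i, a_i in zip(xs, ys, base)]

def rns_sub(xs, ys, base):
    """Element-wise difference modulo each base element."""
    return [(x_i - y_i) % a_i for x_i, y_i, a_i in zip(xs, ys, base)]

base = [7, 11, 13]            # hypothetical base
A = 7 * 11 * 13               # product of the base elements
x, y = 123, 45                # hypothetical operands
xs = [x % a for a in base]
ys = [y % a for a in base]

# Each result agrees with the same operation carried out modulo A.
assert rns_mul(xs, ys, base) == [((x * y) % A) % a for a in base]
assert rns_add(xs, ys, base) == [((x + y) % A) % a for a in base]
assert rns_sub(xs, ys, base) == [((x - y) % A) % a for a in base]
```

No list element depends on any other, which is the property that lets one calculator per base element work independently.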
In the arithmetic operation using such a residue number system, a multiplication, addition, or subtraction is executed by arithmetic operation modulo bases independently corresponding to the elements. For example, when values within the word length of a computer are employed as a base, the arithmetic operation of a very large integer can be realized by repeating a single-precision arithmetic operation.
Since the single-precision arithmetic operations can be independently executed in units of bases, preparing a plurality of calculators allows parallel processing. For example, when the base size is n, n multipliers with a residue function are prepared and parallelly operated whereby multiplication modulo the product A of base elements can be completed within the same time as that for one multiplication with single-precision residue.
A current computer normally uses binary expression. Calculation of a large integer based on binary expression takes a processing time proportional to the total number of digits (or bit length) of the large integer because carry propagates from the LSB (Least Significant Bit) to the MSB (Most Significant Bit). This is disadvantageous in processing speed as compared to parallel processing using a residue number system.
On the other hand, the residue number system has long been known as a scheme for efficiently executing a multiplication, addition, or subtraction of a large integer, as compared to radix representation typified by binary expression, because no carry between words occurs.
However, for a division or a comparison between two numbers, no means more efficient than the radix representation has been known. For this reason, how to apply the residue number system in detail was not known until the 1980s, although the residue number system was supposed to be suitable for applications that execute arithmetic operations on large integers at a high speed, such as a public key cryptography system.
Posch et al. have proposed a scheme of executing arithmetic operation of RSA cryptography of a public key cryptography system using the residue number system in "Modulo Reduction in Residue Number Systems" (IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 5, May 1995, pp. 449-454) and "RNS-Modulo Reduction Upon a Restricted Base Value Set and its Applicability to RSA Cryptography" (Computer and Security, Vol. 17, pp. 637-650, 1998).
In addition, Kornerup et al. have proposed a similar high-speed arithmetic scheme in "An RNS Montgomery Modular Multiplication Algorithm" (13th IEEE Symposium on Computer Arithmetic (Proceedings of ARITH13), IEEE Computer Society, pp. 234-239), and Paillier has proposed a similar scheme in "Low-Cost Double-Size Modular Exponentiation or How to Stretch Your Cryptoprocessor" (Springer-Verlag, Lecture Notes in Computer Science No. 1560 Public Key Cryptography (PKC '99), pp. 223-234).
The main reason why the residue number system is used for RSA cryptography is that this cryptography is constructed by repeating a residue multiplication of a very large integer with 200 decimal digits or more, and high-speed processing can be realized using the above-described characteristic of the residue number system, which allows high-speed multiplication and addition/subtraction.
A common point of the schemes of Posch et al., Kornerup et al., and Paillier is that the Montgomery algorithm is combined with the residue number system to avoid a division, which is disadvantageous for the residue number system. As another common point of the three schemes, base conversion or base extension is executed in the course of processing to obtain a value that expresses an integer, which is expressed by residues using a certain base, with another base. In any scheme, whether base conversion or base extension can be efficiently executed affects the efficiency of the entire processing.
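How the Montgomery algorithm sidesteps division can be illustrated on ordinary integers. The following is a minimal sketch of textbook Montgomery reduction in radix representation, not the residue number system variants of the cited schemes; the function names, the toy modulus 97, and R = 2^8 are assumptions for illustration.

```python
def montgomery_reduce(T, N, R_bits):
    """Return T * R^-1 mod N for 0 <= T < R*N, with R = 2**R_bits and N odd.

    Only multiplications, masks, and shifts by R are used; no division by N.
    """
    R_mask = (1 << R_bits) - 1
    N_neg_inv = (-pow(N, -1, 1 << R_bits)) & R_mask   # -N^-1 mod R (precomputable)
    m = ((T & R_mask) * N_neg_inv) & R_mask           # makes T + m*N divisible by R
    t = (T + m * N) >> R_bits                         # exact division by R
    return t - N if t >= N else t

def montgomery_mul(a_bar, b_bar, N, R_bits):
    """Multiply two values held in Montgomery form (v * R mod N)."""
    return montgomery_reduce(a_bar * b_bar, N, R_bits)

N, R_bits = 97, 8                 # hypothetical odd modulus and R = 256 > N
R = 1 << R_bits
a, b = 35, 61                     # hypothetical operands
a_bar, b_bar = (a * R) % N, (b * R) % N          # convert into Montgomery form
c_bar = montgomery_mul(a_bar, b_bar, N, R_bits)
c = montgomery_reduce(c_bar, N, R_bits)          # convert back out
assert c == (a * b) % N
```

The conversion into Montgomery form uses an ordinary reduction here for brevity; in a repeated multiplication such as modular exponentiation that cost is paid once.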
Two terms, "base conversion" and "base extension," are used here. Base conversion means that a value expressed by a given base is re-expressed using another base prime to the given base. Base extension means that a value expressed by a base with the size n is expressed by a base with a size (n+1), i.e., a base obtained by adding, to the original base, an integer that is prime to the original base, and the (n+1)th element at that time is obtained. With the base extension scheme, base conversion can be constituted by repeating the base extension n times. In realizing RSA cryptography using the residue number system, a scheme and apparatus for efficiently executing base conversion (or base extension) are necessary.
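The two terms can be made concrete with a reference sketch in Python. It is deliberately naive: the correction term k is obtained here by an exact large-integer division, which is the step that practical schemes try to avoid, so this is a functional model and not any of the cited methods. The bases and function names are assumptions for illustration.

```python
from math import prod

def base_extend(residues, base, new_element):
    """Return x mod new_element, given the residues of x in `base`."""
    A = prod(base)
    total = 0
    for x_i, a_i in zip(residues, base):
        A_i = A // a_i
        xi_i = (x_i * pow(A_i, -1, a_i)) % a_i   # xi_i < a_i, so xi_i * A_i < A
        total += xi_i * A_i
    k = total // A            # correction term, 0 <= k < n (exact division here)
    return (total - k * A) % new_element

def base_convert(residues, base_a, base_b):
    """Base conversion built by repeating base extension once per element of base_b."""
    return [base_extend(residues, base_a, b_j) for b_j in base_b]

base_a = [7, 11, 13]          # hypothetical original base
base_b = [5, 9, 17]           # hypothetical target base, prime to base_a
x = 500
residues_a = [x % a for a in base_a]
residues_b = base_convert(residues_a, base_a, base_b)
assert residues_b == [x % b for b in base_b]
```

Because total equals x plus k times A, getting k wrong by even one changes every converted residue, which is why the accuracy and cost of the correction term matter.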
However, the above-described three base conversion schemes and the base conversion schemes that have been conventionally proposed are inefficient in some points, as will be described below.
In the scheme proposed by Posch et al., the base conversion scheme mentioned in the arithmetic operation of RSA cryptography may generate an error in the value after conversion when the value before conversion is smaller than a predetermined value. To avoid this, Posch et al. have proposed a procedure in which an appropriate offset is added to the input of base conversion processing to convert the input into a value that causes no error in base conversion processing, the resulting value is base-converted, and the effect of the offset is removed from the obtained base conversion result. However, such pre-processing and post-processing for the offset increase the entire arithmetic amount, resulting in low efficiency.
Additionally, since the scheme of Posch et al. considerably limits the size of the RSA cryptography key calculable by a given base and requires a multiplier to calculate a correction term necessary for base conversion, it is also disadvantageous in the area in circuitry and processing delay.
FIG. 5 is a block diagram showing the schematic arrangement of a modular arithmetic circuit used for the RSA cryptography arithmetic operation using the scheme of Posch et al.
A product-sum circuit 501 with a modular arithmetic function, RAM 521, and ROM 531 constitute one unit. In total, n units having the same arrangement are arrayed in parallel. In this case, the base size is n, and each unit executes an arithmetic operation corresponding to a specific base element. For example, each unit corresponds to one of the n base elements of a base A and one of the n base elements of a base B. For example, the product-sum circuit 501 executes an arithmetic operation corresponding to base elements a1 and b1. Each of the n units is designed to execute an arithmetic operation of r bits. These units are connected to each other through an r-bit bus.
FIG. 6 shows the internal structure of each of the product-sum circuits 501 to 50n. A structure related to the unit represented by the product-sum circuit 501 will be described here for convenience of description. Inputs are r-bit data represented by a and b and an r-bit data input from the ROM 531, which is input from the right side in FIG. 6. Referring to FIG. 6, the data a is the input from the RAM 521, and the data b is the input from the ROM 531. The data a and b are multiplied by a multiplier 601, and the result is supplied to an adder 602 on the output side. The adder 602 receives and adds the multiplication result and a feedback value from a register 604. The result from the adder 602 is supplied to a modular arithmetic section 603 and converted into a residue by division by the value set in a register 605. The value of the register 605 is denoted by mi, which represents the base element a1 or b1. As many data sets as the base size n are supplied to the inputs a and b. After all the n data are calculated, the calculation result is complete in the register 604. It is supplied to the RAM 521 through the r-bit bus.
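The data path of FIG. 6 described above can be modeled behaviorally as follows. This is a sketch under assumptions: the function name and the sample words are hypothetical, and the model ignores the r-bit word width and all timing; only the reference numerals in the comments come from the description.

```python
def product_sum_unit(a_words, b_words, m_i):
    """Behavioral model of one product-sum circuit with a modular arithmetic function.

    a_words: the n inputs a (from the RAM); b_words: the n inputs b (from the ROM);
    m_i: the modulus held in register 605.
    """
    acc = 0                              # register 604, initially cleared
    for a, b in zip(a_words, b_words):
        product = a * b                  # multiplier 601
        total = product + acc            # adder 602 adds the feedback from register 604
        acc = total % m_i                # modular arithmetic section 603
    return acc                           # result complete in register 604 after n steps

# Hypothetical inputs for one unit with n = 3
a_words = [3, 5, 6]
b_words = [10, 20, 30]
m_i = 17
result = product_sum_unit(a_words, b_words, m_i)
assert result == sum(a * b for a, b in zip(a_words, b_words)) % m_i
```

Reducing after every accumulation keeps the register contents below m_i, so the accumulator never needs to be wider than the word being processed.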
Referring back to FIG. 5, the modular arithmetic circuit includes a correction term calculation unit 510 for correcting the calculation result in base conversion and a ROM 530 which is externally attached to the correction term calculation unit 510 to supply at least an n-word parameter to the correction term calculation unit 510.
The correction term calculation unit 510 proposed by Posch et al. is implemented by a product-sum circuit as shown in FIG. 7. In the circuit shown in FIG. 7, input r-bit data and data input from the ROM 530 are multiplied by a multiplier 701 and then cumulatively added by an adder 702. The sum is stored in a register 703. The value is fed back after the correction term is completely calculated.
Note that the circuit scale of the correction term calculation unit 510 is as large as that of the product-sum circuit with a modular arithmetic function shown in FIG. 6. In addition, the correction term calculated here has a size of about (r + log2 n) bits. Referring to FIG. 6, the transmission bus width for transmitting the correction term to the product-sum circuits 501 to 50n is not r bits but (r + log2 n) bits, causing an increase in circuit area. Of these bits, r bits can be shared as the bus for the RAM to the correction term calculation unit. Even in this case, however, an extra area is required for log2 n bits for feedback.
Furthermore, the product-sum circuits 501 to 50n must execute the modular arithmetic operation at least once to reflect the correction term received from the correction term calculation unit 510 to the previous calculation result. The processing time may be saved if the correction term can be sequentially fed back to the product-sum circuits during other processes. However, in the implementation of Posch et al., the value cannot be fed back until the correction term is completely calculated. No means for solving these detailed problems has been proposed.
Another prior-art scheme of Kornerup et al. uses a scheme proposed in Shenoy and Kumaresan, "Fast Base Extension Using a Redundant Modulus in RNS" (IEEE Transactions on Computers, Vol. 38, No. 2, February 1989, pp. 292-297) to calculate the correction term. In this case, the size of correction term is n, i.e., much smaller than that of the scheme of Posch et al. However, this scheme also requires a multiplication for correction term calculation, and a correction term arithmetic procedure efficient in circuit scale and processing delay has been demanded.
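The idea of a redundant modulus can be sketched as follows. This is a simplified reading of the general technique as it is commonly presented, not a transcription of the cited paper or of the implementation of Kornerup et al.; the moduli, the function name, and the assumption that the residue x_r for the redundant modulus m_r is already available are all illustrative.

```python
from math import prod

def redundant_modulus_extend(residues, base, x_r, m_r, m_new):
    """Return (x mod m_new, k) given residues of x in `base` and x_r = x mod m_r.

    Assumes m_r is prime to every base element and m_r >= len(base).
    """
    M = prod(base)
    xis = [(x_i * pow(M // m_i, -1, m_i)) % m_i for x_i, m_i in zip(residues, base)]
    # The sum S of xi_i * M_i equals x + k*M with 0 <= k < n; evaluate S modulo m_r.
    s_r = sum(xi * ((M // m_i) % m_r) for xi, m_i in zip(xis, base)) % m_r
    # Correction term: this step needs a multiplication by M^-1 mod m_r.
    k = ((s_r - x_r) * pow(M, -1, m_r)) % m_r
    s_new = sum(xi * ((M // m_i) % m_new) for xi, m_i in zip(xis, base)) % m_new
    return (s_new - k * (M % m_new)) % m_new, k

base = [7, 11, 13]      # hypothetical base, n = 3
m_r = 4                 # hypothetical redundant modulus, prime to the base, >= n
m_new = 17              # hypothetical element to extend to
x = 500
value, k = redundant_modulus_extend([x % m for m in base], base, x % m_r, m_r, m_new)
assert value == x % m_new
```

All arithmetic stays within word-sized moduli, but the multiplication by the inverse of M modulo m_r remains in the correction path.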
Still another prior-art scheme proposed by Paillier has a limited application range because it imposes the conditions that an arbitrary base cannot be selected and that conversion from a base to radix representation or conversion from radix representation to residue number system representation must be executable very efficiently. As an applicable example, only a case wherein two bases each having the base size n of 2 are used is described in detail in this paper, and other practical examples are unknown. When the base size n is as small as 2, each base element conversely becomes large, and this makes it difficult to increase the processing speed as compared to a case wherein the base size n can be set large, and each base element can be set small.
As described above, three schemes are known which propose use of the residue number system aiming at high-speed processing of RSA cryptography. Although these schemes can improve the processing efficiency relative to conventionally proposed RSA cryptography arithmetic schemes, they are poor in base conversion processing efficiency, which is the most important part of the processing step, or can use only a limited base size.
The present invention has been made in consideration of the above situation, and has as its object to provide a new base conversion scheme better than the conventionally proposed base conversion schemes in some or all of the following points.
(a) The value of a correction term is relatively small and can be sequentially processed.
(b) The value after conversion matches the value expressed before conversion and has no error.
(c) Even if an error occurs, it can easily be controlled by pre- and post-processing or limitation on the input size.
(d) In an application to RSA cryptography, limitations on the key size are small.
(e) No multiplication is required to calculate the correction term, and the processing efficiency is high.
(f) The manner in which bases are set is not so constrained, and the versatility is high.
It is another object of the present invention to implement a high-speed modular arithmetic apparatus and method used for RSA cryptographic processing by combining the base conversion scheme with the Montgomery algorithm.
According to the present invention, there is provided a modular arithmetic apparatus comprising a plurality of product-sum circuits having a modular arithmetic function and parallelly arranged, and a correction term calculation unit for calculating a correction term to be used for modular arithmetic operation in the product-sum circuits, wherein the correction term calculation unit sequentially calculates the correction term in units of bits, and each of the product-sum circuits sequentially reflects the correction term calculated by the correction term calculation unit and performs base conversion or base extension. The product-sum circuit may be characterized by performing a Montgomery multiplication.
According to the present invention, there is also provided a modular arithmetic processing apparatus comprising a plurality of product-sum circuits parallelly arranged, and a correction term calculation unit for calculating a correction term to be used for modular arithmetic operation in the product-sum circuit, wherein the correction term calculation unit sequentially calculates the correction term in units of bits, and each of the product-sum circuits sequentially reflects the correction term calculated by the correction term calculation unit and converts a residue number system representation into a radix representation.
The correction term calculation unit may have a division circuit, and a base of a residue number system processed by the product-sum circuit may be a power of 2 or be approximated to a power of 2. The apparatus may further comprise a bit selection section for selecting an input bit to the correction term calculation unit. The apparatus may further comprise an I/O section for inputting/outputting data to/from an external unit.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.