1. Field of the Invention
The present invention relates to a multiplication module, a multiplicative inverse arithmetic circuit, and a method and an apparatus for controlling the multiplicative inverse arithmetic circuit. More particularly, the present invention pertains to a multiplication module that can perform multiplicative inverse arithmetic over a Galois extension field $GF(2^m)$ (m is an arbitrary natural number) by employing a small circuit having a low latency, and to a multiplicative inverse arithmetic circuit therefor, a method and an apparatus for controlling the multiplicative inverse arithmetic circuit, and a cryptographic apparatus and an error correction decoder therefor.
2. Background Art
First, the evaluation points for a hardware implementation of a reciprocal arithmetic algorithm, including that of the present invention, are as follows:
(1) the number of multipliers;
(2) the number of registers;
(3) the latency (clock count × clock period, that is, clock count divided by clock frequency, in the case of a sequential circuit), which depends heavily on the number of multiplication processes performed; and
(4) the maximum operating frequency for a sequential circuit. When an arithmetic operation can be performed with the same clock count, a circuit having a higher maximum operating frequency is naturally better. When the maximum operating frequency is the same, a circuit that requires a smaller clock count for computation is better.
With respect to the above points, the conventional methods and the method of the present invention will be compared later, after an overview of the conventional methods has been given.
Method 1: Fermat's Little Theorem
As is described in reference documents [1] and [4], a multiplicative inverse element can be obtained by using the following formula:
$$x^{-1} = x^{2^m-2} = x^{2^1} x^{2^2} \cdots x^{2^{m-1}}.$$  [Expression 1]
When this formula is employed, m−2 multiplications are required.
To employ this formula to perform calculations using a sequential circuit, a frequently used algorithm, based on the calculation process shown in FIG. 1, employs one multiplier and one squaring circuit to compute the successive powers $x^{2^i}$ and accumulate their product over (m−2) loop iterations. The latency (cycle count) for the calculations is (m−2).
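The sequential procedure described above can be sketched in software as follows. This is a minimal illustration only, not the circuit of FIG. 1: the function names `gf_mul` and `fermat_inverse`, the polynomial-basis representation (field elements as integer bit masks), and the example field $GF(2^8)$ with the irreducible polynomial $x^8+x^4+x^3+x+1$ (0x11B) are all assumptions introduced here.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m), polynomial basis.

    poly is the irreducible polynomial as a bit mask, including the x^m term.
    (Hypothetical helper; a must already be reduced, i.e. a < 2**m.)
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shifted copy of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce modulo the irreducible polynomial
    return result


def fermat_inverse(x, m, poly):
    """Compute x^(2^m - 2) as x^(2^1) * x^(2^2) * ... * x^(2^(m-1)).

    Returns (inverse, number of general multiplications). Squarings are
    modelled with gf_mul but stand in for the separate squaring circuit.
    """
    power = gf_mul(x, x, m, poly)    # x^(2^1), from the squaring circuit
    acc = power
    mults = 0
    for _ in range(m - 2):           # (m - 2) loop iterations
        power = gf_mul(power, power, m, poly)   # squaring: next x^(2^i)
        acc = gf_mul(acc, power, m, poly)       # the one general multiplier
        mults += 1
    return acc, mults


if __name__ == "__main__":
    inv, mults = fermat_inverse(0x53, 8, 0x11B)
    print(hex(inv), mults)
```

Each loop iteration uses the multiplier once, so the multiplication count equals the cycle count of (m−2) stated above.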
To calculate the formula using a combinational circuit, the tree structure shown in FIG. 2 is prepared, which gives the latency
$$M\{\lfloor \log_2(m-2) \rfloor + 1\}$$  [Expression 2]
where M is the latency of one multiplier. (Generally, since the latency of the power arithmetic is extremely small, it is ignored.)
Method 2: An Algorithm by Itoh and Tsujii and a Similar Method
Of all the conventional algorithms, the algorithm by Itoh and Tsujii shown in reference document [2] requires the fewest multiplications. An example calculation process, in which m=16, is shown in FIG. 3.
In another algorithm, which Itoh et al. proposed in reference document [3] before the above algorithm, the exponent $2^m-2$ is recursively factored using a relationship such as
$$2^k - 1 = (2^{k/2} - 1)(2^{k/2} + 1),$$  [Expression 3]
and when actually used for a calculation, the multiplication and the power arithmetic are performed from the bottom up, in the reverse order. According to either algorithm, the number of cycles for the sequential circuit is expressed by
$$\lfloor \log_2(m-1) \rfloor + H_w(m-1) - 1,$$  [Expression 4]
where $H_w(x)$ denotes the Hamming weight of the binary representation of x.
For the combinational circuit, with M denoting the latency of one multiplier, the latency is
$$M\{\lfloor \log_2(m-1) \rfloor + H_w(m-1) - 1\}$$  [Expression 5]
(the latency for the power arithmetic is extremely small and is ignored).
Unlike method 1, both of these algorithms have the problem that correct results cannot be obtained unless all of the multiplications are performed sequentially.
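The following sketch illustrates an Itoh and Tsujii style computation and its multiplication count. It is an illustration under stated assumptions rather than the exact procedure of reference documents [2] or [3]: it uses a polynomial-basis representation (the references use normal bases, where squaring is a cyclic shift), and the names `gf_mul`, `gf_sqr`, and `itoh_tsujii_inverse`, as well as the example field $GF(2^8)$ with polynomial 0x11B, are introduced here for illustration. The intermediate value `beta` holds $x^{2^k-1}$, and every multiplication depends on the result of the previous one, which is the sequential dependency noted above.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly includes the x^m term (hypothetical helper)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def gf_sqr(a, m, poly):
    """Squaring, counted as cheap 'power arithmetic' rather than a multiplication."""
    return gf_mul(a, a, m, poly)


def itoh_tsujii_inverse(x, m, poly):
    """Return (x^-1, number of general multiplications), for m >= 2.

    Invariant: beta == x^(2^k - 1). The bits of m - 1 are scanned from the
    most significant bit down, doubling k and adding 1 where a bit is set.
    """
    beta, k, mults = x, 1, 0
    for bit in bin(m - 1)[3:]:           # bits after the leading 1
        t = beta
        for _ in range(k):               # t = beta^(2^k), squarings only
            t = gf_sqr(t, m, poly)
        beta = gf_mul(t, beta, m, poly)  # x^(2^(2k) - 1)
        mults += 1
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_sqr(beta, m, poly), x, m, poly)  # x^(2^(k+1) - 1)
            mults += 1
            k += 1
    # beta == x^(2^(m-1) - 1); one more squaring gives x^(2^m - 2)
    return gf_sqr(beta, m, poly), mults


if __name__ == "__main__":
    inv, mults = itoh_tsujii_inverse(0x53, 8, 0x11B)
    print(hex(inv), mults)
```

For m=8 the count is 4, which agrees with Expression 4: $\lfloor \log_2 7 \rfloor + H_w(7) - 1 = 2 + 3 - 1$.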
Method 3: Method Using Multiplication and Multiplicative Inverse Arithmetic Combination for Subfield
According to a method disclosed in reference documents [2] and [4], when m=kq (m is a composite number), the multiplicative inverse arithmetic for $GF(2^m)$ reduces to multiplication in $GF(2^m)$ and multiplicative inverse arithmetic in the subfield $GF(2^k)$ (or $GF(2^q)$). With this method, when the irreducible polynomial and the representation basis were appropriately selected, a considerable reduction in circuit size and an increase in circuit speed were obtained in one case.
The use of this method, however, is limited. For example, this method cannot be used if m is a prime number, and depending on the irreducible polynomial of the target field $GF(2^m)$, a reduction in circuit size and an increase in circuit speed cannot be obtained.
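One standard way to see why the inverse in $GF(2^m)$ reduces to a subfield inverse when m=kq is outlined below. This is a sketch and not necessarily the formulation used in reference documents [2] and [4]; the symbol r is introduced here for illustration, and x is assumed to be nonzero.

```latex
% Assumption: m = kq, x \neq 0, and r is a symbol introduced for illustration.
\begin{align*}
r &= \frac{2^m - 1}{2^k - 1} = 1 + 2^k + 2^{2k} + \cdots + 2^{(q-1)k}, \\
(x^r)^{2^k - 1} &= x^{2^m - 1} = 1
  \quad\Longrightarrow\quad x^r \in GF(2^k), \\
x^{-1} &= (x^r)^{-1} \cdot x^{r-1}.
\end{align*}
```

Here $x^{r-1}$ and $x^r = x^{r-1} \cdot x$ are computed with multiplications and power arithmetic in $GF(2^m)$, and only $(x^r)^{-1}$ requires multiplicative inverse arithmetic, which is performed in the smaller field $GF(2^k)$.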
Method 4: Euclidean Algorithm
Disclosed in reference document [5] is a method for calculating a multiplicative inverse using the Euclidean algorithm over polynomials. This method employs the property whereby, when an input polynomial (the target polynomial for which a multiplicative inverse is to be obtained) is defined as A and the irreducible polynomial is defined as F, values B and M that satisfy BA+FM=1 are calculated using the Euclidean algorithm, and B is the multiplicative inverse of A. One problem encountered with this method is that the latency is generally O(m).
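The property BA+FM=1 can be illustrated with a short software sketch of the binary extended Euclidean algorithm over GF(2)[x]. This is a textbook-style formulation given for illustration, not the hardware design of reference document [5]; the function names and the example polynomial 0x11B for $GF(2^8)$ are assumptions introduced here. Polynomials are stored as integer bit masks, and only B is tracked, since M is not needed to obtain the inverse.

```python
def poly_deg(p):
    """Degree of a binary polynomial stored as an integer bit mask."""
    return p.bit_length() - 1


def euclid_inverse(a, f):
    """Return B with B*A = 1 (mod F) over GF(2)[x].

    a: nonzero polynomial of degree less than deg(f).
    f: irreducible polynomial, including its leading term.
    Invariants: g1*a = u (mod f) and g2*a = v (mod f).
    """
    u, v = a, f
    g1, g2 = 1, 0
    while u != 1:
        j = poly_deg(u) - poly_deg(v)
        if j < 0:                # keep u as the polynomial of larger degree
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j              # cancel the leading term of u
        g1 ^= g2 << j            # apply the same step to the cofactor
    return g1


def gf_mul(a, b, m, poly):
    """Multiply in GF(2^m); used here only to check the result."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


if __name__ == "__main__":
    b = euclid_inverse(0x53, 0x11B)
    print(hex(b), gf_mul(0x53, b, 8, 0x11B))
```

Each loop iteration lowers the degree of u by at least one, so the number of iterations grows in proportion to m, in line with the O(m) latency noted above.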
Reference Documents:
[1] S. B. Wicker and V. K. Bhargava (eds.), Reed Solomon Codes and Their Applications, IEEE Press, 1994.
[2] T. Itoh and S. Tsujii, “A Fast Algorithm For Computing Multiplicative Inverses In $GF(2^m)$ Using Normal Bases,” Information and Computation, Vol. 78, No. 3, pp. 171-177, 1988.
[3] T. Itoh, O. Teechai and S. Tsujii, “A Fast Algorithm For Computing Multiplicative Inverses In $GF(2^m)$ Using Normal Bases,” J. Society For Electronic Communications (Japan), 44, 31-36, 1986.
[4] J. Guajardo and C. Paar, “Efficient Algorithms For Elliptic Curve Cryptosystems,” Proc. of 17th Annual Intl. Cryptology Conf. (CRYPTO '97), LNCS 1294, pp. 342-356, 1997.
[5] H. Brunner, A. Curiger and M. Hofstetter, “On Computing Multiplicative Inverses In $GF(2^m)$,” IEEE Trans. Computers, Vol. 42, pp. 1010-1015, 1993.
A problem with the algorithm provided by Itoh and Tsujii is that the latency of the circuit is large even though only a small number of multiplication procedures is required. The method based on Fermat's little theorem also has latency problems: for a combinational circuit, the latency becomes smaller only when the size of the circuit is increased, whereas for a sequential circuit, the latency is large.
According to the present invention, use is made of the advantages offered by the two methods, and for both a sequential circuit and a combinational circuit, both circuit size and latency are reduced. With this invention, unlike with a normal circuit design that follows a trade-off relation between speed and area, problems associated with both speed and area are resolved.
According to the present invention, low latency (a small process clock count for the sequential circuit, or a small delay for the combinational circuit) is achieved for any value of m, using a combination of basic modules, without increasing the number of multiplication procedures. With all of the conventional methods, either reducing latency is difficult, or providing means to reduce latency involves a drastic increase in circuit size. Specifically, the problems that are encountered are as follows.
(1) According to the method based on Fermat's little theorem, when a combinational circuit is employed, the latency can be reduced to
$$M\{\lfloor \log_2(m-2) \rfloor + 1\};$$  [Expression 6]
but to do this, m−2 multiplication circuits are required.
(2) According to the method proposed by Itoh and Tsujii, and a similar method, only
$$\lfloor \log_2(m-1) \rfloor + H_w(m-1) - 1$$  [Expression 7]
multiplication procedures are required as a whole; even so, it is difficult to improve the latency. For a sequential circuit, the latency is
$$\lfloor \log_2(m-1) \rfloor + H_w(m-1) - 1$$  [Expression 8]
cycles, and for a combinational circuit, it is
$$M(\lfloor \log_2(m-1) \rfloor + H_w(m-1) - 1).$$  [Expression 9]
For a combinational circuit, this latency is worse than that provided by the method based on Fermat's little theorem.
(3) The method that reduces the arithmetic to a subfield can be used only for limited values of m and for certain irreducible polynomials. This method does not conflict with the method of the present invention, and when this method and the method of the invention are employed together, circuit performance can be improved even further.
(4) According to the Euclidean algorithm, a latency of O(m) is obtained, and improving this latency is not easy.
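To make items (1) and (2) concrete, the short sketch below evaluates the expressions for the two conventional methods, counting latency in units of the multiplier latency M. The function names are hypothetical and are introduced here only for illustration; the bracketed terms are evaluated as floor functions.

```python
def floor_log2(n):
    """Floor of log2(n) for a positive integer n."""
    return n.bit_length() - 1


def hamming_weight(n):
    """Number of 1 bits in the binary representation of n."""
    return bin(n).count("1")


def fermat_multipliers(m):
    """Multiplication circuits needed by the Fermat-based tree (item (1))."""
    return m - 2


def fermat_tree_depth(m):
    """Expression 6 in units of M: floor(log2(m-2)) + 1, for m >= 3."""
    return floor_log2(m - 2) + 1


def itoh_tsujii_count(m):
    """Expressions 7 to 9 in units of M: floor(log2(m-1)) + Hw(m-1) - 1."""
    return floor_log2(m - 1) + hamming_weight(m - 1) - 1


if __name__ == "__main__":
    for m in (8, 16):
        print(m, fermat_multipliers(m), fermat_tree_depth(m), itoh_tsujii_count(m))
```

For m=16, the Fermat-based tree needs 14 multiplication circuits for a depth of 4 multiplier delays, whereas the Itoh and Tsujii method needs only 6 multiplications but all 6 lie on the critical path.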
According to the method of the invention, even though the total number of multiplication procedures is the same as for the algorithm proposed by Itoh and Tsujii (and smaller than for the method based on Fermat's little theorem), the latency can be reduced to at most about half that obtained by the Itoh and Tsujii algorithm (the same as is obtained by the method based on Fermat's little theorem).