1. Field of the Invention
The present invention relates generally to methods and circuits for computing and, more particularly, to methods and circuits for computing the product of two elements in a finite field.
2. Description of the Related Art
Multiplication over large finite fields, also known as Galois fields, is used in the implementation of certain cryptographic protocols based on the theory of elliptic curves over Galois fields. These cryptographic protocols are computationally intensive, and Galois field arithmetic accounts for a significant share of the computational resources they consume. Consequently, any reduction in the number of operations required for Galois field arithmetic will have a significant impact on the overall consumption of computational resources.
Generally speaking, a field is a number system with addition, subtraction, multiplication and division. Addition and multiplication on the elements of the field are associative and commutative, and multiplication distributes over addition. There is an element 0, where 0+x=x, and an element 1, where 1*x=x. In addition, for every x, there is a (-x), where x+(-x)=0. Further, for every value of x that is not 0, there is an inverse (1/x) where x*(1/x)=1.
Some well-known examples of fields are the real numbers, the rational numbers and the complex numbers. Each of these sets has an infinite number of elements and is therefore an infinite field.
Finite fields have a finite number of elements. As an example, a finite field GF(p) is a field with p elements, where p is a prime number. The elements of the field GF(p) may be taken to be 0, 1, . . . , p-1. Elements of the field may be added, subtracted, multiplied and inverted, with the result reduced modulo p (mod p) at the end of the computation.
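As a minimal sketch of the arithmetic just described, the following Python fragment works in GF(p) for an assumed example prime p=7. The function names are illustrative only, and the inverse is computed with Fermat's little theorem, which is one common choice among several.

```python
# Illustrative arithmetic in GF(p). P = 7 is an assumed example prime.
P = 7

def gf_add(x, y):
    """Add two field elements and reduce modulo P."""
    return (x + y) % P

def gf_sub(x, y):
    """Subtract and reduce modulo P (Python's % always returns 0..P-1)."""
    return (x - y) % P

def gf_mul(x, y):
    """Multiply and reduce modulo P."""
    return (x * y) % P

def gf_inv(x):
    """Multiplicative inverse via Fermat's little theorem: x^(P-2) mod P."""
    if x % P == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(x, P - 2, P)

if __name__ == "__main__":
    print(gf_add(5, 4), gf_sub(2, 5), gf_mul(3, 5), gf_inv(3))
```

Every nonzero element x satisfies gf_mul(x, gf_inv(x)) == 1, which is the inverse property required of a field.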
The smallest of all fields is the finite field GF(2), which has two elements: 0 and 1. The elements may be added, subtracted, multiplied and divided. However, 1+1=0, because the result of the addition, 2, is reduced modulo 2, i.e. 1+1=2 mod 2=0. Similarly, 0-1=1, because the result of the subtraction, (-1), is reduced modulo p, which is 2, i.e. 0-1=(-1) mod 2=1.
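The two examples above generalize: in GF(2), addition and subtraction both coincide with the exclusive OR of two bits, and multiplication coincides with the logical AND. The short Python check below, with illustrative function names, confirms this over all four input pairs.

```python
def gf2_add(x, y):
    """Addition in GF(2): ordinary sum reduced modulo 2."""
    return (x + y) % 2

def gf2_sub(x, y):
    """Subtraction in GF(2): ordinary difference reduced modulo 2."""
    return (x - y) % 2

def gf2_mul(x, y):
    """Multiplication in GF(2): ordinary product reduced modulo 2."""
    return (x * y) % 2

if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            # Addition and subtraction both equal XOR; multiplication equals AND.
            assert gf2_add(x, y) == x ^ y
            assert gf2_sub(x, y) == x ^ y
            assert gf2_mul(x, y) == x & y
    print("GF(2) add/sub match XOR, mul matches AND")
```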
There are several representations of extension fields $GF(2^n)$ that lend themselves to efficient arithmetic implementation over the binary field GF(2). All such fields are referred to as having characteristic 2. Fields of characteristic 2 are the fields of primary interest in the present invention.
The finite field $GF(2^n)$ is a vector space of dimension n over GF(2). As such, it can be represented using any basis of n linearly independent elements of $GF(2^n)$ over the binary field GF(2). Therefore, elements of $GF(2^n)$ are represented by binary vectors of length n. Field addition is realized in all bases by a bit-wise exclusive OR (XOR) operation, whereas the structure of field multiplication is determined by the choice of basis for the representation.
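Because addition is the same in every basis, it can be illustrated without choosing one. The sketch below represents an element as a Python list of n bits (an assumed encoding, chosen for readability) and adds two elements by bit-wise XOR.

```python
def vec_add(a, b):
    """Add two field elements given as equal-length lists of bits (bit-wise XOR)."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length n")
    return [x ^ y for x, y in zip(a, b)]

if __name__ == "__main__":
    a = [1, 0, 1, 1]
    b = [0, 0, 1, 0]
    print(vec_add(a, b))   # component-wise XOR of a and b
    print(vec_add(a, a))   # any element added to itself gives the zero vector
```

Adding an element to itself always yields the zero vector, which is what it means for the field to have characteristic 2.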
Two families of bases are commonly used to represent the field $GF(2^n)$: standard (or polynomial) representation and normal basis (NB) representation.
In standard polynomial representation, the basis elements have the form $1, \omega, \omega^2, \ldots, \omega^{n-1}$, where $\omega$ is a root in $GF(2^n)$ of an irreducible polynomial $P(\kappa)$ of degree n over GF(2). In an equivalent interpretation of this representation, the elements of $GF(2^n)$ are polynomials of degree less than n over GF(2), and arithmetic is carried out modulo an irreducible polynomial $P(\kappa)$ of degree n over GF(2).
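The polynomial interpretation can be sketched in a few lines of Python. The example assumes the eight-element field GF(8) with the irreducible polynomial x^3+x+1; both the polynomial and the encoding (an integer whose bit i is the coefficient of the i-th power) are illustrative choices, not taken from the text above.

```python
# Assumed example: GF(8) with irreducible polynomial x^3 + x + 1.
N = 3
MODULUS = 0b1011  # bits are the coefficients of x^3, x^2, x^1, x^0

def poly_mul(a, b):
    """Multiply two field elements; bit i of each integer is the x^i coefficient."""
    result = 0
    for _ in range(N):
        if b & 1:            # current low coefficient of b is 1: add a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if (a >> N) & 1:     # degree reached N: reduce modulo the polynomial
            a ^= MODULUS
    return result

if __name__ == "__main__":
    # x * x^2 = x^3, which reduces to x + 1 modulo x^3 + x + 1
    print(bin(poly_mul(0b010, 0b100)))
```

Addition in this representation is the bit-wise XOR of the two integers, and only multiplication depends on the chosen polynomial.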
In NB representation, the basis elements have the form $\alpha, \alpha^2, \ldots, \alpha^{2^{n-1}}$ for a certain element $\alpha \in GF(2^n)$. This defines a normal basis. In addition, if for all $0 \le i_1 \ne i_2 \le n-1$ there exist $j_1, j_2$ such that $\alpha^{2^{i_1}+2^{i_2}} = \alpha^{2^{j_1}} + \alpha^{2^{j_2}}$, then the basis is called an optimal normal basis (ONB). The element $\alpha$ is called the generator of the basis. Optimal normal bases exist for an infinite subset of values of n.
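Whether a given element generates a normal basis can be checked by repeated squaring followed by a rank test over GF(2). The sketch below is a hedged illustration in GF(8), built on the assumed polynomial x^3+x+1 with elements encoded as integers; the helper names are hypothetical.

```python
# Assumed example field: GF(8) built on x^3 + x + 1 (polynomial encoding).
N = 3
MODULUS = 0b1011

def poly_mul(a, b):
    """Shift-and-add multiplication modulo the assumed irreducible polynomial."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MODULUS
    return result

def conjugates(alpha):
    """Return alpha, alpha^2, alpha^4, ..., alpha^(2^(N-1))."""
    out, x = [], alpha
    for _ in range(N):
        out.append(x)
        x = poly_mul(x, x)
    return out

def rank_gf2(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def is_normal(alpha):
    """True if the conjugates of alpha are linearly independent over GF(2)."""
    return rank_gf2(conjugates(alpha)) == N

if __name__ == "__main__":
    print(is_normal(0b011), is_normal(0b010))
```

In this assumed field the element encoded as 0b011 generates a normal basis, while 0b010 (the root of the polynomial itself) does not, because its three conjugates sum to zero.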
Multiplication in NB representation format will now be discussed. For a given normal basis $\{\alpha, \alpha^2, \ldots, \alpha^{2^{n-1}}\}$ for a certain element $\alpha \in GF(2^n)$, a series of matrices or tensors is defined by ##EQU1##
Also, define a matrix $T_k = (t_{ij}^{(k)})$. In other words, the (i, j) element of matrix $T_k$ is $t_{ij}^{(k)}$. All of the matrices $T_k$ (k=0, 1, . . . , n-1) are closely related: $t_{ij}^{(k)} = t_{i-k,j-k}^{(0)}$, where $0 \le i, j, k \le n-1$ and the subscripts are taken modulo n.
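The relation between the matrices can be checked numerically. The sketch below assumes the usual definition in which entry (i, j) of the k-th matrix is the coefficient of the k-th basis element in the product of the i-th and j-th basis elements; the equation placeholder above is not reproduced in this text, so that definition is an assumption. The field GF(8), the polynomial x^3+x+1 and the generator encoded as 0b011 are illustrative choices as well.

```python
from itertools import product

N = 3
MODULUS = 0b1011   # assumed irreducible polynomial x^3 + x + 1
ALPHA = 0b011      # assumed normal-basis generator (polynomial encoding)

def poly_mul(a, b):
    """Shift-and-add multiplication modulo the assumed polynomial."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MODULUS
    return result

def normal_basis(alpha):
    """Basis elements alpha^(2^i) for i = 0 .. N-1."""
    basis, x = [], alpha
    for _ in range(N):
        basis.append(x)
        x = poly_mul(x, x)
    return basis

def to_normal_coords(value, basis):
    """Brute-force search for the bit vector expressing value in the basis."""
    for coords in product((0, 1), repeat=N):
        acc = 0
        for c, b in zip(coords, basis):
            if c:
                acc ^= b
        if acc == value:
            return coords
    raise ValueError("the given elements do not span the field")

def multiplication_tensors(alpha):
    """T[k][i][j] = coefficient of basis[k] in basis[i] * basis[j]."""
    basis = normal_basis(alpha)
    T = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            coords = to_normal_coords(poly_mul(basis[i], basis[j]), basis)
            for k in range(N):
                T[k][i][j] = coords[k]
    return T

if __name__ == "__main__":
    T = multiplication_tensors(ALPHA)
    for k in range(N):
        print(k, T[k])
```

For this example every matrix is the k=0 matrix with its row and column indices shifted by k modulo n, so a single matrix is enough to describe the whole multiplication.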
Furthermore, we identify an element $a \in GF(2^n)$, where ##EQU2##
and the vector $a = (a_0, a_1, \ldots, a_{n-1})$, $a_i \in GF(2) = \{0, 1\}$. We also identify a similarly constituted element $b \in GF(2^n)$.
The elements a and b, where $a, b \in GF(2^n)$, are multiplied to obtain a result $c \in GF(2^n)$. It can be shown that the $k$-th component of the vector for c is given by equation (2) below: ##EQU3##
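The equation placeholders above are not reproduced in this text. As a hedged reconstruction from the surrounding definitions, and not the original typeset equations, the tensor definition, the expansion of an element in the normal basis, and equation (2) would take the standard normal-basis forms below. The label (1) on the first line is an assumption. Squaring the first identity repeatedly gives the stated relation between the matrices for different k.

```latex
\begin{align}
\alpha^{2^{i}}\,\alpha^{2^{j}} &= \sum_{k=0}^{n-1} t_{ij}^{(k)}\,\alpha^{2^{k}},
  \qquad 0 \le i, j \le n-1, \tag{1}\\
a &= \sum_{i=0}^{n-1} a_i\,\alpha^{2^{i}}, \notag\\
c_k &= \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i\,b_j\,t_{ij}^{(k)},
  \qquad 0 \le k \le n-1. \tag{2}
\end{align}
```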
Now, let $a^{[l]}$ be the $l$-th cyclic left shift of the vector $(a_0, a_1, \ldots, a_{n-1})$, where $a^{[l]} = (a_l, a_{l+1}, \ldots, a_{n-1}, a_0, \ldots, a_{l-1})$. Then equation (2) can be written as ##EQU4##
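Equation (3) is not reproduced in this text. A form consistent with the shift relation between the matrices is that the k-th output bit equals the k-th cyclic left shift of a, times the k=0 matrix, times the k-th cyclic left shift of b; the Python sketch below assumes that form. The matrix T0 is the one obtained for an assumed optimal normal basis of GF(8), and all names are illustrative.

```python
# Assumed example: k = 0 multiplication matrix for a normal basis of GF(8).
N = 3
T0 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

def cyclic_left_shift(v, l):
    """Return (v_l, v_{l+1}, ..., v_{n-1}, v_0, ..., v_{l-1})."""
    l %= len(v)
    return v[l:] + v[:l]

def nb_mul(a, b):
    """Normal-basis product: bit k is a^[k] * T0 * (b^[k])^T over GF(2)."""
    c = []
    for k in range(N):
        ak = cyclic_left_shift(a, k)
        bk = cyclic_left_shift(b, k)
        bit = 0
        for i in range(N):
            for j in range(N):
                bit ^= ak[i] & T0[i][j] & bk[j]
        c.append(bit)
    return c

if __name__ == "__main__":
    print(nb_mul([1, 0, 0], [0, 1, 0]))   # product of the first two basis elements
    print(nb_mul([1, 0, 0], [1, 0, 0]))   # squaring the first basis element
```

Two sanity checks follow from the representation itself: the all-ones vector acts as the multiplicative identity, and squaring an element cyclically shifts its coordinates by one position.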
Equation (3) can then be implemented in a multiplier circuit, such as conventional multiplier 100 illustrated in FIG. 1. Multiplier 100 has two shift registers 110 and 140. Each of shift registers 110 and 140 is circularly connected with a shift output being coupled to a shift input of each register. Each of the shift registers also receives a SHIFT signal at a shift control input which causes a right-shift operation to be performed.
Each of shift registers 110 and 140 has a parallel output from each position of the register. The parallel outputs from shift register 110 are coupled to a series of AND gates 122, 124 and 126. The AND gates shown in the diagram are representative of a complete series of n AND gates where there is one AND gate corresponding to each position, or vector element, in shift register 110.
The parallel outputs from shift register 140 are coupled to a series of modulo 2 adders 132, 134 and 136. The modulo 2 adders 132, 134 and 136 are representative of a complete series of n adders where there is one adder corresponding to each position in shift register 110. The inputs to each of the series of adders are determined by the tensor $T_k$ implemented by the multiplier circuit 100.
For an ONB representation, each of adders 134 . . . 136 will have two input taps coupled to register 140, as shown. However, adder 132 would have only one input and could therefore be replaced with a simple signal line. For a general NB representation, the number of taps off each adder 132,134, . . . 136 will vary according to the representation and function being implemented by the multiplier.
The outputs from each of the series of adders are input to the corresponding one of the series of AND gates. Thus, the output of adder 132 is input to AND gate 122, the output of adder 134 is input to AND gate 124, and the output from adder 136 is input to AND gate 126. The output of each of the series of AND gates, in turn, is input to modulo 2 adder 150.
In operation, the vector for element a is loaded into shift register 110 and the vector for element b is loaded into shift register 140. The elements of vectors a and b are operated upon by the combinational logic of the series of AND gates 122, 124 and 126 as well as the series of adders 132, 134 and 136. Thus, the element $a_k$ of vector a is input to AND gate 122 and ANDed with the output of adder 132, which modulo 2 adds the elements $(b_k, b_{k+1}, \ldots, b_{k-1})$ of vector b, as determined by the tensor $T_k$. Similarly, element $a_{k+1}$ is input to AND gate 124 for combination with the result of the modulo 2 addition of the elements of vector b output from adder 134, and element $a_{k-1}$ is input to AND gate 126 for combination with the result of the modulo 2 addition of the elements of vector b output from adder 136.
Adder 150 then adds the outputs of the series of AND gates 122, 124 and 126. The result at the output of adder 150 is an element $c_k$ of result vector c. The vector c appears serially at the output of adder 150, one bit for each cycle of the SHIFT signal. Note that each element $c_k$ of vector c is complete at the corresponding $k$-th cycle. Also note that all the elements of vectors a and b enter into the calculation of $c_k$ in each cycle. Thus, calculation of result vector c requires n clock cycles, one clock cycle for each element of $c = (c_0, c_1, \ldots, c_{n-1})$. In addition, since k = 0, 1, . . . , n-1, the computation must be performed n times, once for each $T_k$, and each pass involves all n elements of a and b. Thus, the complexity of the multiplication operation is proportional to $n^2$.
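The operation of the bit-serial multiplier can be mimicked in software. The Python sketch below is a hedged model rather than a description of FIG. 1: it assumes the tap pattern is given by the k=0 matrix of an example normal basis of GF(8), and that each SHIFT cycle moves every register bit one position toward index 0. How that maps onto the left or right direction in the figure depends on how the register positions are drawn.

```python
# Assumed tap pattern: row i lists which positions of register 140 feed adder i.
T0 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

def serial_multiply(a, b):
    """Model of the bit-serial multiplier: one output bit per SHIFT cycle."""
    reg_a, reg_b = list(a), list(b)      # contents of shift registers 110 and 140
    n = len(reg_a)
    out = []
    for _ in range(n):                   # n clock cycles in total
        total = 0                        # output adder (modulo 2)
        for i in range(n):
            tap_sum = 0                  # adder i: XOR of the tapped bits of reg_b
            for j in range(n):
                if T0[i][j]:
                    tap_sum ^= reg_b[j]
            total ^= reg_a[i] & tap_sum  # AND gate i feeds the output adder
        out.append(total)
        reg_a = reg_a[1:] + reg_a[:1]    # circular shift of both registers
        reg_b = reg_b[1:] + reg_b[:1]
    return out

if __name__ == "__main__":
    print(serial_multiply([1, 0, 0], [0, 1, 0]))
```

In this example the first row of the tap pattern has a single tap and the remaining rows have two, which matches the optimal normal basis case described above. Each of the n cycles touches all n register positions, which is where the n-squared operation count comes from.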
There are many conventional approaches to multiplication of vectors in finite fields. Multiplier 100 is representative of one conventional approach to multiplication. Other approaches will vary, but multiplier 100 remains suitably representative for purposes of describing the conventional technology.
Large finite fields are the basis of many modern cryptographic algorithms, e.g. elliptic curve cryptography. In these applications, n is typically on the order of 100 to 400. Given that the complexity of the multiplication operation is proportional to $n^2$, each field multiply requires on the order of ten thousand to over one hundred thousand operations (100*100=10,000 and 400*400=160,000). The field arithmetic therefore becomes a computational bottleneck, and it becomes important to reduce the overhead required to perform encryption and decryption. At the same time, the explosive development of the Internet is expected to make the use of encryption increasingly widespread.
Accordingly, the need remains for ways to reduce the overhead required to perform multiplication operations in large finite fields.