The present invention relates generally to information processing systems and devices, such as cryptographic systems and devices, which include a capability for multiplying signals of a finite field having a normal basis.
Finite field arithmetic operations are becoming increasingly important in today's computer systems, particularly for cryptographic processing applications. Among the more common finite fields used in cryptography are odd-characteristic finite fields of degree 1, conventionally known as GF(p) arithmetic or arithmetic modulo a prime, and even-characteristic finite fields of degree greater than 1, conventionally known as GF(2^m) arithmetic (where m is the degree). GF(2^m) arithmetic is further classified according to the choice of basis for representing elements of the finite field; two common choices are polynomial basis and normal basis.
It is known that multiplication in normal basis, particularly in optimal normal basis (ONB), can be implemented efficiently in hardware. However, little attention has been devoted to implementing normal basis multiplication efficiently in software. A number of difficulties have prevented the development of fast software implementation of normal basis multiplication. First, when multiplying two elements represented in normal basis according to the standard formula, the coefficients of their product need to be computed one bit at a time. Second, the computation of a given coefficient involves a series of arithmetic operations which need to be performed sequentially in software, while in hardware, they can be easily parallelized.
We will first define some basic notation for a finite field GF(2^m) and its representation in normal basis. Then, we will describe conventional multiplication formulas for both general normal basis and ONB. Let w denote the word size in bits. For a typical software implementation, we have w=32. Let m be a positive integer. For simplicity, we assume that w|m. The finite field GF(2^m) consists of 2^m elements, with certain rules for field addition and multiplication. The finite field GF(2^m) has various basis representations including normal basis representation. A binary polynomial is a polynomial with coefficients in GF(2). A binary polynomial is irreducible if it is not the product of two binary polynomials of smaller degree. For simplicity, we will refer to such a polynomial as an irreducible polynomial. Irreducible polynomials exist for every degree m and can be found efficiently. Let g(x) be an irreducible polynomial of degree m. If β is a root of g(x), then the m distinct roots of g(x) in GF(2^m) are given by
$$B = \{\beta,\ \beta^{2},\ \beta^{2^{2}},\ \ldots,\ \beta^{2^{m-1}}\}.$$
If the elements of B are linearly independent, then g(x) is called a normal polynomial and B is called a normal basis for GF(2^m) over GF(2). Normal polynomials exist for every degree m. Given any element a ∈ GF(2^m), one can write
$$a = \sum_{i=0}^{m-1} a_i \beta^{2^{i}}, \quad \text{where } a_i \in \{0, 1\}.$$
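As a small illustration of the normality condition just defined, the following Python sketch tests whether the conjugates β, β^2, ..., β^(2^(m−1)) of a root of g(x) are linearly independent over GF(2). The function names, the integer bit-mask encoding of polynomials (bit i holds the coefficient of x^i), and the two degree-3 sample polynomials are assumptions made for this example and do not come from the description. The sketch also assumes that g(x) is already known to be irreducible and that m is at least 2.

```python
def gf2_mulmod(x, y, g, m):
    """Multiply binary polynomials x and y modulo g (degree m); bit i = coeff of x^i."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> m) & 1:      # degree reached m: subtract (XOR) g
            x ^= g
    return r


def conjugates(g, m):
    """Return [beta, beta^2, beta^4, ..., beta^(2^(m-1))] for beta = x mod g."""
    beta = 0b10
    out = []
    for _ in range(m):
        out.append(beta)
        beta = gf2_mulmod(beta, beta, g, m)   # repeated squaring
    return out


def rank_gf2(vectors):
    """Rank over GF(2) of bit-vectors stored as ints (XOR-basis elimination)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def is_normal(g, m):
    """True if the m conjugates of a root of the irreducible g are independent."""
    return rank_gf2(conjugates(g, m)) == m


print(is_normal(0b1101, 3))  # x^3 + x^2 + 1 -> True
print(is_normal(0b1011, 3))  # x^3 + x + 1   -> False
```

Under these assumptions x^3 + x^2 + 1 is a normal polynomial, whereas x^3 + x + 1 is not, because for the latter β + β^2 + β^4 = 0.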
In normal basis, field multiplication is generally carried out using a multiplication matrix, which is an m-by-m matrix M with entries in GF(2). Details on how to compute matrix M from g(x) are known in the prior art, e.g., A. Menezes et al., "Applications of Finite Fields," Kluwer Academic Publishers, 1993, and IEEE Standard for Public-Key Cryptography, http://stdsbbs.ieee.org/groups/1363/index.html. Other details regarding conventional finite field arithmetic techniques can be found in, e.g., U.S. Pat. No. 4,587,627 issued May 6, 1986 to J. L. Massey and J. K. Omura, entitled "Computational method and apparatus for finite field arithmetic," and G.B. Patent No. 2,176,325 issued Dec. 17, 1986 to R. C. Mullin, I. M. Onyszchuk, and S. A. Vanstone, entitled "Finite field multiplication in a cryptographic system - offsetting suffixes and rotating binary digits in respective shift registers so as to produce all product vector terms simultaneously" (related U.S. Pat. No. 4,745,568 was issued May 17, 1988).
Below, we describe a conventional normal basis multiplication formula in two slightly different formats. Let a = (a_0 a_1 ... a_{m−1}) and b = (b_0 b_1 ... b_{m−1}) be two elements. Then their product c = (c_0 c_1 ... c_{m−1}) can be computed one bit at a time as follows:
$$
\begin{aligned}
c_0 &= (a_0\ a_1\ \ldots\ a_{m-1})\, M\, (b_0\ b_1\ \ldots\ b_{m-1})^{T} \\
c_1 &= (a_1\ a_2\ \ldots\ a_0)\, M\, (b_1\ b_2\ \ldots\ b_0)^{T} \\
&\ \ \vdots \\
c_{m-1} &= (a_{m-1}\ a_0\ \ldots\ a_{m-2})\, M\, (b_{m-1}\ b_0\ \ldots\ b_{m-2})^{T}
\end{aligned}
\tag{1}
$$
In formula (1), when a new coefficient c_k needs to be computed, the coefficients of both a and b are rotated to the left by one bit. This allows efficient hardware implementations of normal basis multiplication.
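The rotate-and-multiply structure of formula (1) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, elements are held as plain lists of bits rather than packed words, and the 3-by-3 matrix M3 was worked out by hand for the normal basis {β, β^2, β^4} generated by a root of x^3 + x^2 + 1. That matrix is an assumption of this example and is not given in the description.

```python
def rotate_left(v, k):
    """Rotate the coefficient list v to the left by k positions."""
    return v[k:] + v[:k]


def nb_mul_formula1(a, b, M):
    """Bit-at-a-time normal basis product: c_k = rot_k(a) * M * rot_k(b)^T over GF(2)."""
    m = len(a)
    c = []
    for k in range(m):
        ar = rotate_left(a, k)
        br = rotate_left(b, k)
        # Matrix-vector product M * br^T, reduced modulo 2.
        Mb = [sum(M[i][j] & br[j] for j in range(m)) & 1 for i in range(m)]
        # Inner product of the rotated a with M * br^T, reduced modulo 2.
        c.append(sum(ar[i] & Mb[i] for i in range(m)) & 1)
    return c


# Assumed multiplication matrix for GF(2^3), basis from a root of x^3 + x^2 + 1.
M3 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

print(nb_mul_formula1([1, 0, 0], [1, 0, 0], M3))  # beta * beta = beta^2 -> [0, 1, 0]
print(nb_mul_formula1([1, 0, 1], [1, 1, 1], M3))  # (1 1 1) is the identity -> [1, 0, 1]
```

In this basis the all-ones vector represents the field element 1, because β + β^2 + β^4 = 1, which gives a convenient sanity check for any implementation.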
In a typical "C" programming language implementation of formula (1), a, b, and columns of M are all stored in words. Each matrix-vector multiplication M (b_0 b_1 ... b_{m−1})^T can be carried out with (m/2)(m/w) exclusive-or operations on average, and hence the total number of word operations for computing c is about m(m/2)(m/w) = m^3/(2w). Note that the computation time is independent of the number of non-zero entries in M.
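The operation count above can be written out as a short worked equation. The sample values m = 160 and w = 32 are chosen only so that w divides m; they are an assumption of this example and are not figures from the description.

```latex
\[
\underbrace{m}_{\text{coefficients } c_k}
\cdot
\underbrace{\frac{m}{2}\cdot\frac{m}{w}}_{\text{XORs per product } M b^{T}}
= \frac{m^{3}}{2w},
\qquad
m = 160,\ w = 32:\quad
\frac{160^{3}}{2\cdot 32} = \frac{4\,096\,000}{64} = 64\,000 .
\]
```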
Let M_ij denote the entries of matrix M. The following is another way of writing formula (1):
for k from 0 to m−1
$$c_k = \sum_{i=0}^{m-1} a_{i+k} \cdot \Bigl( \sum_{j=0}^{m-1} M_{ij} \cdot b_{j+k} \Bigr). \tag{2}$$
Throughout the description, the addition operation "+" in a subscript is to be understood as addition modulo the degree m, unless otherwise specified; the symbol "·" denotes AND; and the symbols "Σ" and "⊕" denote exclusive-or. In formula (2), essentially the same expression is used for each coefficient c_k. More specifically, given the expression for c_k, we simply increase the subscripts of a and b by one (modulo m) and the result is the expression for c_{k+1}.
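Formula (2) lends itself to a sparse formulation in which only the non-zero entries of M are visited, so the work per coefficient tracks the number of 1's in M. The Python sketch below illustrates this under stated assumptions: the function names and the row-list encoding of M are hypothetical, and ROWS3 encodes a hand-derived multiplication matrix for the normal basis generated by a root of x^3 + x^2 + 1, which is not given in the description.

```python
def sparse_rows(M):
    """For each row i of M, list the column indices j with M[i][j] == 1."""
    return [[j for j, bit in enumerate(row) if bit] for row in M]


def nb_mul_formula2(a, b, rows):
    """c_k = XOR_i a_{i+k} AND (XOR_{j : M_ij = 1} b_{j+k}), subscripts modulo m."""
    m = len(a)
    c = []
    for k in range(m):
        ck = 0
        for i in range(m):
            if a[(i + k) % m]:
                for j in rows[i]:          # only the non-zero entries of row i
                    ck ^= b[(j + k) % m]
        c.append(ck)
    return c


# Assumed matrix for GF(2^3), basis {beta, beta^2, beta^4}, beta a root of x^3 + x^2 + 1.
ROWS3 = sparse_rows([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 1]])

print(ROWS3)                                         # [[1], [0, 2], [1, 2]]
print(nb_mul_formula2([1, 0, 0], [0, 1, 0], ROWS3))  # beta * beta^2 -> [1, 0, 1]
```

Moving from c_k to c_{k+1} changes only the offset k added to the subscripts, while the list of non-zero positions in M is reused unchanged.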
Using formula (2), the fewer 1's in the multiplication matrix M, the faster a field multiplication can be done. An ONB is a normal basis which has the smallest number of 1's in the multiplication matrix M. There are two kinds of ONB, called type I ONB and type II ONB, that differ in the mathematical formulae which define them. For both types of ONB, the multiplication matrix has exactly 2m−1 non-zero entries. In particular, the first row has a single non-zero entry, and the rest of the rows have exactly two non-zero entries. In terms of formula (2), the total number of terms (of the form a_i M_ij b_j) for each c_k is 2m−1. ONBs only exist for certain values of degree m. For example, in the range [150, 200], there are only 15 values of m for which an ONB exists.
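To make the remark about existence concrete, the sketch below tests a degree m against the commonly cited number-theoretic conditions for an ONB over GF(2). These conditions are not stated in the description and are brought in here as an assumption: a type I ONB exists when m+1 is prime and 2 is primitive modulo m+1, and a type II ONB exists when 2m+1 is prime and either 2 is primitive modulo 2m+1, or 2m+1 ≡ 3 (mod 4) and 2 has multiplicative order m modulo 2m+1. All function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def order_of_2(p):
    """Multiplicative order of 2 modulo an odd prime p."""
    k, x = 1, 2 % p
    while x != 1:
        x = (x * 2) % p
        k += 1
    return k


def has_type1_onb(m):
    p = m + 1
    # p > 2 is checked first so that order_of_2 is only called on odd primes.
    return p > 2 and is_prime(p) and order_of_2(p) == m


def has_type2_onb(m):
    p = 2 * m + 1
    if not is_prime(p):
        return False
    k = order_of_2(p)
    return k == 2 * m or (p % 4 == 3 and k == m)


def has_onb(m):
    return has_type1_onb(m) or has_type2_onb(m)


print([m for m in range(2, 13) if has_onb(m)])  # [2, 3, 4, 5, 6, 9, 10, 11, 12]
```

Counting the degrees in range(150, 201) that pass either test is one way to check the figure of 15 quoted above.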
It is an object of the present invention to provide improved techniques for multiplication in a normal basis, which are particularly well suited for implementation in software, and can be applied to both general normal basis and ONB.
The invention provides improved techniques for implementing normal basis multiplication in processing systems and devices. The techniques are particularly well suited for implementation in software. Using the invention, the coefficients of a product of field elements can be computed one processor word at a time, e.g., 32 bits in a 32-bit processor, as opposed to one bit at a time as required by certain conventional approaches, thereby fully taking advantage of the fast word-based operations currently available in modern processors and software.
An illustrative embodiment of the invention includes a first rotator which receives a first input signal representative of a first normal basis field element (a_0 a_1 ... a_{m−1}), and a second rotator which receives a second input signal representative of a second normal basis field element (b_0 b_1 ... b_{m−1}). A word multiplier receives output signals from the first and second rotators, corresponding to rotated representations of the first and second elements, respectively, and processes the rotated representations w bits at a time to generate an output signal representative of a product of the first and second elements, where w is a word length associated with the word multiplier. The rotated representation of the first element may be given by A[i] = (a_i a_{i+1} ... a_{i+w−1}), the rotated representation of the second element may be given by B[i] = (b_i b_{i+1} ... b_{i+w−1}), and the product may be given by c = (C[0], C[w], C[2w], ..., C[m−w]), where C[i] = (c_i c_{i+1} ... c_{i+w−1}), m is the degree of the finite field, w is the word length, and i = 0, 1, ..., m−1.
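The word-at-a-time idea can be sketched as follows: because the expression for c_{k+t} is the expression for c_k with every subscript shifted by t, applying the same AND and exclusive-or steps to the rotated words A[i+k] and B[j+k] yields the w coefficients of C[k] in one pass. The Python below is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, bit t of each word is taken to hold the coefficient with subscript i+t, and ROWS4 is a hand-derived sparse multiplication matrix for the type I ONB of GF(2^4) generated by a root of x^4 + x^3 + x^2 + x + 1.

```python
def rotated_words(bits, w):
    """Return X[0..m-1], where bit t of X[i] is bits[(i + t) % m]."""
    m = len(bits)
    return [sum(bits[(i + t) % m] << t for t in range(w)) for i in range(m)]


def nb_mul_words(a, b, rows, w):
    """Compute the product w coefficients at a time: C[0], C[w], ..., C[m-w]."""
    m = len(a)
    assert m % w == 0, "this sketch assumes w divides m"
    A = rotated_words(a, w)
    B = rotated_words(b, w)
    c = []
    for k in range(0, m, w):
        word = 0
        for i in range(m):
            inner = 0
            for j in rows[i]:               # non-zero entries M_ij of row i
                inner ^= B[(j + k) % m]
            word ^= A[(i + k) % m] & inner
        # Unpack C[k] into the coefficients c_k, ..., c_{k+w-1}.
        c.extend((word >> t) & 1 for t in range(w))
    return c


# Assumed sparse matrix (2m-1 = 7 non-zero entries) for a type I ONB of GF(2^4).
ROWS4 = [[2], [2, 3], [0, 1], [1, 3]]

print(nb_mul_words([1, 0, 0, 0], [1, 0, 0, 0], ROWS4, 2))  # beta * beta -> [0, 1, 0, 0]
```

Calling the same function with w = 1 reproduces the bit-at-a-time computation, which makes it easy to cross-check the word-level results for a small field.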
In accordance with another aspect of the invention, the performance of a normal basis multiplier can be further improved by precomputing and storing certain elements of the rotated representations, such as elements A[i+wt] and B[i+wt], where t = 0, 1, ..., m/w − 1. For example, if A[i+m] = A[i] = (a_i a_{i+1} ... a_{i+w−1}) and B[i+m] = B[i] = (b_i b_{i+1} ... b_{i+w−1}), the array A may be precomputed as A[0], A[1], ..., A[m−1], A[m], A[m+1], ..., A[2m−1], and the array B may be precomputed as B[0], B[1], ..., B[m−1], B[m], B[m+1], ..., B[2m−1], such that each of the arrays A and B includes 2m elements of length w. Words 0 through m−1 in A and B are then used in computing C[0], words w through w+m−1 in A and B are used in computing C[w], and the remaining elements C[2w], ..., C[m−w] of the product c are computed in the same manner. Further improvements can be provided in the case of an optimal normal basis (ONB) by, for example, precomputing two arrays, B1[i+m] = B1[i] = B[mult-array[2*i−1]] and B2[i+m] = B2[i] = B[mult-array[2*i]], for the rotated representation B, such that A, B1, and B2 can be accessed sequentially, where mult-array is an array with 2m−1 entries and is a compact representation of the multiplication matrix M.
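The effect of the doubled arrays can be illustrated with a short Python sketch for the ONB case. Storing 2m words per operand means the indices i+k and j+k never need to be reduced modulo m. The layout assumed here for the compact matrix, with mult_array[0] holding the single column index of row 0 and mult_array[2*i−1], mult_array[2*i] holding the two column indices of row i, is inferred from the indexing in the text and should be read as an assumption. The function names and the sample array MULT4 (a hand-derived type I ONB for GF(2^4)) are likewise hypothetical. The further B1/B2 optimization is not modeled.

```python
def doubled_words(bits, w):
    """Return X[0..2m-1] with X[i+m] == X[i]; bit t of X[i] is bits[(i + t) % m]."""
    m = len(bits)
    period = [sum(bits[(i + t) % m] << t for t in range(w)) for i in range(m)]
    return period + period


def onb_mul_precomputed(a, b, mult_array, w):
    """ONB product using doubled arrays; returns (coefficients, word-operation count)."""
    m = len(a)
    assert m % w == 0 and len(mult_array) == 2 * m - 1
    A = doubled_words(a, w)
    B = doubled_words(b, w)
    c, ops = [], 0
    for k in range(0, m, w):
        # Row 0 of M has a single non-zero entry: one AND.
        word = A[k] & B[mult_array[0] + k]
        ops += 1
        for i in range(1, m):
            j1 = mult_array[2 * i - 1]
            j2 = mult_array[2 * i]
            # Rows 1..m-1 have two non-zero entries: XOR, AND, XOR.
            word ^= A[i + k] & (B[j1 + k] ^ B[j2 + k])
            ops += 3
        c.extend((word >> t) & 1 for t in range(w))
    return c, ops


# Assumed compact matrix for a type I ONB of GF(2^4): rows [2], [2,3], [0,1], [1,3].
MULT4 = [2, 2, 3, 0, 1, 1, 3]

print(onb_mul_precomputed([1, 0, 0, 0], [1, 0, 0, 0], MULT4, 2))  # ([0, 1, 0, 0], 20)
```

In this sketch each output word costs 3m−2 word operations, so the whole product costs (m/w)(3m−2), which is in line with an estimate of roughly 3m^2/w for an ONB.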
The invention provides improved performance for both general normal basis and ONB. For example, for both type I ONB and type II ONB, the number of word operations involved for computing c is roughly (m/w)(3m) = 3m^2/w. Compared with the conventional m^3/(2w) operations using standard formula (1), the invention can improve computational speed by a factor on the order of m/6. Although the illustrative embodiment described above is particularly well suited for use in software configured to run on a conventional computer with a 32-bit or 64-bit processor, the invention can be implemented in computers or other systems or devices with other word lengths, including, e.g., embedded systems such as pagers, digital notepads or palmtop computers with 8-bit processors. As another example, although the illustrative embodiment involves multiplication of two field elements, the techniques of the invention can be extended in a straightforward manner to multiplication of more than two field elements. These and other features of the present invention will become more apparent from the accompanying drawings and the following detailed description.