The characteristic-2-multiplication is used in many cryptographic processes, particularly in public key processes, for example in a hardware-based implementation of cryptographic processes on the basis of elliptic curves. The numbers used in such a characteristic-2-arithmetic may be efficiently represented as bit strings on a processor unit, processor or computer, and such a bit string may be temporarily stored in a register. In this representation, the addition of two numbers corresponds to the bitwise XOR operation on their bit strings. A multiplication of two bit strings or operands in the characteristic-2-arithmetic corresponds mathematically to the product of two polynomials from GF(2)[X], wherein each bit string used for representing a number is the 0/1-sequence of the coefficients of the respective polynomial.
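The correspondence between bit strings, polynomials and XOR addition may be illustrated by the following minimal sketch. The function names gf2_add and poly_str are hypothetical and chosen for illustration only, and the convention that bit i holds the coefficient of X^i is an assumption of the sketch.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two GF(2)[X] polynomials held as bit strings: coefficient-wise XOR."""
    return a ^ b


def poly_str(a: int) -> str:
    """Render a bit string as a polynomial; bit i is the coefficient of X^i."""
    terms = []
    for i in range(a.bit_length() - 1, -1, -1):
        if (a >> i) & 1:
            if i == 0:
                terms.append("1")
            elif i == 1:
                terms.append("X")
            else:
                terms.append(f"X^{i}")
    return " + ".join(terms) if terms else "0"


a, b = 0b1011, 0b1101
print(poly_str(a))              # X^3 + X + 1
print(poly_str(b))              # X^3 + X^2 + 1
print(poly_str(gf2_add(a, b)))  # X^2 + X
```

Because each coefficient is reduced modulo 2, no carry propagates between bit positions, which is why the addition reduces to a single XOR.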
The mathematical basis for a characteristic-2-multiplication consists in reducing the product of two numbers to a fixed number of partial products, which are then added together to produce the result. For example, multiplying the numbers 1011 and 1101 yields the following partial products and sum, where each row is the first operand shifted to the left by one further position and is zero wherever the corresponding bit of the second operand is 0:
            1 0 1 1
          0 0 0 0
        1 0 1 1
      1 0 1 1
      -------------
      1 1 1 1 1 1 1
To produce the result, the partial products are added together by a column-wise XOR operation. As basic operations for carrying out such a multiplication, shift operations and bitwise AND operations are used for calculating the partial products in the rows of the above table and bitwise XOR operations are used for calculating the respective column total of the partial products.
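The basic operations named above (shift, bitwise AND and bitwise XOR) may be combined as in the following minimal sketch. The function name clmul_shift_xor is hypothetical, and operands are assumed to be non-negative Python integers whose bit i is the coefficient of X^i.

```python
def clmul_shift_xor(a: int, b: int) -> int:
    """Multiply two GF(2)[X] polynomials held as non-negative integers."""
    result = 0
    shift = 0
    while b >> shift:
        bit = (b >> shift) & 1
        # AND with an all-ones (-1) or all-zeros (0) mask selects a or 0,
        # then the shift places the partial product in its row position.
        partial = (a & -bit) << shift
        # Column-wise addition of the partial products is a plain XOR.
        result ^= partial
        shift += 1
    return result


print(format(clmul_shift_xor(0b1011, 0b1101), "b"))  # prints 1111111
```

For comparison, ordinary integer multiplication of the same operands (11 and 13) gives 143, i.e. 10001111, because carries propagate between columns there.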
Known optimized variants of this multiplication process are so-called window methods. In a window method, the number of required additions of partial products is reduced by precalculating a small table of multiples of one operand. With the help of this precalculated table, several bits of the other operand may then be processed at once in each subsequent stage, so that the calculation of a partial product is reduced to a lookup in the precalculated table. With optimized parameter selection, the saving in additions of partial products by the processing of several bits at once may be greater than the time and effort needed for the additional precalculation of the table.
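A window method of this kind may be sketched as follows. The function name clmul_window, the default window width w = 4 and the processing order from the most significant window downwards are assumptions made for illustration and are not taken from a particular implementation.

```python
def clmul_window(a: int, b: int, w: int = 4) -> int:
    """Multiply in GF(2)[X], consuming w bits of b per step via a lookup table."""
    # Precalculate table[i] = a * i in GF(2)[X] for every w-bit value i.
    # Since i = 2*(i >> 1) + (i & 1), each entry follows from a smaller one.
    table = [0] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = (table[i >> 1] << 1) ^ (a if i & 1 else 0)

    mask = (1 << w) - 1
    windows = (b.bit_length() + w - 1) // w
    result = 0
    for k in range(windows - 1, -1, -1):
        result <<= w                            # make room for the next window
        result ^= table[(b >> (k * w)) & mask]  # one lookup, one XOR per window
    return result


print(format(clmul_window(0b1011, 0b1101), "b"))  # prints 1111111
```

With w = 4 the table has 16 entries, and an n-bit operand needs about n/4 XOR additions instead of up to n.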
Window methods may also be combined efficiently with shift commands. If the processor used for implementation provides efficient shift commands for specific increments, it may be advantageous to add up the partial products in a number of subtotals. For example, on a processor with a bus width of 8 bits and with a window method using 4-bit wide windows, the interim result in the accumulator must be shifted to the left by 4 bits between two additions. However, if two different accumulators are used alternately for adding up the partial products, then the content of each accumulator need only be shifted by 8 bits in each case. A shift by 1 byte, i.e. 8 bits, may be achieved most efficiently on a conventional processor by copying the data in the memory. It is only in the subsequent step, when the hitherto calculated interim results of the two accumulators are added together, that the content of one accumulator must be shifted to the left by 4 bits. With this method it is possible to save a large number of cost-intensive shift commands during the calculation of a product.
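The use of two alternating accumulators may be sketched as follows for 4-bit windows and 8-bit bytes. The function name clmul_two_acc and the assignment of the low and high nibble of each byte to separate accumulators are assumptions made for illustration. The shifts by 8 bits are written here as integer shifts; on an 8-bit processor they would be realized as byte moves in memory.

```python
def clmul_two_acc(a: int, b: int) -> int:
    """GF(2)[X] product using 4-bit windows and two byte-aligned accumulators."""
    # table[i] = a * i in GF(2)[X] for every 4-bit value i.
    table = [0] * 16
    for i in range(1, 16):
        table[i] = (table[i >> 1] << 1) ^ (a if i & 1 else 0)

    acc_lo = 0  # collects partial products of the low nibble of each byte
    acc_hi = 0  # collects partial products of the high nibble of each byte
    nbytes = (b.bit_length() + 7) // 8
    for k in range(nbytes - 1, -1, -1):
        byte = (b >> (8 * k)) & 0xFF
        # Both accumulators only ever move by a whole byte (8 bits).
        acc_lo = (acc_lo << 8) ^ table[byte & 0x0F]
        acc_hi = (acc_hi << 8) ^ table[byte >> 4]

    # The single 4-bit shift happens once, when the subtotals are combined.
    return acc_lo ^ (acc_hi << 4)


print(format(clmul_two_acc(0b1011, 0b1101), "b"))  # prints 1111111
```

In this sketch the number of 4-bit shifts of a long accumulator drops from one per window to one per product.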
For longer operands, it may be advantageous to use asymptotically faster algorithms for calculating the multiplication, for example Karatsuba multiplication or Fourier-transform-based multiplication. The methods described above for multiplication may then be applied to shorter parts of the numbers to be multiplied.
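As an illustration of this combination, a Karatsuba multiplication over GF(2)[X] may be sketched as follows, with a simple shift-and-XOR routine applied to the short parts. The function names clmul_base and clmul_karatsuba and the threshold of 32 bits are hypothetical choices for the sketch. In characteristic 2 the subtractions of the usual Karatsuba formula become XOR operations.

```python
def clmul_base(a: int, b: int) -> int:
    """Shift-and-XOR product in GF(2)[X], used for short operands."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def clmul_karatsuba(a: int, b: int, threshold: int = 32) -> int:
    """Karatsuba product in GF(2)[X]; threshold must be at least 1."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul_base(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h   # a = a1 * X^h + a0
    b0, b1 = b & mask, b >> h   # b = b1 * X^h + b0
    lo = clmul_karatsuba(a0, b0, threshold)
    hi = clmul_karatsuba(a1, b1, threshold)
    # Middle term: (a0 + a1)(b0 + b1) - lo - hi, with + and - both being XOR.
    mid = clmul_karatsuba(a0 ^ a1, b0 ^ b1, threshold) ^ lo ^ hi
    return (hi << (2 * h)) ^ (mid << h) ^ lo


print(hex(clmul_karatsuba(0xFFFF, 0xFFFF, threshold=4)))  # prints 0x55555555
```

Each recursion level replaces four half-length products by three, and any of the methods described above could serve as the routine for short operands in place of clmul_base.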
Even though almost all conventional processors or processor units have hardware for rapid integer multiplication of two bit strings in the bus width of the processor unit, none of these conventional processors supports the characteristic-2-multiplication in hardware.
This means that such a multiplication must always be implemented in software and is therefore significantly slower in general than hardware-based integer multiplication.