Multiplication hardware necessarily has a finite size, typically defined by a pair of single-word operand inputs and a two-word result output. To also carry out multiply-accumulate operations, the multiplier output is normally connected to an accumulator circuit that is at least two words plus one bit wide. (The supplemental bit can be part of the result or can simply serve as CARRY information, indicating an overflow in the case of addition or an underflow in the case of subtraction in the accumulate part of the operation.) The basic operation is thus $R = Z \pm XY$. For simple multiplication, $R = XY$, the accumulator input is $Z = 0$. For squaring operations, $X = Y$. The basic operation is usually designed to perform standard integer arithmetic, but multiplication hardware that performs polynomial arithmetic also exists, especially for use in cryptographic applications.
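As a rough software model of the basic operation, the Python sketch below assumes an 8-bit word (n = 8) and uses a hypothetical function `mac`. It does not describe any particular hardware. It shows that a single-word product always fits in two words, and that one extra accumulator bit is enough to flag an overflow on addition or an underflow on subtraction.

```python
N_BITS = 8                                 # assumed word size n (illustrative only)
WORD_MASK = (1 << N_BITS) - 1              # largest single-word value
TWO_WORD_MASK = (1 << (2 * N_BITS)) - 1    # largest two-word value


def mac(z: int, x: int, y: int, subtract: bool = False) -> tuple[int, int]:
    """Model one R = Z +/- X*Y step of a hypothetical single-word multiplier.

    x and y are single-word operands and z is the two-word accumulator input.
    Returns (r, carry): r is the two-word result and carry is the extra
    accumulator bit (overflow on addition, underflow on subtraction).
    """
    if not (0 <= x <= WORD_MASK and 0 <= y <= WORD_MASK):
        raise ValueError("x and y must be single words")
    if not 0 <= z <= TWO_WORD_MASK:
        raise ValueError("z must fit in two words")
    product = x * y                        # always fits in two words
    total = z - product if subtract else z + product
    carry = 1 if total < 0 or total > TWO_WORD_MASK else 0
    return total & TWO_WORD_MASK, carry    # wrap to two words, keep the extra bit


print(mac(0, 200, 100))                    # plain multiplication (Z = 0) -> (20000, 0)
print(mac(0, 15, 15))                      # squaring (X = Y) -> (225, 0)
```

For example, `mac(0, 1, 1, subtract=True)` wraps to the all-ones two-word value and sets the carry bit to signal the underflow.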
In cryptography and a number of other applications, there is a need to multiply very large integers comprising many words. To perform these operations with operands that are much wider than the multiplication hardware, the operands must be sliced into a plurality of single-word segments and fed into the hardware in a specified sequence. These segments are multiplied and the intermediate results are accumulated, so that the final product is computed as a sum of cross-products of various weights. The word-wide operand segments, as well as the partial results, are stored in a memory that is addressed by the multiplier hardware's operations sequencer.
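The slicing into word-wide segments, and the statement that the full product is a sum of weighted cross-products, can be checked with a short Python sketch. The 8-bit word size and the helper names `to_words`, `from_words`, and `product_from_cross_products` are illustrative assumptions. Words are stored least significant first.

```python
N_BITS = 8                       # assumed word size n (illustrative)
W = 1 << N_BITS                  # word weight w = 2**n


def to_words(value: int, count: int) -> list[int]:
    """Slice a non-negative integer into `count` n-bit words, least significant first."""
    words = []
    for _ in range(count):
        words.append(value & (W - 1))
        value >>= N_BITS
    if value:
        raise ValueError("value does not fit in the given number of words")
    return words


def from_words(words: list[int]) -> int:
    """Reassemble words as the weighted sum of words[k] * w**k."""
    return sum(word * W**k for k, word in enumerate(words))


def product_from_cross_products(x_words: list[int], y_words: list[int]) -> int:
    """Sum every single-word cross-product x_i * y_j with weight w**(i + j)."""
    return sum(xi * yj * W**(i + j)
               for i, xi in enumerate(x_words)
               for j, yj in enumerate(y_words))
```

With `x = 0x12345678` sliced into four words and `y = 0x9ABC` into two, `product_from_cross_products` returns exactly `x * y`, even though no single multiplication in it is wider than one word by one word.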
A typical sequence holds one segment of the first operand constant while the segments of the second operand are scanned into the multiplier one word at a time; the first operand then advances to its next word-wide segment, and the scan of the second operand is repeated. If $X = \sum_i x_i w^i$, $Y = \sum_j y_j w^j$, and $Z = \sum_k z_k w^k$, with $w = 2^n$, where $n$ is the word size in bits, then $R = \sum_k r_k w^k = Z \pm XY = \sum_k z_k w^k \pm \sum_i \sum_j (x_i y_j)\, w^{i+j}$, where each cross-product $x_i y_j$ contributes to word position $k = i + j$. Thus, in a typical operations sequence, the words $y_j$ are cycled over all $j$ for a fixed word $x_i$; then $i$ is incremented by one and the cycle of words $y_j$ is repeated for the new $x_i$.
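The operand-scanning order just described can be sketched in Python. This is a minimal model with several assumptions: an 8-bit word, words stored least significant first, only the addition case $R = Z + XY$, and hypothetical names (`to_words`, `from_words`, `operand_scanning_mac`). Each inner step forms one single-word product $x_i y_j$ and adds it, together with the running carry, into result word $k = i + j$. That sum always fits in two words, so the high word can be carried into word $k + 1$.

```python
N_BITS = 8                       # assumed word size n (illustrative)
W = 1 << N_BITS                  # word weight w = 2**n
MASK = W - 1


def to_words(value: int, count: int) -> list[int]:
    """Slice a non-negative integer (assumed to fit) into `count` words, LS first."""
    return [(value >> (N_BITS * k)) & MASK for k in range(count)]


def from_words(words: list[int]) -> int:
    """Reassemble words as the sum of words[k] * w**k."""
    return sum(word << (N_BITS * k) for k, word in enumerate(words))


def operand_scanning_mac(x_words, y_words, z_words) -> list[int]:
    """Compute R = Z + X*Y in the typical operand-scanning order.

    For each fixed word x_i, the words y_j are scanned in turn, and each
    single-word product x_i*y_j is accumulated into result word k = i + j.
    """
    size = max(len(z_words), len(x_words) + len(y_words)) + 1
    r = list(z_words) + [0] * (size - len(z_words))   # r starts out holding Z
    for i, xi in enumerate(x_words):                   # outer loop: x_i held fixed
        carry = 0
        for j, yj in enumerate(y_words):               # inner loop: scan the y_j
            k = i + j
            t = r[k] + xi * yj + carry                 # fits in two words
            r[k] = t & MASK                            # low word becomes r_k
            carry = t >> N_BITS                        # high word moves to k + 1
        k = i + len(y_words)
        while carry:                                   # ripple the last carry upward
            t = r[k] + carry
            r[k] = t & MASK
            carry = t >> N_BITS
            k += 1
    return r
```

The carry lives in a local variable between steps, roughly as a sequencer would keep it in a register; only the partial result $r_k$ is written back on each step.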
While the sequence described above is straightforward, easy to program, and produces the correct result, each step or cycle requires an average of three random-access memory accesses. In particular, each step requires that $y_j$ and $z_k$ be read from memory and that a partial result $r_k$ be written back to memory.
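To make the memory traffic of the typical sequence concrete, the sketch below instruments it with a hypothetical `CountingMemory` class that tallies every word read and written. The class, the 8-bit word size, and accumulating $R$ in place over the $Z$ area are assumptions made for illustration. Each multiply step performs exactly the three accesses named above (read $y_j$, read $z_k$, write $r_k$). The loads of $x_i$ and the final carry ripple add some overhead on top.

```python
from collections import Counter

N_BITS = 8                       # assumed word size n (illustrative)
MASK = (1 << N_BITS) - 1


def to_words(value: int, count: int) -> list[int]:
    """Slice a non-negative integer (assumed to fit) into `count` words, LS first."""
    return [(value >> (N_BITS * k)) & MASK for k in range(count)]


def from_words(words: list[int]) -> int:
    """Reassemble words as the sum of words[k] * 2**(n*k)."""
    return sum(word << (N_BITS * k) for k, word in enumerate(words))


class CountingMemory:
    """Hypothetical word-addressed RAM that counts every read and write."""

    def __init__(self):
        self.arrays = {}
        self.reads = Counter()
        self.writes = Counter()

    def load(self, name, words):
        self.arrays[name] = list(words)      # initial contents, not counted

    def read(self, name, index):
        self.reads[name] += 1
        return self.arrays[name][index]

    def write(self, name, index, value):
        self.writes[name] += 1
        self.arrays[name][index] = value


def typical_sequence(mem, m, l):
    """R = Z + X*Y in the typical order; R overwrites the Z area ("r") in place."""
    steps = 0
    for i in range(m):
        xi = mem.read("x", i)                # x_i held in a register for the scan
        carry = 0
        for j in range(l):
            k = i + j
            yj = mem.read("y", j)            # access 1: read y_j
            zk = mem.read("r", k)            # access 2: read z_k (or earlier partial r_k)
            t = zk + xi * yj + carry
            mem.write("r", k, t & MASK)      # access 3: write partial result r_k
            carry = t >> N_BITS
            steps += 1
        k = i + l
        while carry:                         # overhead: ripple the final carry
            t = mem.read("r", k) + carry
            mem.write("r", k, t & MASK)
            carry = t >> N_BITS
            k += 1
    return steps
```

For two 16-word operands, `typical_sequence` performs 256 multiply steps, and the total number of accesses divided by the number of steps is at least three, with the excess coming from the $x_i$ loads and carry propagation.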
An object of the invention is to provide a more efficient multi-word multiplication sequence for large-integer operations that requires, on average, only one memory access per multiplication.