1. Field of the Invention
The invention pertains generally to computers. In particular, it pertains to a linear systolic array Montgomery multiplier.
2. Description of the Related Art
Exponentiation of large numbers has many uses, including public key encryption such as the Rivest-Shamir-Adleman (RSA) algorithm for data encryption and decryption. A common approach to performing the exponentiation, known as the xe2x80x98square and multiplyxe2x80x99 technique, performs a square operation (multiplying the accumulated result by itself) for each bit in the exponent, and a xe2x80x98multiplyxe2x80x99 operation (multiplying the accumulated result by a base number) for every xe2x80x981xe2x80x99 bit in the exponent. Assuming an equal number of xe2x80x981xe2x80x99s and xe2x80x980xe2x80x99s in the average exponent, a typical exponentiation using 1024-bit numbers requires over 1500 operations, with each operation involving 1024-bit numbers.
Montgomery multipliers are frequently used to perform the RSA and similar algorithms more efficiently by using a transform. Montgomery multipliers perform a transform of the operation needed for exponentiation by performing the operation Axc3x97B mod M (the remainder of A times B divided by M), and for large numbers is much more efficient than a direct approach to performing the math. Some Montgomery multipliers use a linear systolic array, i.e., a chain of identical processing elements (PEs), with each PE working on a portion (typically four bits) of each of the large numbers involved. The chain contains enough PEs to hold the largest of the numbers involved, including interim results. Carries and other interim values of the operation are fed in both directions between adjacent PEs.
Because of the clocked chain design of a linear systolic array Montgomery multiplier (LSAMM), each PE processes data for one clock cycle, then waits for one clock cycle to receive interim values from the adjacent PEs. Adjacent PEs are one clock cycle out of sync, i.e., the odd-numbered PEs are processing while the even-numbered PEs are waiting, and vice-versa. This means that each PE is idle half the time, and those idle cycles represent wasted resources. The idle cycles, which can be considered a separate channel, can be used to perform another operation. However, in a conventional LSAMM circuit, two of the three parameters (e.g., B and M) must be the same in both channels. With this limitation, the conventional approach to utilizing some of these wasted cycles in a square-and-multiply operation is to perform the squares for an exponentiation in one channel, and to perform the multiplies for the same exponentiation in the alternate channel. Since the average exponent contains a xe2x80x981xe2x80x99 in approximately half the bit positions, only half the cycles in the alternate channel are used for multiplies, while the remaining cycles in that channel, about 25% of the total cycles in both channels, remain idle and wasted.