1. Field of the Invention
The present invention relates generally to the fields of arithmetic processing and cryptography. More particularly, the present invention relates to a method and apparatus for performing modular multiplication.
2. Description of the Related Art
Modular exponentiation and related mathematical operations are commonly used in a number of applications such as cryptography. For example, modular exponentiation of the form X^E mod M is the primary operation involved in the Rivest-Shamir-Adleman (RSA) cryptographic system, where X, E, and M are all large (e.g., 512-bit or 1024-bit) unsigned integers. Modular exponentiation, in turn, is a process of repeated modular multiplication of the form A×B mod M utilizing similarly-sized integers. One way to perform modular multiplication is to compute A×B first and then reduce the resulting product modulo M. The time and resources necessary to perform these two separate operations and to detect the resulting remainder make this technique undesirable for large integers. Modular multiplication may also be performed utilizing another technique known as “Montgomery multiplication,” in which the multiplication and modular reduction operations are performed in a single step within a mathematical transform space.
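By way of illustration only, the Montgomery technique described above may be sketched in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular hardware multiplier: M is assumed to be odd, the transform constant R is assumed to be a power of two greater than M, and the function and variable names (montgomery_setup, montgomery_multiply, modmul_via_montgomery, m_prime) are hypothetical names chosen for this example. The sketch requires Python 3.8 or later for the modular inverse form of pow.

```python
def montgomery_setup(m, k):
    """Precompute constants for R = 2**k (assumes m is odd and m < R)."""
    r = 1 << k
    # m_prime satisfies m * m_prime == -1 (mod R); it exists because m is odd.
    m_prime = (-pow(m, -1, r)) % r
    return r, m_prime


def montgomery_multiply(a_bar, b_bar, m, k, m_prime):
    """Return a_bar * b_bar * R^-1 mod m for inputs already below m.

    The reduction uses only a mask, a shift, and one conditional
    subtraction, so no division by m is performed.
    """
    mask = (1 << k) - 1
    t = a_bar * b_bar
    u = ((t & mask) * m_prime) & mask   # chosen so t + u*m is divisible by R
    result = (t + u * m) >> k           # exact division by R
    if result >= m:                     # result is always below 2*m here
        result -= m
    return result


def modmul_via_montgomery(a, b, m):
    """Compute a * b mod m by passing through the transform space."""
    k = m.bit_length()                  # guarantees R = 2**k > m
    r, m_prime = montgomery_setup(m, k)
    a_bar = (a * r) % m                 # map operands into the transform space
    b_bar = (b * r) % m
    p_bar = montgomery_multiply(a_bar, b_bar, m, k, m_prime)
    # Multiplying by 1 removes the remaining factor of R.
    return montgomery_multiply(p_bar, 1, m, k, m_prime)
```

In this sketch the conversions into the transform space still use an ordinary reduction. In an exponentiation that cost would typically be paid once, with the many intermediate multiplications remaining in the transform space.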
Conventional modular multipliers often include a linear systolic array or “chain” of processing elements (PEs) implemented in hardware such as an application-specific integrated circuit (ASIC) or a programmable logic device such as a field programmable gate array (FPGA). In conventional modular multipliers, a given processing element performs a portion of a modular multiplication operation by processing data and then passing it to its neighboring or adjacent PEs in a given clock cycle. In the next clock cycle, the original processing element remains idle while the neighboring processing elements process the received data and pass the processed data back, after which the original processing element may spend another cycle computing. Thus, in most conventional modular multipliers, each processing element does useful work every other clock cycle and is idle the remainder of the time. In one traditional modular multiplier, these idle cycles are used to concurrently perform an additional limited modular multiplication operation where two of three operands, B and M, must be the same.
In such a modular multiplier, one set of cycles is utilized to perform a squaring operation while the other set is utilized to perform a related multiplication operation according to a well-known square-and-multiply modular exponentiation technique. In all conventional modular multipliers, however, the total number of processing elements required is related to the size of the modular multiplication operands and the number of bits processed per element. For example, a 512-bit modular multiplication operation would require at least 128 4-bit processing elements, whereas a 1024-bit modular multiplication operation would require at least 256 such elements. Modular multipliers typically also include a fixed number of additional processing elements and/or additional logic to perform modular multiplication operations. Consequently, modular multipliers used to perform operations on large-size operands, such as those utilized in modular exponentiation-based cryptographic systems, require a large amount of space and hardware resources.
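For illustration, the square-and-multiply technique and the processing element count discussed above may be sketched as follows. The sketch assumes the right-to-left binary variant of square-and-multiply, which is one variant consistent with the description above because the squaring and the multiplication in each step share the same operand and modulus; the foregoing description does not state which variant a given multiplier uses. The names modexp_right_to_left and min_processing_elements are hypothetical, and the element count ignores the fixed number of additional elements and logic mentioned above.

```python
def modexp_right_to_left(x, e, m):
    """Compute x**e mod m by scanning the exponent from its least significant bit."""
    result = 1 % m
    base = x % m
    while e > 0:
        # Both operations below read the same 'base' and the same modulus m,
        # so neither depends on the other's output within a step.
        if e & 1:
            result = (result * base) % m   # the multiplication operation
        base = (base * base) % m           # the squaring operation
        e >>= 1
    return result


def min_processing_elements(operand_bits, bits_per_pe):
    """Lower bound on elements: operand width divided by bits per element, rounded up."""
    return -(-operand_bits // bits_per_pe)
```

With these definitions, min_processing_elements(512, 4) evaluates to 128 and min_processing_elements(1024, 4) evaluates to 256, matching the figures given above.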