1. Field of the Invention
The present invention relates to the processing of digital signals to render modular multiplication.
2. Description of Related Art
Modular multiplication, which is the computation of A·B modulo M where A, B, and M are integer values, is a fundamental mathematical operation in applications based on number-theoretic arithmetic. A central application area is cryptography, where techniques such as the popular RSA and DSS methods utilize modular multiplication as the elemental computation. Since large word lengths on the order of thousands of bits are typically processed, hardware approaches to modular multiplication are typically very slow. Existing art attempts to address this deficiency through a handful of approaches.
Linear systolic array approaches dominate the art, with the article C. Walter, “Systolic modular multiplication,” IEEE Transactions on Computers, v. 42, no. 3, pp. 376-378, 1993, being representative. In such an approach, a linear array of processing elements is connected so that all signal paths are formed between adjoining elements only. Thus, signal path lengths are minimized. Accordingly, all signal paths only connect two adjoining elements, guaranteeing unit fan out. The forgoing properties of systolic arrays ensure that the clock rate is determined solely by the processing element delay. However, efforts to scale the performance beyond the level offered by a single linear array have encountered very limited success. Cell optimization is the commonly applied technique to gain performance. However, performance scales only logarithmically with respect to consumed integrated circuit area.
Another method which attempts to provide a performance-area tradeoff is the digit-serial array. In the paper, J. Guo and C. Wang, “A novel digit-serial systolic array for modular multiplication,” in Proc. of the 1998 IEEE International Symposium on Circuits and Systems, v. 2, pp. 177-180, 1998, a digit-serial modular multiplier methodology was presented. However, the arrays were not pipelined, and thus the clock period of the digit-serial cells grows proportionally with digit size. Therefore, performance scaling occurs in a sub-linear fashion for small digit sizes and quickly saturates to yield negligible performance gains for large digit sizes.
A non-systolic array was presented in the article H. Orup, “Simplifying quotient digit determination in high-radix modular multiplication,” in Proc. of the 12th Symposium on Computer Arithmetic, pp. 193-199, 1995. A roughly linear performance-area tradeoff was achieved through retiming of the modular correction loop within the modular multiplication algorithm. However, the clock rate is severely limited by the required full-word-length signal broadcasts of the modular correction selection bit. Thus, the fan out of the aforementioned signal is the complete word length. Implementational efforts to increase the signal drive through transistor sizing destroys the linear performance-area trade off and only provide minor mitigation of the slow-clock-rate obstacle plaguing this methodology.