The present application is directed to a method and apparatus for performing encryption and decryption. The application discloses several inventions relating to an overall system for the use of exponentiation modulo N as a mechanism for carrying out the desired cryptological goals and functions in a rapid, efficient, accurate and reliable manner. A first part of the disclosure is related to the construction of a method and its associated apparatus for carrying out modular multiplication. A second part of the disclosure is directed to an improved apparatus for carrying out modular multiplication by partitioning the problem into more manageable pieces, which results in the construction of individual Processing Elements that may, if so desired, be identical. A third part of the disclosure is directed to the utilization of the resulting series of Processing Elements in a pipelined fashion for increased speed and throughput. A fourth part of the disclosure is directed to an apparatus and method for calculating a unique inverse operation that is desirable as an input step or stage to the modular multiplication operation. A fifth part of the disclosure is directed to the use of the modular multiplication system described herein in its originally intended function of performing an exponentiation operation. A sixth part of the disclosure is directed to the use of the Chinese Remainder Theorem in conjunction with the exponentiation operation. A seventh part of this disclosure is directed to the construction and utilization of checksum circuitry which is employed to ensure reliable and accurate operation of the entire system. The present application is particularly directed to the invention described in the sixth part of the disclosure.
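By way of illustration only, the general idea of combining the Chinese Remainder Theorem with exponentiation modulo N may be sketched in software as follows. The sketch assumes that N is the product of two distinct primes p and q and uses the standard textbook recombination (Garner's formula); the function name and parameters are hypothetical and are not taken from the disclosure, and the sketch does not represent the circuitry described herein.

```python
def modexp_crt(c, d, p, q):
    """Compute c**d mod (p*q) via two smaller exponentiations.

    Assumptions (illustrative only): p and q are distinct primes,
    and the reduced exponents d mod (p-1) and d mod (q-1) are nonzero.
    Requires Python 3.8+ for pow(x, -1, m).
    """
    dp = d % (p - 1)          # reduced exponent for the factor p
    dq = d % (q - 1)          # reduced exponent for the factor q
    q_inv = pow(q, -1, p)     # inverse of q modulo p

    mp = pow(c, dp, p)        # half-size exponentiation modulo p
    mq = pow(c, dq, q)        # half-size exponentiation modulo q

    # Garner recombination: the result is congruent to mp mod p
    # and to mq mod q, and lies in the range 0 .. p*q - 1.
    h = (q_inv * (mp - mq)) % p
    return mq + h * q


if __name__ == "__main__":
    p, q, d = 61, 53, 2753
    c = 2790
    print(modexp_crt(c, d, p, q) == pow(c, d, p * q))
```

The two exponentiations modulo p and modulo q are independent of one another, which is what allows them to be carried out separately on operands of roughly half the size of N.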
More particularly, the present invention is directed to circuits, systems and methods for multiplying two binary numbers having up to n bits each, with the multiplication being modulo N, an odd number. In particular, the present invention partitions one of the factors into m blocks with k bits in each block with the natural constraint that mk≧n+2. Even more particularly, the present invention is directed to multiplication modulo N when the factors being multiplied have a large number of bits. The present invention is also particularly directed to the use of the modular multiplication function hardware described herein in the calculation of a modular exponentiation function for use in cryptography. Ancillary functions, such as the calculation of a convenient inverse and a checksum mechanism for the entire apparatus, are also provided herein. The partitioning employed herein also results in the construction of Processing Elements which can be cascaded to provide significant expansion capabilities for larger values of N. This, in turn, leads to a modality of Processor Element use in a pipelined fashion. The cascade of Processor Elements is also advantageously controllable so as to effectively partition the Processor Element chain into separate pieces which independently work on distinct and separate factors of N.
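As a minimal software sketch of the partitioning constraint stated above, a factor of up to n bits may be split into m blocks of k bits each, where m is chosen as the smallest integer satisfying mk≧n+2. The function names are hypothetical, and the choice of the smallest such m and of least-significant-block-first ordering are assumptions made for illustration rather than requirements of the disclosure.

```python
def partition_factor(a, n, k):
    """Split a into m blocks of k bits, least significant block first.

    Assumption (illustrative): m is the smallest integer with m*k >= n + 2.
    """
    if a < 0 or a.bit_length() > n:
        raise ValueError("a must be a non-negative integer of at most n bits")
    m = -(-(n + 2) // k)                 # ceiling of (n + 2) / k
    mask = (1 << k) - 1                  # selects the low k bits
    return [(a >> (i * k)) & mask for i in range(m)]


def reassemble(blocks, k):
    """Inverse of partition_factor: rebuild the integer from its k-bit blocks."""
    return sum(block << (i * k) for i, block in enumerate(blocks))


if __name__ == "__main__":
    blocks = partition_factor(0b1011011, n=7, k=3)
    print(blocks)                        # [3, 3, 1]
    print(reassemble(blocks, 3) == 0b1011011)
```

Because mk is at least n+2, the uppermost block always contains at least two leading zero bits for any n-bit factor.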
Those wishing an optimal understanding of this disclosure should appreciate at the outset that the purpose of the methods and circuits shown herein is the performance of certain arithmetic functions needed in modern cryptography, and that these operations are not standard multiplication, inversion and/or exponentiation, but rather are modulo N operations. That the present application is directed to modular arithmetic circuits and methods, as opposed to standard arithmetic operations, is best kept firmly in mind, since modular arithmetic, with its implied division operations, is much more difficult to perform and to calculate, particularly where exponentiation modulo N is involved.
In a preferred system for implementation which takes advantage of certain aspects of the present invention, this application is also directed to a circuit and method of practice in which an adder array and a multiplier array are effectively partitioned into a series of nearly identical processor elements, with each processor element (PE) in the series operating on a sub-block of data. Having recognized the ability to reconfigure the generic structure into a plurality of serially connected processor elements, the present invention is also directed to a method of operation in which each processor element operates as part of a pipeline over a plurality of operational cycles. The pipelining mode of operation is even further extended to the multiplication of a series of numbers in a fashion in which all of the processor elements are continuously and actively generating results.
The multiplication of binary numbers modulo N is an important operation in modern, public-key cryptography. The security of any cryptographic system which is based upon the multiplication and subsequent factoring of large integers is directly related to the size of the numbers employed, that is, the number of bits or digits in the number. For example, each of the two multiplying factors may have up to 1,024 bits. However, for cryptographic purposes, it is necessary to carry out this multiplication modulo a number N. Accordingly, it should be understood that the multiplication considered herein multiplies two n bit numbers to produce a result with n bits or fewer, rather than the usual 2n bits in conventional multiplication.
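The difference in result size between conventional multiplication and multiplication modulo N may be illustrated with a short sketch. The function name is hypothetical, and the use of an explicit remainder operation is purely for illustration; it is the kind of division-based reduction that is costly at large operand sizes.

```python
def modular_product(a, b, N):
    """Return (full product, product reduced modulo N).

    Illustrative only: the reduction uses the built-in remainder operator,
    which implies a division by N.
    """
    full = a * b              # conventional product, up to 2n bits
    return full, full % N     # reduced product is less than N


if __name__ == "__main__":
    full, reduced = modular_product(13, 11, 15)   # 4-bit operands, N odd
    print(full, full.bit_length())                # 143 8
    print(reduced, reduced.bit_length())          # 8 4
```

With 4-bit operands, the conventional product 143 occupies 8 bits, whereas the product modulo 15 is 8, which fits within 4 bits.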
Even though there is a desire for inclusion of a large number of bits in each factor, the speed of calculation becomes significantly slower as the number of digits or bits increases. For real-time cryptographic purposes, however, the speed of encryption and decryption is an important concern. In particular, cryptographic processing in real time is a desirable result.
Different methods have been proposed for carrying out modular multiplication. In particular, in an article appearing in “The Mathematics of Computation,” Vol. 44, No. 170, April 1985, pp. 519-521, Peter L. Montgomery describes an algorithm for “Modular Multiplication without Trial Division.” However, this article describes operations that are impractical to implement in hardware for a large value of N. Furthermore, the method described by Montgomery operates only in a single phase. In contrast, the system and method presented herein partition operational cycles into two phases. From a hardware perspective, this partitioning provides a mechanism for hardware sharing, which provides significant advantages.
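For orientation, the textbook form of modular multiplication without trial division may be sketched in software as follows. This is a sketch of the commonly taught single-pass reduction only, assuming an odd modulus N and a radix R = 2^k with N < R; it is not the two-phase method of the present application, and all function and variable names are hypothetical.

```python
def montgomery_setup(N, k):
    """Return N' such that N * N' is congruent to -1 modulo R = 2**k.

    Assumptions (illustrative): N is odd and N < 2**k.
    Requires Python 3.8+ for pow(x, -1, m).
    """
    R = 1 << k
    if N % 2 == 0 or N >= R:
        raise ValueError("N must be odd and less than 2**k")
    return (-pow(N, -1, R)) % R


def redc(T, N, k, N_prime):
    """Return T * R^-1 mod N for 0 <= T < R*N, using only shifts and masks."""
    mask = (1 << k) - 1
    m = ((T & mask) * N_prime) & mask   # chosen so that T + m*N is divisible by R
    t = (T + m * N) >> k                # exact division by R, no trial division by N
    return t - N if t >= N else t       # at most one conditional subtraction


def mont_mul(a, b, N, k):
    """Compute a*b mod N by way of the Montgomery representation."""
    N_prime = montgomery_setup(N, k)
    R = 1 << k
    a_bar = (a * R) % N                 # conversion into Montgomery form
    b_bar = (b * R) % N
    prod_bar = redc(a_bar * b_bar, N, k, N_prime)   # equals a*b*R mod N
    return redc(prod_bar, N, k, N_prime)            # conversion back out


if __name__ == "__main__":
    print(mont_mul(13, 11, 15, 4))      # 8
```

In this sketch the conversions into Montgomery form still use an ordinary remainder operation; in practice such conversions are amortized over the many multiplications of an exponentiation, so that the reduction step itself involves no division by N.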