The present invention relates generally to cryptographic accelerators and, more particularly, to a cryptographic accelerator employing recursive algorithms to accelerate multiplication and squaring operations.
Encryption is the process of disguising intelligible information, called plaintext, to hide its substance from eavesdroppers. Encrypting plaintext produces unintelligible data called ciphertext. Decryption is the process of converting ciphertext back to its original plaintext. Using encryption and decryption, two parties can send messages over an insecure channel without revealing the substance of the message to eavesdroppers. A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and decryption of data. A cryptographic algorithm typically works in combination with a key to encrypt and decrypt messages. The key, typically a large random number, controls the encryption of data by the cryptographic algorithm. The same plaintext encrypts to different ciphertext with different keys. In general, it is extremely difficult to recover the plaintext of a message without access to the key, even for an eavesdropper having full knowledge of the cryptographic algorithm.
One commonly used type of cryptographic algorithm is a public key algorithm. Public key cryptographic algorithms are based on the identity: $|X^{Z}|_{N} = X$ (Eq. 1), where N, the modulus, is the product of two secret prime numbers P1 and P2, and Z is equal to M(P1−1)(P2−1)+1. The exponent Z is factored into the product of a private key KPRIV and a public key KPUB. Many key pairs can be found by choosing different values of M. The public key KPUB is published and may be used by another to send messages to the owner of the public key, which can only be deciphered by the recipient using the corresponding private key KPRIV.
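The identity of Eq. (1) can be checked numerically. The following is a minimal sketch, reading $|\cdot|_{N}$ as reduction modulo N and assuming toy primes (61 and 53), a multiplier M of 15, and the key pair 17 and 2753; these values are illustrative assumptions only and are far too small for practical use.

```python
# Toy values chosen for illustration only (assumptions, not from the text).
P1, P2 = 61, 53                   # two secret primes
N = P1 * P2                       # modulus, 3233
M = 15                            # multiplier selecting one exponent Z
Z = M * (P1 - 1) * (P2 - 1) + 1   # Z = 46801

# Z factors into a public key and a private key.
K_PUB, K_PRIV = 17, 2753


def z_factors_into_keys():
    """Return True if the assumed key pair multiplies out to Z."""
    return K_PUB * K_PRIV == Z


def identity_holds(x):
    """Check Eq. (1): x raised to the power Z, reduced modulo N, gives x."""
    return pow(x, Z, N) == x


print(z_factors_into_keys())
print(all(identity_holds(x) for x in range(N)))
```

Choosing a different M gives a different Z, which may factor into a different key pair.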
One popular public key algorithm is the RSA Algorithm. The RSA Algorithm enciphers blocks of bits at a time, which may be viewed as a binary number X. The binary number X must have an arithmetic value less than the encryption modulus N. Encryption is performed by raising X to the power of the public key KPUB and reducing it modulo N to produce encrypted ciphertext. The ciphertext may also be viewed as a binary number Y having an arithmetic value less than N. Decryption is performed by raising the binary number Y to the power of the private key KPRIV and reducing the result modulo N.
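The encryption and decryption steps described above can be sketched as two modular exponentiations. The function names `rsa_encrypt` and `rsa_decrypt` and the toy parameters are assumptions made for illustration; practical RSA uses moduli of thousands of bits together with message padding, neither of which is shown here.

```python
# Toy RSA parameters (illustrative assumptions; far too small for real use).
N = 61 * 53      # encryption modulus, 3233
K_PUB = 17       # public key
K_PRIV = 2753    # private key


def rsa_encrypt(x, k_pub=K_PUB, n=N):
    """Raise block x to the power of the public key and reduce modulo n."""
    if not 0 <= x < n:
        raise ValueError("the block must have a value less than the modulus")
    return pow(x, k_pub, n)


def rsa_decrypt(y, k_priv=K_PRIV, n=N):
    """Raise ciphertext y to the power of the private key and reduce modulo n."""
    return pow(y, k_priv, n)


ciphertext = rsa_encrypt(65)
recovered = rsa_decrypt(ciphertext)
print(recovered == 65)
```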
Another use of public key algorithms is for signing messages to authenticate the sending party's identity. The sending party may sign a message by encrypting the message with his private key KPRIV. The receiving party can then use the sender's public key KPUB to decrypt the message. If the message is decrypted successfully, only the sending party in possession of the private key KPRIV could have sent that message. This process of authenticating the message by encryption using the sender's private key KPRIV is referred to as signing.
It is also known to doubly encrypt messages to provide both secure communications and authentication capability. In this case, each party to the communication possesses a public key used for encrypting messages and a private key for decrypting messages. The message is first signed using the sender's private key KPRIV1 and modulus N1 and then encrypted using the recipient's public key KPUB2 and modulus N2. The recipient decrypts the message using the recipient's private key KPRIV2 and modulus N2 to recover the signed message. The signed message is then decrypted using the sender's public key KPUB1 and modulus N1 to obtain the original message. Since the sender is the only person possessing the private key KPRIV1 that can generate the signed message, the sender's identity is authenticated to the recipient.
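The sign-then-encrypt sequence can be sketched as follows. The two toy key pairs and the function names are hypothetical. The sketch also assumes that the recipient's modulus N2 is larger than the sender's modulus N1, so that the signed value is a valid block for the second encryption; the text above does not state this condition. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
# Sender's toy key pair (assumed values): N1 = 61 * 53.
N1, K_PUB1, K_PRIV1 = 3233, 17, 2753

# Recipient's toy key pair (assumed values): N2 = 89 * 97, chosen so N2 > N1.
P, Q = 89, 97
N2 = P * Q
K_PUB2 = 5
K_PRIV2 = pow(K_PUB2, -1, (P - 1) * (Q - 1))  # modular inverse of K_PUB2


def sign_then_encrypt(message):
    """Sign with the sender's private key, then encrypt for the recipient."""
    if not 0 <= message < N1:
        raise ValueError("message must be less than N1")
    signed = pow(message, K_PRIV1, N1)
    return pow(signed, K_PUB2, N2)


def decrypt_then_verify(ciphertext):
    """Decrypt with the recipient's private key, then undo the signature."""
    signed = pow(ciphertext, K_PRIV2, N2)
    return pow(signed, K_PUB1, N1)


print(decrypt_then_verify(sign_then_encrypt(1234)) == 1234)
```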
Another prior art algorithm that involves exponential operations is the Diffie-Hellman Algorithm. The Diffie-Hellman Algorithm is a key exchange algorithm that allows two parties to agree on a secret key over an insecure channel without divulging the secret key. According to the Diffie-Hellman Algorithm, the parties agree on two non-secret prime numbers P1 and P2. P1 is typically a large prime number. The security of the system is based on the difficulty of computing discrete logarithms modulo P1, which is comparable to the difficulty of factoring numbers the same size as P1. P2 may be a one-digit prime number. Each party generates a large random number, denoted x and y, respectively. The parties then calculate derived numbers X and Y. The first party computes X using the equation $X = P_2^{x} \bmod P_1$. The second party computes Y using the equation $Y = P_2^{y} \bmod P_1$. The first party transmits X to the second party; the second party transmits Y to the first party. The first party computes the secret key K using the equation $K = Y^{x} \bmod P_1$. The second party computes the secret key K using the equation $K = X^{y} \bmod P_1$. Both computations yield the same value, since $Y^{x} \equiv X^{y} \equiv P_2^{xy} \pmod{P_1}$. An eavesdropper cannot feasibly compute K with knowledge only of P1, P2, X and Y. Therefore, the value K, which was computed independently by the two parties using information exchanged over the insecure channel, may be used by the parties as the secret key for secure communications.
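The exchange can be sketched in a few lines. The prime $2^{127}-1$ for P1, the value 7 for P2, and the helper names `derive_public` and `derive_secret` are assumptions chosen for illustration; a practical system would use a much larger, carefully selected prime.

```python
import secrets

# Non-secret parameters agreed by both parties (illustrative assumptions).
P1 = 2**127 - 1   # a large prime (a Mersenne prime, used here for convenience)
P2 = 7            # a one-digit prime


def derive_public(private):
    """Compute the derived number P2**private mod P1 sent over the channel."""
    return pow(P2, private, P1)


def derive_secret(received, private):
    """Compute the shared key from the other party's derived number."""
    return pow(received, private, P1)


# Each party picks a large random secret.
x = secrets.randbelow(P1 - 2) + 1   # first party's secret
y = secrets.randbelow(P1 - 2) + 1   # second party's secret

X = derive_public(x)                # sent by the first party
Y = derive_public(y)                # sent by the second party

k_first = derive_secret(Y, x)       # K computed by the first party
k_second = derive_secret(X, y)      # K computed by the second party
print(k_first == k_second)
```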
All of the above-described algorithms involve exponential operations with very large binary numbers. For example, in the RSA Algorithm, the private key KPRIV typically has a length of approximately 2,048 bits. The message block and encryption modulus N are typically of the same order of wordlength. Thus, encryption or decryption with the private key KPRIV involves exponentiating a 2,048-bit message block with a 2,048-bit exponent and reducing the result modulo another 2,048-bit number. These calculations require significant computational power to perform.
A number of algorithms have been devised to reduce the complexity of cryptographic calculations involving exponentiation or modulo reduction. One algorithm, referred to herein as the Successive Squares Algorithm, is used to raise a first large number to the power of a second large number. A second algorithm, referred to herein as the Modulo Reduction Algorithm, is used to reduce a first large number modulo a second large number.
The Successive Squares Algorithm is used to raise a bitstring X to a large power Y. In decryption, the bitstring X is the encrypted ciphertext, and the power Y is the decryption key. In encryption, the bitstring X is the plaintext message, and the power Y is the encryption key. The successive squares of the bitstring X are computed and used to multiply an accumulated value Z, depending on the value of a corresponding bit in the power Y. The accumulated value Z is initialized to a starting value of 1. The successive squares are denoted herein as $X_1 = X^{1}$, $X_2 = X^{2}$, $X_3 = X^{4}$, ..., $X_n = X_{n-1}^{2}$. In the Successive Squares Algorithm, the least significant bit in the power Y, denoted $B_1$, corresponds to the first power of X, the second bit $B_2$ corresponds to the second power of X, the third bit $B_3$ corresponds to the fourth power of X, and so forth until the last bit $B_L$ is reached. Each successive square, $X_1, X_2, \ldots, X_n$, is used to multiply the accumulated value Z, depending on the value of the corresponding bit $B_n$ in the power Y. In particular, the accumulated value Z is multiplied by a successive square when the corresponding bit $B_n$ in the power Y is 1. Successive squares corresponding to “0” bits in the power Y do not multiply the accumulated value Z. The Successive Squares Algorithm reduces the number of values that need to be multiplied from $2^{2048}$ to the order of 2,048 where X and Y are 2,048 bits in length.
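The procedure can be sketched as a loop over the bits of the power Y, starting from the least significant bit. The function name `successive_squares_pow` is hypothetical, and the sketch assumes that every product is reduced modulo n immediately, using Python's `%` operator in place of a dedicated reduction routine.

```python
def successive_squares_pow(x, y, n):
    """Raise x to the power y modulo n by the successive squares method."""
    if y < 0 or n < 1:
        raise ValueError("y must be non-negative and n must be positive")
    z = 1 % n          # accumulated value Z, initialized to 1
    square = x % n     # first successive square, X**1
    while y > 0:
        if y & 1:      # current bit of Y is 1: multiply Z by this square
            z = (z * square) % n
        y >>= 1        # move to the next more significant bit of Y
        if y:          # form the next successive square only if still needed
            square = (square * square) % n
    return z


print(successive_squares_pow(65, 17, 3233) == pow(65, 17, 3233))
```

For a 2,048-bit power, the loop performs at most 2,048 squarings and at most 2,048 multiplications of the accumulated value.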
After each multiplication or squaring operation, the accumulated value Z has a wordlength in the order of 4,096 bits. In encryption and decryption, this accumulated value Z is reduced by modulo reduction to a value in the order of 2,048 bits in length. In particular, the result of each squaring operation is reduced modulo the encryption modulus N of wordlength 2,048. This requires subtracting a large number of multiples of N until the value of the accumulated total Z is less than N. The number of multiples of N which have to be subtracted is in the order of $2^{2048}$, or roughly $10^{600}$, which eliminates the possibility of successive subtraction.
The Modulo Reduction Algorithm is used to reduce a first large number modulo a second large number. According to the Modulo Reduction Algorithm, the approximate reciprocal of N is computed to 2,048 significant bits, ignoring leading zeros after the binary point. Each time a 4,096-bit accumulated value Z is to be reduced modulo N, the approximate number of times T that N would have to be subtracted from Z is calculated using the equation $T = Z \cdot (1/N)$, which is a single long multiplication of Z with the approximate reciprocal of N. The product T·N is then subtracted from the accumulated value Z, which will reduce the accumulated value Z to within one or two times N of the required result. The reduction is then completed by subtracting the encryption modulus N one or two times more from the accumulated value Z until the remainder is less than N but not negative. This Modulo Reduction Algorithm requires two long multiplications and two subtractions instead of roughly $10^{600}$ successive subtractions and is vital to render such calculations possible.
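A reduction of this kind can be sketched with integer arithmetic only. The sketch assumes that the approximate reciprocal of N is stored as an integer scaled by $2^{2k}$, where k is the bit length of N; this scaling and the names `make_reducer` and `reduce_mod_n` are illustrative assumptions rather than details given above.

```python
def make_reducer(n):
    """Precompute a scaled approximate reciprocal of n and return a reducer."""
    if n < 1:
        raise ValueError("n must be positive")
    k = n.bit_length()
    shift = 2 * k
    reciprocal = (1 << shift) // n      # approximately 2**(2k) / n, rounded down

    def reduce_mod_n(z):
        """Reduce z modulo n, assuming 0 <= z < 2**(2k)."""
        if not 0 <= z < (1 << shift):
            raise ValueError("z is out of range for this reducer")
        t = (z * reciprocal) >> shift   # first long multiplication: T ~ Z * (1/N)
        r = z - t * n                   # second long multiplication, one subtraction
        while r >= n:                   # at most two corrective subtractions
            r -= n
        return r

    return reduce_mod_n


reduce_mod_n = make_reducer((1 << 61) - 1)
print(reduce_mod_n(12345678901234567890123) == 12345678901234567890123 % ((1 << 61) - 1))
```

Because the stored reciprocal is rounded down, T never exceeds the true quotient, so the remainder is never negative and only the final corrective subtractions are needed.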
It is well known in the art that, since squaring is the same as multiplication with two equal arguments, advantage can be taken of the fact that half of the partial products to be summed are the same as the other half, allowing squaring to be performed twice as fast as multiplication. It is also known in the art that the product of two numbers A and B can be obtained as one quarter of the difference between the squares of (A+B) and (A−B).
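The relation between a product and a difference of squares follows from $(A+B)^2 - (A-B)^2 = 4AB$. A minimal sketch, using the hypothetical function name `multiply_via_squares`, is:

```python
def multiply_via_squares(a, b):
    """Compute a * b using only squaring, addition, subtraction and a shift."""
    sum_squared = (a + b) * (a + b)      # square of (A + B)
    diff_squared = (a - b) * (a - b)     # square of (A - B)
    # The difference equals 4 * a * b exactly, so shifting by two bits is exact.
    return (sum_squared - diff_squared) >> 2


print(multiply_via_squares(1234, 5678) == 1234 * 5678)
```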
In a published paper entitled “Multiplication of Multi Digit Numbers by Automata,” by A. Karatsuba and Y. Ofman (Soviet Physics Doklady 7, pages 595-596, 1963), an algorithm, referred to herein as the K-O Multiplication Algorithm, is described for expressing the product of two N-digit numbers in terms of three products of N/2-digit numbers, thereby achieving a reduction to ¾ of the effort compared with the four products of N/2-digit numbers needed conventionally. However, the N/2-digit multiplications are each, in turn, expressible as three N/4-digit multiplications, and so forth, so that the total reduction of effort is to the value $(3/4)^{\log_2 N}$, as shown by D. E. Knuth in “The Art of Computer Programming, Vol. 2, Seminumerical Algorithms,” (Addison Wesley, Reading, Mass., 1971). The above references are incorporated herein by reference.
To achieve the maximum reduction of effort using the K-O Multiplication Algorithm, the recursions should preferably stop at some wordlength where multiplication is more efficiently performed in the conventional manner or by special purpose hardware. Such a stage exists because the effort of multiplication reduces as the square of N while the overhead of the K-O Multiplication Algorithm reduces only linearly, so that at some wordlength, conventional multiplication becomes preferable.
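A recursive software form of the K-O Multiplication Algorithm with such a stopping wordlength can be sketched as follows. The function name `ko_multiply`, the default threshold of 64 bits, and the use of Python's built-in multiplication as the conventional multiplier are assumptions made for illustration; the best threshold depends on the hardware or software in use.

```python
def ko_multiply(a, b, threshold_bits=64):
    """Multiply non-negative integers a and b by the K-O recursion."""
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    if threshold_bits < 8:
        raise ValueError("threshold_bits must be at least 8")
    # Stop recursing where conventional multiplication is assumed to be cheaper.
    if a.bit_length() <= threshold_bits or b.bit_length() <= threshold_bits:
        return a * b

    half = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask     # split each operand into two halves
    b_hi, b_lo = b >> half, b & mask

    # Three half-length products instead of the conventional four.
    hi = ko_multiply(a_hi, b_hi, threshold_bits)
    lo = ko_multiply(a_lo, b_lo, threshold_bits)
    cross = ko_multiply(a_hi + a_lo, b_hi + b_lo, threshold_bits) - hi - lo

    return (hi << (2 * half)) + (cross << half) + lo


print(ko_multiply(3**500, 7**400) == 3**500 * 7**400)
```

The additions, subtractions and shifts around the three recursive calls are the linear overhead referred to above.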
The K-O Multiplication Algorithm has been used in software applications to perform long multiplication in public key cryptographic algorithms. Implementing the K-O Multiplication Algorithm in software suggests use of recursive programs. Recursion in this field refers to a program subroutine that is allowed to call itself, as opposed to simple iterations or loops. Recursion also includes the case of a first program calling a second program, which in turn calls the first program. In this case, no program calls itself but a compiler that supports recursion is necessary to give correct results when such recursive calls are used. Not all computer languages or implementations of computer programming languages support recursive subroutines.
Recursively structured hardware circuits for performing calculations are also known. Examples of recursively structured hardware circuits are described in U.S. Pat. No. 6,044,390 to Golnabi et al; U.S. Pat. No. 6,041,340 to Mintzer et al; and in U.S. Pat. No. 5,765,207 to Curran.
It is now common to employ a computer programming-like language known as VHDL to describe logic circuits of a higher complexity as interconnections of logic circuits of a lower complexity, and so forth, until only primitive circuits are required that can be found in an existing library. This hierarchical description of circuits is then translated by the VHDL compiler into a flat interconnection of primitive library elements. Present day VHDL is an example of a language that does not support recursive calls; that is, no circuit block in the hierarchy can include in its description a circuit block which is an instance of itself.