1. Field of the Invention
The present invention relates to computing algorithms, and in particular to computing algorithms required for cryptographic applications.
2. Description of the Related Art
Key lengths are steadily increasing, especially in public-key cryptography but also in other fields of cryptography, because the security requirements placed upon cryptographic algorithms are also increasing. For the RSA method, a representative of an asymmetric cryptography concept, that is to say a public-key method, security against so-called brute-force attacks increases as the key length used increases. Brute-force attacks are attacks on a cryptographic algorithm wherein a key is to be inferred by trying out all possibilities. It is immediately evident that the amount of time theoretically required to try out all possibilities in a brute-force attack increases greatly as the key length increases.
It shall be pointed out in this context that RSA applications with key lengths of 512 bits were formerly considered sufficient. Due to technical and mathematical progress made by the “other side”, the key lengths for typical RSA applications were then increased to 1024 bits. Nowadays some claim that even this key length is not sufficient, so that RSA key lengths of 2048 bits are aimed at.
On the other hand, when considering existing cryptographic coprocessors, such as those on SmartCards, there is naturally a desire to also run RSA applications with key lengths of, for example, 2048 bits on cryptographic circuits which were actually developed for key lengths of, for example, only 1024 bits. Arithmetic coprocessors for existing SmartCard applications are thus characterized by the fact that they were developed for a specified bit length which is too short for the most recent security requirements. As a result, a 2048-bit RSA algorithm, for example, cannot be handled efficiently on 1024-bit coprocessors. For RSA applications, the Chinese Remainder Theorem (CRT) is known, wherein a modular exponentiation with a large key length is broken down into two modular exponentiations with half the key length, whereupon the results of both half-length modular exponentiations are combined accordingly.
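The CRT decomposition described above can be illustrated with a minimal sketch. It assumes the prime factors p and q of the modulus are available and uses Garner's recombination formula; the function name rsa_crt_pow and the toy parameters in the usage note are hypothetical, chosen only for illustration, and do not represent a hardened implementation.

```python
def rsa_crt_pow(c, d, p, q):
    """Compute c**d mod (p*q) via two exponentiations of half the length.

    Illustrative sketch only: no padding, no side-channel or fault protection.
    """
    # Reduced exponents for each prime factor.
    dp = d % (p - 1)
    dq = d % (q - 1)
    # Inverse of q modulo p (three-argument pow with -1 needs Python 3.8+).
    q_inv = pow(q, -1, p)

    # Two modular exponentiations, each with a modulus of half the bit length.
    mp = pow(c % p, dp, p)
    mq = pow(c % q, dq, q)

    # Garner recombination of the two partial results.
    h = (q_inv * (mp - mq)) % p
    return mq + h * q
```

With the textbook toy parameters p = 61, q = 53 and d = 2753, the result of rsa_crt_pow(c, 2753, 61, 53) agrees with pow(c, 2753, 61 * 53) for any c below 3233.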
Recently it has turned out that the Chinese Remainder Theorem is particularly susceptible to DFA attacks (DFA=differential fault analysis).
One problem associated with many methods therefore is the “doubling” of so-called modular multiplication, which is a central operation in cryptographic calculations. A modular exponentiation may be broken down into many modular multiplications, i.e. into operations wherein a product of a first operand A and of a second operand B is calculated in a residue class with regard to a modulus N. If the operands A and B have a length of 2n bits each, calculating units having a length of 2n bits are typically used. These calculating units are referred to as long-number calculating units because of their long lengths, as opposed to, for example, the 8-bit, 16-bit, 32-bit or 64-bit architectures employed for PC or workstation processors.
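How a modular exponentiation breaks down into modular multiplications can be sketched with the well-known square-and-multiply (binary) method. The helper names mod_mul and mod_exp are hypothetical; mod_mul stands in for whatever modular multiplication the calculating unit provides.

```python
def mod_mul(a, b, n):
    """Modular multiplication A*B mod N, the central operation."""
    return (a * b) % n


def mod_exp(base, exponent, n):
    """Right-to-left square-and-multiply built only from mod_mul calls."""
    result = 1 % n          # handles the degenerate modulus n == 1
    base %= n
    while exponent > 0:
        if exponent & 1:    # current exponent bit is set: multiply
            result = mod_mul(result, base, n)
        base = mod_mul(base, base, n)   # square for the next bit
        exponent >>= 1
    return result
```

For an exponent of k bits this performs k squarings and at most k further multiplications, so the cost of the exponentiation is governed by the cost of a single modular multiplication.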
Therefore there is a desire to implement a modular multiplication A*B mod N with numbers A, B and N of a bit length of 2n on an n-bit calculating unit. This is very time-consuming, since the numbers A, B, N, . . . may only ever be loaded fraction by fraction, which is why conventional methods require a large amount of organization and are error-prone, if they do not fail completely. There are several methods in the art with which this problem has been addressed so far. These methods are known by the keywords Montgomery multiplication and normal multiplication, e.g. with Karatsuba-Ofman, followed by a reduction, such as Barrett reduction.
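The "normal multiplication with subsequent reduction" approach can be sketched as follows, assuming operands of 2n bits that are split into n-bit halves so that each partial product involves only n-bit factors. The function names are hypothetical, the schoolbook split uses four partial products (Karatsuba-Ofman would need three), and the final % operator merely stands in for a real reduction step such as Barrett reduction.

```python
def split(x, n):
    """Split a 2n-bit number into its high and low n-bit halves."""
    mask = (1 << n) - 1
    return x >> n, x & mask


def mul_2n_from_halves(a, b, n):
    """Form the 4n-bit product A*B from four n-bit by n-bit products."""
    a1, a0 = split(a, n)
    b1, b0 = split(b, n)
    high = a1 * b1               # weight 2^(2n)
    middle = a1 * b0 + a0 * b1   # weight 2^n
    low = a0 * b0                # weight 1
    return (high << (2 * n)) + (middle << n) + low


def mod_mul_2n(a, b, modulus, n):
    """A*B mod N; the % below is a placeholder for a dedicated reduction."""
    return mul_2n_from_halves(a, b, n) % modulus
```

Even in this simplified form, every operand has to be handled in pieces and the partial results recombined, which illustrates the organizational overhead mentioned above.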
Another concept making use of a Montgomery calculation in a “CRT window” has been set forth in P. Paillier, “Low-cost double size modular exponentiation or how to stretch your cryptocoprocessor”.
All such concepts are expensive in terms of calculating time and data organization and are therefore not always efficient.