The present invention relates to calculating units for reducing an input number with respect to a modulus and particularly to calculating units whose processable word width is less than a word width of the input number or the modulus, wherein such requirements particularly occur in cryptographic applications.
Modular multiplication is a central operation in modular exponentiation as it is commonly used in cryptography. For example, as shown in FIG. 2a, a key pair is generated in public key cryptography, i.e. in asymmetric cryptography, such as in the RSA method. The key pair consists of a public key e and a private key d. The private key is known only to one entity. The public key also belongs to this entity, but is provided to another entity which wants to send, for example, encrypted data to the entity to which the private key belongs. As shown in FIG. 2a, an unencrypted message M is encrypted to an encrypted message C by calculating a so-called modular exponentiation, in which the message is raised to the power of the public key e and the result is reduced with respect to the modulus N, which is also publicly known. For the decryption, the same operation is performed, but now with the private key d as the exponent, so that the entity to which the private key belongs, and by which the public key was originally distributed to the other entity, again obtains the plain text message M.
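The encryption and decryption described above may be illustrated by the following minimal sketch. The parameters p, q, N, e and d are small textbook values assumed purely for illustration, and the function names encrypt and decrypt are hypothetical; real key lengths are far larger.

```python
# Toy RSA parameters, assumed for illustration only (far too small for real use).
p, q = 61, 53
N = p * q                      # public modulus, 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public key (exponent)
d = 2753                       # private key (exponent), chosen so that e*d = 1 mod phi
assert (e * d) % phi == 1

def encrypt(M, e, N):
    """Encrypted message C = M^e mod N (modular exponentiation with the public key)."""
    return pow(M, e, N)

def decrypt(C, d, N):
    """Plain text message M = C^d mod N (same operation with the private key)."""
    return pow(C, d, N)

M = 65
C = encrypt(M, e, N)
M_recovered = decrypt(C, d, N)
```

With these assumed values, the message 65 is encrypted to 2790 and decrypted back to 65.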
These public key methods may also be used as signature/verification methods. An entity generates a digital signature by encrypting the message M to be signed with its private key to generate the signature S, as is also illustrated in FIG. 2a. The verifying entity then verifies the signature by subjecting it to modular exponentiation with the public key e of the signing entity, thereby obtaining a plain text message M that may be compared to the plain text message M to which the signature is assigned. If the plain text message obtained in the verification matches the plain text message to which the signature is assigned, it may be assumed that the signed document is authentic.
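The signature generation and verification described above may be sketched as follows. The toy values for N, e and d and the function names sign and verify are assumptions for illustration; practical signature schemes additionally hash and pad the message, which is omitted here.

```python
# Toy RSA parameters, assumed for illustration only.
N, e, d = 3233, 17, 2753

def sign(M, d, N):
    """Signature S = M^d mod N, generated with the private key of the signing entity."""
    return pow(M, d, N)

def verify(M, S, e, N):
    """Recover the message from S with the public key and compare it with M."""
    return pow(S, e, N) == M

M = 65
S = sign(M, d, N)
is_authentic = verify(M, S, e, N)        # True for the signed message
is_forged_ok = verify(M + 1, S, e, N)    # False for a different message
```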
As mentioned above, a cryptographic calculation including modular exponentiation, such as illustrated in FIG. 2b, is split into several modular multiplications. It is usually preferred to calculate a modular exponentiation by applying modular multiplications consecutively. In particular, due to the increased security requirements for the RSA algorithm, there is an interest in executing modular multiplications with a width of 2048 bits, i.e. with key lengths and/or modulus lengths of 2048 bits.
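One common way of splitting a modular exponentiation into consecutive modular multiplications is the binary square-and-multiply method. The following is a minimal sketch of that textbook method; it is not taken from FIG. 2b, and the function names mod_mul and mod_exp are hypothetical.

```python
def mod_mul(a, b, n):
    """A single modular multiplication: (a * b) mod n."""
    return (a * b) % n

def mod_exp(base, exponent, n):
    """Left-to-right square-and-multiply: only modular multiplications are used."""
    result = 1 % n
    base %= n
    for bit in bin(exponent)[2:]:                # exponent bits, most significant first
        result = mod_mul(result, result, n)      # square for every exponent bit
        if bit == "1":
            result = mod_mul(result, base, n)    # multiply only for set bits
    return result
```

For an exponent of k bits, this sketch performs k squarings and at most k further multiplications, each of which is a modular multiplication of the full modulus width.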
Generally, in a modular multiplication that is part of a cryptographic calculation, the multiplier A, the multiplicand B and the modulus N all represent parameters of the cryptographic calculation, because the final results, such as the plain text message, the encrypted message, the signature, etc., depend on these parameters.
As already mentioned, there is an interest in steadily increasing the key lengths of public key cryptography, because this makes it possible to keep preventing so-called brute force attacks even with increasingly fast processors. The effort of a brute force attack grows with the key length, so that increasingly long keys require increasingly complex brute force attacks which, with currently available computers, take so much time that a cryptographic algorithm may be considered safe. However, what is problematic with increasingly large key lengths is that the key length that a crypto co-processor in a chip card or a computer (for example in a TPM module) can process is limited by the long number calculating unit included in this crypto co-processor. Such a long number calculating unit is shown, for example, in FIG. 4c, where a so-called bit-slice structure of a long number calculating unit is illustrated.
In the embodiment shown in FIG. 4c, each bit slice includes an arithmetic unit, which may, for example, be a one-bit full adder that receives a carry from a lower bit slice and outputs a carry to a higher bit slice. Furthermore, at least one register is associated with such a bit slice. However, it is preferred to associate several registers with each bit slice, for example two or, even better, five registers. In a currently existing crypto co-processor with 1408 bit slices, each bit slice includes five registers, i.e. register Z, register C, register N, register CR0 and register CR4, as indicated in the left subimage in FIG. 4a. In that case, this processor operates in long mode. With this number of bit slices, the processor is well-suited to perform RSA calculations with key lengths of 1024 bits, because a calculating unit with only 1024 bit slices would not be quite sufficient for a calculation with a key length of 1024 bits. In the calculating unit with 1408 bit slices, slightly longer key lengths may also be processed, but there should always be slightly more bit slices than key bits in order to compensate for certain overflow or underflow situations.
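The bit-slice structure with one-bit full adders and a carry passed from slice to slice may be modeled in software as follows. This is only a behavioral sketch under the assumption of a simple ripple carry; the function names are hypothetical, and the actual carry handling of the calculating unit in FIG. 4c is not specified here.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def bit_slice_add(x, y, num_slices):
    """Add x and y on num_slices bit slices; the carry ripples from lower to higher slices."""
    carry = 0
    result = 0
    for i in range(num_slices):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result, carry      # a final carry of 1 means the sum did not fit

# A 1024-bit operand overflows a unit with exactly 1024 slices ...
narrow = bit_slice_add((1 << 1024) - 1, 1, 1024)
# ... but fits into a unit with 1408 slices, which leaves headroom above the key length.
wide = bit_slice_add((1 << 1024) - 1, 1, 1408)
```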
The calculating unit 40 shown in FIG. 4b may be provided with data and/or flow sequences and/or controlled by a controller 41. Furthermore, there is a register configuration means 42 which may reconfigure the registers of the calculating unit, i.e. the five registers in long mode in this embodiment, into ten registers in short mode. In this embodiment, each long mode register of a certain length thus results in two short registers of half the length, so that two N registers, two C registers, two Z registers and one each of the registers CR0, CR2, CR4 and CR6 are created. Each bit slice still has an arithmetic unit, for example a one-bit full adder, but in short mode it is now associated with twice the number of registers as compared to the situation in FIG. 4c, which represents the long mode.
If the crypto co-processor with 1408 bit slices is now to perform calculations with RSA key lengths of, for example, 2048 bits, this is no longer easily possible, because there are not enough bit slices.
It is apparent that, although an increase in key lengths is very desirable from a security point of view, each increase in key lengths means that already existing coprocessors are no longer readily usable. Thus, new and longer calculating units would have to be developed again and again, which costs development time and money.
In order to avoid this, methods have been developed with which larger numbers may be processed on smaller calculating units. For example, there are generally methods for doubling a calculating unit in software. Such a method is, for example, the calculation of the modular multiplication using the Chinese Remainder Theorem (CRT), as it is described in section 14.5 on pages 610-613 of “Handbook of Applied Cryptography”, A. Menezes, P. van Oorschot, S. Vanstone, 1996. Generally, a modular exponentiation with a long modulus is split into two modular exponentiations with a short modulus using the Chinese remainder theorem, wherein these results are then combined. In that way, a calculating unit may, so to speak, be doubled “software-wise”.
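The splitting of a modular exponentiation with a long modulus into two modular exponentiations with short moduli may be sketched as follows. The sketch assumes that the modulus is N = p·q with known primes p and q and recombines the partial results with Garner's formula; the function name mod_exp_crt and the toy values are assumptions for illustration only.

```python
def mod_exp_crt(C, d, p, q):
    """Compute C^d mod (p*q) from two exponentiations with the short moduli p and q."""
    d_p = d % (p - 1)                 # reduced exponent for the modulus p
    d_q = d % (q - 1)                 # reduced exponent for the modulus q
    q_inv = pow(q, -1, p)             # q^(-1) mod p (requires Python 3.8 or later)
    m_p = pow(C % p, d_p, p)          # first short exponentiation
    m_q = pow(C % q, d_q, q)          # second short exponentiation
    h = (q_inv * (m_p - m_q)) % p     # recombination according to Garner
    return m_q + h * q

# Toy values assumed for illustration: N = 61 * 53 = 3233, d = 2753.
M = mod_exp_crt(2790, 2753, 61, 53)
```

Each of the two short exponentiations only involves numbers of about half the length of N, which is why a calculating unit of half the length suffices for them.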
However, this concept only allows doubling, which is inconvenient in situations in which a doubling of the key lengths is not necessarily required, but in which key lengths are to be used that are perhaps only 50% larger than the architectural calculating unit length, i.e. the number of bit slices. If such 100% doubling algorithms are used when key lengths larger by only 50% are to be processed, the calculating unit is utilized only to (100+50)%/2 = 75%. In principle, hardware resources are thus wasted.
In addition to the CRT doubling method, there are also further calculating unit doubling algorithms, such as the Montgomery multiplication, a multiplication according to Karatsuba-Ofman with subsequent reduction by means of, for example, the Barrett reduction, or the doubling method using the MultModDiv operation, as discussed, for example, in German patent DE 10219158 B4.
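As an illustration of one of the alternatives named above, a multiplication according to Karatsuba-Ofman followed by a Barrett reduction may be sketched as follows. These are simplified textbook forms with a single Karatsuba level and a basic Barrett quotient estimate; the function names are hypothetical, and the sketch is not the implementation of any particular calculating unit.

```python
def karatsuba_multiply(a, b, half_bits):
    """One Karatsuba-Ofman level: three roughly half-width products instead of four."""
    mask = (1 << half_bits) - 1
    a_lo, a_hi = a & mask, a >> half_bits
    b_lo, b_hi = b & mask, b >> half_bits
    low = a_lo * b_lo
    high = a_hi * b_hi
    middle = (a_lo + a_hi) * (b_lo + b_hi) - low - high
    return (high << (2 * half_bits)) + (middle << half_bits) + low

def barrett_reduce(x, modulus):
    """Reduce x >= 0 with a precomputable reciprocal instead of a long division."""
    k = modulus.bit_length()
    mu = (1 << (2 * k)) // modulus     # would be precomputed once per modulus
    q = (x * mu) >> (2 * k)            # estimate of x // modulus, never too large
    r = x - q * modulus
    while r >= modulus:                # correct the small estimation error
        r -= modulus
    return r

def mod_mul(a, b, modulus):
    """Modular multiplication: Karatsuba-Ofman product, then Barrett reduction."""
    half_bits = (modulus.bit_length() + 1) // 2
    return barrett_reduce(karatsuba_multiply(a, b, half_bits), modulus)
```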
Considering, for example, FIG. 4d, a calculating unit for a 1024 bit key length is indicated at 43. Software doubling using, for example, the Chinese remainder theorem or one of the further methods mentioned above is useful when 2048 bits are required, as illustrated in block 44 in FIG. 4d. In this way, the whole calculating unit is used, i.e. no unused bit slices remain. However, if a key length of, for example, 1536 bits is sufficient, software doubling using, for example, the Chinese remainder theorem (CRT) will result in 2×768 bits being required. The remaining 2×256 bits of the calculating unit would remain unused in this case.
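The utilization figures given above may be reproduced with the following short calculation. The function name doubling_utilization is hypothetical, and the sketch simply assumes that a doubling method splits the key length into two halves that are each processed on the full calculating unit.

```python
def doubling_utilization(unit_bits, key_bits):
    """Return (bits per half, unused bits over both halves, utilization) for software doubling."""
    half_bits = -(-key_bits // 2)                 # each half of the key, rounded up
    if half_bits > unit_bits:
        raise ValueError("key too long for a doubling method on this calculating unit")
    unused_bits = 2 * (unit_bits - half_bits)     # idle bit slices, counted for both halves
    utilization = half_bits / unit_bits
    return half_bits, unused_bits, utilization

full = doubling_utilization(1024, 2048)      # (1024, 0, 1.0): no unused bit slices
partial = doubling_utilization(1024, 1536)   # (768, 512, 0.75): 2 x 256 bits unused
```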
Often there is a demand in the art to perform a reduction of a number with respect to a modulus, i.e. to obtain the remainder of an integer division. Such reductions particularly occur in cryptographic applications based on modular arithmetic, such as in the known asymmetric cryptography techniques, for example the RSA method or elliptic curve cryptography. Wherever calculation takes place in a finite field, sooner or later the task arises of getting a number that is larger in magnitude than the largest number in this field back into the field, i.e. of reducing it with respect to the modulus associated with the field.
A simple possibility of modular reduction is to subtract the modulus from the number until the result of the subtraction is less than the modulus. In that way, the larger number has been “reduced into” the field again. Such a procedure is normally done so that the number is taken, the modulus is subtracted, and it is then determined whether the result of this subtraction is already less than the modulus. If this is the case, the modular reduction is already completed. However, if this is not the case, the modulus is again subtracted from the result of the first subtraction, and afterwards it is again determined whether the result thus obtained is less than the modulus or not. Depending on the result, the reduction is then completed or is continued iteratively.
This procedure is problematic for several reasons. One reason is that the number of steps and the duration of the total calculation significantly depend on the original value of the number. If this number is only slightly above the modulus, a single subtraction process will be sufficient. Thus the result is obtained quickly, while, if the number is significantly larger than the modulus, many such steps are required. Thus it is easy for an attacker to draw conclusions as to the size of the original number based on the duration of the calculation and the current consumption.
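The reduction by repeated subtraction and the dependence of its step count on the input may be sketched as follows. The function name is hypothetical; as a slight variation of the procedure described above, the sketch compares before the first subtraction so that inputs already smaller than the modulus are also handled.

```python
def reduce_by_subtraction(number, modulus):
    """Subtract the modulus until the result is less than the modulus.

    Returns the reduced value and the number of subtractions performed.
    """
    if modulus <= 0 or number < 0:
        raise ValueError("expects number >= 0 and modulus > 0")
    steps = 0
    while number >= modulus:
        number -= modulus
        steps += 1
    return number, steps

slightly_above = reduce_by_subtraction(6, 5)       # one subtraction is sufficient
far_above = reduce_by_subtraction(1000, 7)         # many subtractions are needed
```

The number of subtractions equals the integer quotient of the number and the modulus, so the running time of this sketch directly reflects the size of the original number.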
What is further problematic is that, if the numbers are larger than the calculating unit width of the processor, the subtraction of a large number, such as the modulus, from an even larger number, such as the number to be reduced or an intermediate result, involves a lot of effort. Because of the limited calculating unit length, the numbers always have to be loaded into the calculating unit in small portions so that the individual subtractions can be performed piece by piece, which results in significant data transfer between the calculating unit and, for example, an external memory.
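The piecewise subtraction of numbers that are wider than the calculating unit may be sketched as follows. The word width of 32 bits and all function names are assumptions for illustration; each loop iteration stands for loading one portion of both operands, subtracting, and passing the borrow on to the next portion.

```python
WORD_BITS = 32                      # hypothetical calculating unit width
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(x, num_words):
    """Split x into num_words portions, least significant portion first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(num_words)]

def from_words(words):
    """Reassemble a number from its portions."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def subtract_wordwise(a_words, b_words):
    """Compute a - b portion by portion; returns (result portions, final borrow)."""
    borrow = 0
    result = []
    for a, b in zip(a_words, b_words):
        diff = a - b - borrow
        borrow = 1 if diff < 0 else 0
        result.append(diff & WORD_MASK)     # keep only the bits that fit into one word
    return result, borrow

a = (1 << 100) + 5
b = (1 << 64) + 7
words, borrow = subtract_wordwise(to_words(a, 4), to_words(b, 4))
```

A final borrow of 1 indicates that the subtrahend was larger than the minuend, in which case the result portions hold the difference modulo 2 to the power of the total width.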
On the other hand, particularly in cryptographic applications, the key length is a significant security aspect with respect to so-called brute force attacks, i.e. attacks in which an attacker simply tries every possibility in order to “crack” either the key or at least the message after a large number of attempts. The longer the keys, and thus the larger the numbers and/or the modulus in the cryptographic calculation, the more effort such brute force attacks involve.
However, to keep up with such growing key lengths, a new calculating unit in the form of a new cryptoprocessor or cryptocoprocessor would actually have to be developed and brought onto the market with every new key length. This concept is inflexible and not acceptable for many customers, because they would always have to return, for example, their old payment cards and exchange them for new payment cards with a cryptoprocessor having a wider word length. It is therefore not suitable for a mass market such as that of chip cards.
Conventionally, there is thus a lack of a more flexible concept for reducing an input number with respect to a modulus.