The present invention relates to calculating units for calculating a modular multiplication, where the word width that the calculating unit can process is less than the word width of the input numbers or of the modulus. Such requirements occur particularly in cryptographic applications.
Modular multiplication is a central operation in modular exponentiation, which is widely used in cryptography. For example, as shown in FIG. 2a, a key pair is generated in public key cryptography, i.e. in asymmetric cryptography, such as in the RSA method. The key pair consists of a public key e and a private key d. The private key is known only to one entity. The public key also belongs to this entity, but is provided to any other entity that wants to send, for example, encrypted data to the entity holding the private key. As shown in FIG. 2a, an unencrypted message M is encrypted to an encrypted message C by calculating a so-called modular exponentiation, in which the message is raised to the power of the public key e and the result is then reduced modulo the modulus N, which is also publicly known. For decryption, the same operation is performed, but now with the private key d as exponent, so that the entity that holds the private key, and that originally distributed the public key to the other entity, again obtains the plain text message M.
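The encryption and decryption described above can be sketched as follows. This is a minimal illustration with toy parameters, not part of the described calculating unit: the primes p and q, the helper names encrypt and decrypt, and the use of Python's built-in pow are assumptions chosen for the example, and the numbers are far too small for real use.

```python
# Toy RSA parameters (illustrative assumptions; far too small to be secure).
p, q = 61, 53
N = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # used only to derive the private key
e = 17                         # public key (exponent)
d = pow(e, -1, phi)            # private key, with e*d = 1 (mod phi); needs Python 3.8+

def encrypt(M, e, N):
    """Modular exponentiation with the public key: C = M^e mod N."""
    return pow(M, e, N)

def decrypt(C, d, N):
    """Same operation with the private key as exponent: M = C^d mod N."""
    return pow(C, d, N)

M = 65
C = encrypt(M, e, N)
recovered = decrypt(C, d, N)
```

With these values, decrypting C with d returns the original message M.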
These public key methods may also be used as signature/verification methods. An entity generates a digital signature S by encrypting the message M to be signed with its private key, as also illustrated in FIG. 2a. Verification is then done by the verifying entity subjecting the signature to modular exponentiation with the public key e of the signing entity, thereby obtaining a plain text message M that may be compared to the plain text message M to which the signature is assigned. If the plain text message obtained in the verification matches the plain text message to which the signature is assigned, it may be assumed that the signed document is authentic.
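The signature/verification flow can be sketched in the same way. The key values and the function names sign and verify are hypothetical, and the sketch signs the raw message without the hashing or padding that practical signature schemes add.

```python
# Toy key material (illustrative assumption): N = 61 * 53, e * d = 1 mod 3120.
N, e, d = 3233, 17, 2753

def sign(M, d, N):
    """Signature S: the message exponentiated with the private key d."""
    return pow(M, d, N)

def verify(M, S, e, N):
    """Exponentiate S with the public key e and compare with the message M."""
    return pow(S, e, N) == M

M = 1234
S = sign(M, d, N)
ok = verify(M, S, e, N)            # matches the signed message
tampered = verify(M + 1, S, e, N)  # a different message does not match
```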
As mentioned above, a cryptographic calculation including modular exponentiation, as illustrated in FIG. 2b, is split into several modular multiplications. Usually, a modular exponentiation is calculated by applying modular multiplications consecutively. In particular, due to the increased security requirements for the RSA algorithm, there is an interest in executing modular multiplications with a width of 2048 bits, i.e. with key lengths and/or modulus lengths of 2048 bits.
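One common way to split a modular exponentiation into consecutive modular multiplications is the binary square-and-multiply method. The sketch below is an illustration under that assumption and is not taken from FIG. 2b; the names mod_mul and mod_exp are hypothetical.

```python
def mod_mul(a, b, n):
    """One modular multiplication: (a * b) mod n."""
    return (a * b) % n

def mod_exp(base, exponent, n):
    """Compute base^exponent mod n using only modular multiplications.

    The exponent is scanned from its least significant bit upward
    (right-to-left binary method).
    """
    result = 1 % n
    base %= n
    while exponent > 0:
        if exponent & 1:
            result = mod_mul(result, base, n)   # multiply step
        base = mod_mul(base, base, n)           # square step
        exponent >>= 1
    return result
```

For an exponent of k bits, this loop performs k squarings and at most k further multiplications, all modulo n, so the cost of a single modular multiplication largely determines the cost of the whole exponentiation.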
Generally, in a modular multiplication that is part of a cryptographic calculation, the multiplier A, the multiplicand B and the modulus N all represent parameters of the cryptographic calculation, because the final results, such as the plain text message, the encrypted message, the signature, etc., depend on these parameters.
As already mentioned, there is an interest in steadily increasing the key lengths of public key cryptography, because this makes it possible to keep preventing so-called brute force attacks even as processors become faster. The effort of a brute force attack grows with the key length, so that increasingly long keys require increasingly complex brute force attacks which, with currently available computers, take so much time that a cryptographic algorithm may be considered safe. However, what is problematic with increasingly large key lengths is that the key length a crypto co-processor in a chip card or a computer (for example in a TPM) can handle is limited by the long number calculating unit included in this crypto co-processor. Such a long number calculating unit is shown, for example, in FIG. 4c, where a so-called bit-slice structure of a long number calculating unit is illustrated.
In the embodiment shown in FIG. 4c, each bit slice includes an arithmetic unit, which may, for example, be a one-bit full adder that may receive a carry from a lower bit slice and output a carry to a higher bit slice. Furthermore, at least one register is associated with such a bit slice. However, it is preferred to associate several registers with each bit slice, for example two or, even better, five registers. In a currently existing crypto co-processor with 1408 bit slices, a bit slice includes five registers, i.e. register Z, register C, register N, register CR0 and register CR4, as indicated in the left subimage in FIG. 4a. In that case, this processor operates in long mode. With this number of bit slices, the processor is well-suited to perform RSA calculations with key lengths of 1024 bits, because, for a calculation with a key length of 1024 bits, a calculating unit with only 1024 bit slices would not be quite sufficient. With the 1408 bit slices of this calculating unit, slightly longer key lengths may also be handled, but there should always be slightly more bit slices than key bits to be able to compensate for certain overflow or underflow situations.
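The carry chain between bit slices can be illustrated with a small software model. This is only a sketch of a ripple-carry addition built from one-bit full adders; the function names full_adder and bit_slice_add are hypothetical, and the model ignores the registers and the control logic of the actual calculating unit.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def bit_slice_add(x, y, num_slices):
    """Add x and y on a unit with num_slices bit slices.

    Each slice receives the carry of the next lower slice and passes its
    own carry to the next higher slice. The carry out of the top slice is
    returned separately, since the unit has no slice left to hold it.
    """
    carry = 0
    result = 0
    for i in range(num_slices):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result, carry
```

For example, bit_slice_add(0b1011, 0b0110, 4) yields a result of 1 with a carry out of 1, because the sum 17 does not fit into four slices. This is the kind of overflow situation for which slightly more bit slices than key bits are kept in reserve.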
The calculating unit 40 shown in FIG. 4b may be provided with data and/or flow sequences and/or controlled by a controller 41. Furthermore, there is a register configuration means 42 which may reconfigure the registers of the calculating unit, i.e. the five registers in long mode in this embodiment, into ten registers in short mode. In this embodiment, each long mode register of a certain length thus results in two short registers of half that length, so that two N registers, two C registers, two Z registers and one CR0 register, one CR2 register, one CR4 register and one CR6 register are created. Each bit slice still has an arithmetic unit, for example a one-bit full adder, but in short mode it now has twice the number of registers compared to the situation in FIG. 4c, which represents the long mode.
If the crypto co-processor with 1408 bit slices is now to perform calculations for RSA key lengths of, for example, 2048 bits, this is no longer easily possible, because there are not enough bit slices.
It is apparent that, although an increase in key lengths is very desirable from the security point of view, each increase in key lengths means that already existing co-processors are no longer readily usable. New, longer calculating units would thus always have to be developed, which costs development time and money.
In order to avoid this, methods have been developed with which larger numbers may be processed on smaller calculating units. For example, there are generally methods for doubling a calculating unit in software. Such a method is, for example, the calculation of the modular multiplication using the Chinese Remainder Theorem (CRT), as described in section 14.5 on pages 610-613 of “Handbook of Applied Cryptography”, A. Menezes, P. van Oorschot, S. Vanstone, 1996. Generally, a modular exponentiation with a long modulus is split into two modular exponentiations with a short modulus using the Chinese Remainder Theorem, and the two results are then combined. In that way, a calculating unit may, so to speak, be doubled “software-wise”.
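The CRT split can be sketched as follows for the RSA case, in which the long modulus is the product of two primes p and q that are known to the entity performing the calculation. The recombination step used here is Garner's formula, which is one common choice and an assumption of this sketch; the function name crt_mod_exp and the toy values are hypothetical.

```python
def crt_mod_exp(C, d, p, q):
    """Compute C^d mod (p*q) via two exponentiations with the short moduli p and q."""
    dp = d % (p - 1)               # reduced exponent for the modulus p
    dq = d % (q - 1)               # reduced exponent for the modulus q
    mp = pow(C % p, dp, p)         # short exponentiation modulo p
    mq = pow(C % q, dq, q)         # short exponentiation modulo q
    q_inv = pow(q, -1, p)          # q^(-1) mod p, needs Python 3.8+
    h = (q_inv * (mp - mq)) % p    # Garner recombination
    return mq + h * q

# Toy values (illustrative assumption): N = 61 * 53 = 3233, d = 2753.
M = crt_mod_exp(2790, 2753, 61, 53)
```

Each of the two exponentiations works with operands of only about half the length of the full modulus, which is why a calculating unit of half the length is sufficient for them.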
However, this concept only allows doubling, which is inconvenient in situations in which doubling of the key length is not actually required, but in which key lengths are to be used that are perhaps only 50% larger than the architectural calculating unit length, i.e. the number of bit slices. If such 100% doubling algorithms are used when key lengths larger by only 50% are to be processed, each half of the key occupies only (100% + 50%)/2 = 75% of the calculating unit. In principle, hardware resources are thus wasted.
In addition to the CRT doubling method, there are also further calculating unit doubling algorithms, such as the Montgomery multiplication, a multiplication according to Karatsuba-Ofman with subsequent reduction by means of, for example, the Barrett reduction, or the doubling method using the MultModDiv operation, as discussed, for example, in German patent DE 10219158 B4.
Considering, for example, FIG. 4d, a calculating unit for a 1024 bit key length is indicated at 43. Software doubling using, for example, the Chinese Remainder Theorem or one of the further methods mentioned above is useful when 2048 bits are required, as illustrated in block 44 in FIG. 4d. In this case, the whole calculating unit is used, i.e. no unused bit slices remain. However, if a key length of, for example, 1536 bits is sufficient, software doubling using, for example, the Chinese Remainder Theorem (CRT) will result in 2×768 bits being required. The remaining 2×256 bits would be unused in this case.
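The utilization figures above follow from simple arithmetic, which the following sketch reproduces. The function name doubling_utilization is hypothetical, and the model assumes that software doubling always splits the key into two equal halves, each processed on the full calculating unit.

```python
def doubling_utilization(key_bits, unit_bits):
    """Return (half_bits, unused_bits, utilization) for software doubling.

    half_bits:   operand length of each of the two half-length calculations
    unused_bits: bit slices left idle during each half-length calculation
    utilization: fraction of the calculating unit that is actually used
    """
    if key_bits % 2 != 0:
        raise ValueError("key length must be even to be split into two halves")
    half_bits = key_bits // 2
    if half_bits > unit_bits:
        raise ValueError("half of the key does not fit into the calculating unit")
    unused_bits = unit_bits - half_bits
    return half_bits, unused_bits, half_bits / unit_bits

full = doubling_utilization(2048, 1024)     # (1024, 0, 1.0)
partial = doubling_utilization(1536, 1024)  # (768, 256, 0.75)
```

For the 1536 bit key on the 1024 bit unit, each of the two calculations leaves 256 bit slices idle, which corresponds to the 2×256 unused bits and to the utilization of 75%.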
Conventionally, there is thus a lack of an alternative calculating unit extension concept by which more flexible key lengths and thus a more flexible calculating unit utilization may be achieved.