The invention relates to a method for protecting the execution of a modular exponentiation against invasive attacks such as Bellcore attacks. The invention also relates to an electronic device, in particular a smart card, implementing such a method.
The invention relates more specifically to the protection of modular exponentiation used in the context of RSA-CRT systems. RSA was introduced in 1977 by Rivest, Shamir and Adleman (see “Rivest, R. L., Shamir, A., Adleman, L. M.: A method for obtaining digital signatures and public-key cryptosystems. Technical Report MIT/LCS/TM-82 (1977)”, which describes RSA in straightforward mode). RSA use is extremely widespread (you typically use RSA any time you connect to a web site securely, any time you use your bank card, etc.).
In the so-called straightforward mode, $(N,e)$ is the RSA public key and $(N,d)$ the RSA private key, where $N=p\cdot q$, $p$ and $q$ are large prime integers, $\gcd(p-1,e)=\gcd(q-1,e)=1$ and $d=e^{-1} \bmod (p-1)(q-1)$. The RSA signature of a message $m<N$ is given by $S=m^d \bmod N$.
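As a concrete illustration of the straightforward mode, the following Python sketch signs a message with deliberately tiny, insecure toy primes ($p=61$, $q=53$, $e=17$, chosen only for readability; a 2K RSA key uses 1024-bit primes). The function name `sign_straightforward` is hypothetical, and Python 3.8 or later is assumed for the modular inverse `pow(e, -1, phi)`.

```python
# Toy parameters only: far too small to be secure, chosen for readability.
p, q = 61, 53
e = 17

N = p * q                          # RSA modulus
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)                # d = e^-1 mod (p-1)(q-1), Python 3.8+


def sign_straightforward(m: int) -> int:
    """RSA signature in straightforward mode: S = m^d mod N."""
    if not 0 <= m < N:
        raise ValueError("message representative must satisfy 0 <= m < N")
    return pow(m, d, N)


m = 65
S = sign_straightforward(m)
assert pow(S, e, N) == m           # anyone holding (N, e) can verify S
```

Note that signing needs only $(N,d)$, while verification uses the public pair $(N,e)$.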
As the computing power of crypto-enabled architectures increases, RSA key sizes inflate over time. 2K RSA (RSA using 2048-bit keys) is now a standard functionality. This is a strong constraint on embedded devices, whose processors typically have little RAM and run at a clock frequency of a few megahertz. RSA is more efficient in Chinese Remainder Theorem (CRT) mode than in straightforward mode. The RSA-CRT domain is composed of an RSA public key $(N,e)$ and an RSA private key $(p,q,d_p,d_q,i_q)$, where $N=p\cdot q$, $p$ and $q$ are large prime integers, $\gcd(p-1,e)=\gcd(q-1,e)=1$, $d_p=e^{-1} \bmod (p-1)$, $d_q=e^{-1} \bmod (q-1)$ and $i_q=q^{-1} \bmod p$. As it handles data half the size of the RSA modulus, RSA with CRT is theoretically about four times faster and is therefore better suited to embedded devices. A naive implementation of the RSA signature in CRT mode is described in FIG. 1.
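FIG. 1 is not reproduced here, so the following is only a minimal sketch of a naive RSA-CRT signature, assuming the Garner-style recombination $S=S_q+q\cdot(i_q\cdot(S_p-S_q) \bmod p)$ quoted in the discussion of the Bellcore attack. It reuses the same hypothetical toy key as the straightforward-mode example; `sign_crt_naive` is a hypothetical name.

```python
# Same toy key as before (insecure, illustration only).
p, q = 61, 53
e = 17
N = p * q

# CRT private key (p, q, dp, dq, iq), derived here from p, q and e for the demo.
dp = pow(e, -1, p - 1)             # dp = e^-1 mod (p-1)
dq = pow(e, -1, q - 1)             # dq = e^-1 mod (q-1)
iq = pow(q, -1, p)                 # iq = q^-1 mod p


def sign_crt_naive(m: int) -> int:
    """Naive RSA-CRT signature: two half-size exponentiations + recombination."""
    sp = pow(m, dp, p)             # Sp = m^dp mod p
    sq = pow(m, dq, q)             # Sq = m^dq mod q
    # Recombination: S = Sq + q * (iq * (Sp - Sq) mod p), which lies in [0, N)
    return sq + q * ((iq * (sp - sq)) % p)


m = 65
d = pow(e, -1, (p - 1) * (q - 1))
assert sign_crt_naive(m) == pow(m, d, N)   # same result as straightforward mode
```

The "about four times faster" estimate follows from the fact that schoolbook modular multiplication costs roughly quadratic time in the operand size and the exponent length is also halved, so each half-size exponentiation costs about one eighth of a full one, and the two of them together about one quarter.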
Invasive attacks on a hardware device typically consist of disturbing the expected behavior of the device and making it work abnormally in order to infer sensitive data. Such attacks were introduced in the late nineties. They are a serious concern, because they could allow an attacker to recover key material stored in cryptographic devices such as smart cards, HSMs, etc., which are normally considered secure. This would allow the attacker to impersonate the legitimate user (e.g. perform financial transactions from his bank account, use his phone line, carry out illegal activities in his name, etc.). In the past, such attacks were not perceived as critical for personal computers, since there are typically plenty of easier ways to crack a computer by purely software means, without the burden of an invasive attack. However, due to growing fraud and the emergence of components such as TPMs (trusted platform modules, whose specifications are managed by the Trusted Computing Group), this could change. TPMs are meant to introduce secure cryptographic features into possibly all sorts of products (PDAs, printers, cell phones, etc.) and are increasingly common, especially in corporate PCs, but also in all sorts of electronic equipment. Invasive attacks therefore now threaten many more devices than before, not only cryptographic devices or high-security computers (e.g. sensitive servers). As the technological response of hardware manufacturers evolves, new hardware countermeasures are regularly added. However, it is widely believed that these can only be effective if combined with efficient software countermeasures. Embedded devices are especially exposed to this category of attacks when the attacker has full physical access to the hardware. A typical example of an invasive attack is the original Bellcore attack, which allows an attacker to retrieve the RSA private key from a single faulty signature.
The Bellcore attack is a differential fault attack introduced by the Bellcore Institute in 1996. It is described in “Boneh, D., DeMillo, R. A., Lipton, R. J.: On the importance of checking cryptographic protocols for faults. Lecture Notes in Computer Science 1233 (1997) 37-51”. On embedded platforms, this attack is usually considered “easy” since the attacker has full access to the device. Disturbing the calculation of either $S_p=m^{d_p} \bmod p$ or $S_q=m^{d_q} \bmod q$ (steps illustrated in FIG. 1) can be achieved by means such as voltage glitches, laser or temperature variation. Once the precise disturbance is obtained, the attack succeeds and allows an attacker to retrieve the RSA prime factors with a single gcd calculation. Indeed, by construction, $S=S_q+q\cdot(i_q\cdot(S_p-S_q) \bmod p)=S_p+p\cdot(i_p\cdot(S_q-S_p) \bmod q)$, where $i_p=p^{-1} \bmod q$. Denoting by $S$ the correct signature and by $S'$ the faulty signature for the same input message, where either $S_p$ or $S_q$ (but not both) is incorrect, $\gcd(S-S',N)$ is either $q$ or $p$: if only $S_p$ is faulty, then $S'\equiv S \pmod q$ but $S'\not\equiv S \pmod p$, so the gcd is $q$, and symmetrically for a fault on $S_q$. A standard improvement of the Bellcore attack is described in “Joye, M., Lenstra, A. K., Quisquater, J. J.: Chinese remaindering based cryptosystems in the presence of faults. Journal of Cryptology: the journal of the International Association for Cryptologic Research 12(4) (1999) 241-245” and leads to retrieving the factorization of $N$ without the genuine signature, by calculating $\gcd((S'^e-m) \bmod N, N)$, which is either $p$ or $q$. Thus, the RSA private elements $p$ and $q$ are recovered and, as a consequence, the whole RSA-CRT private key is recovered.
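The gcd computations above can be reproduced on the hypothetical toy key used earlier. In this sketch the physical fault is simulated in software by a hypothetical `fault_in_sp` flag that corrupts $S_p$ only; both the original attack (using $S$ and $S'$) and the improved variant (using only $S'$, $m$ and $e$) then recover $q$.

```python
from math import gcd

# Toy RSA-CRT key (insecure, illustration only).
p, q = 61, 53
e = 17
N = p * q
dp = pow(e, -1, p - 1)
dq = pow(e, -1, q - 1)
iq = pow(q, -1, p)


def sign_crt(m: int, fault_in_sp: bool = False) -> int:
    sp = pow(m, dp, p)
    if fault_in_sp:
        sp = (sp + 1) % p          # simulated fault: Sp is wrong, Sq untouched
    sq = pow(m, dq, q)
    return sq + q * ((iq * (sp - sq)) % p)


m = 65
S = sign_crt(m)                             # correct signature
S_faulty = sign_crt(m, fault_in_sp=True)    # faulty signature, same message

# Original Bellcore attack: S - S' is divisible by q but not by p.
q_found = gcd(S - S_faulty, N)
# Improved variant: only the faulty signature, m and e are needed.
q_found_2 = gcd((pow(S_faulty, e, N) - m) % N, N)
print(q_found, q_found_2)                   # prints: 53 53
```

Since $x \mapsto x^e \bmod p$ is a bijection when $\gcd(e,p-1)=1$, a wrong $S_p$ always gives $S'^e \not\equiv m \pmod p$, which is why the second gcd is exactly $q$.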
Fault attacks as introduced by Bellcore are still a major threat to cryptographic products implementing modular exponentiation, e.g. for RSA signatures. When the public exponent is known, the signature can be verified before it is output, thereby preventing Bellcore attacks. However, on embedded devices the public exponent is most often unknown, which turns resistance to fault attacks into an intricate problem.
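When $e$ is available, the verification countermeasure mentioned above can be sketched as follows. This is a minimal illustration under the assumption that the device stores $e$ (which, as noted, is often not the case); the toy key, the `FaultDetected` exception and the `fault_in_sp` flag are hypothetical.

```python
# Toy RSA-CRT key (insecure, illustration only).
p, q = 61, 53
e = 17                             # assumption: public exponent known to the device
N = p * q
dp = pow(e, -1, p - 1)
dq = pow(e, -1, q - 1)
iq = pow(q, -1, p)


class FaultDetected(Exception):
    """Raised instead of releasing a signature that fails verification."""


def sign_crt_verified(m: int, fault_in_sp: bool = False) -> int:
    sp = pow(m, dp, p)
    if fault_in_sp:
        sp = (sp + 1) % p          # simulated fault on the p-half
    sq = pow(m, dq, q)
    s = sq + q * ((iq * (sp - sq)) % p)
    # Check S^e = m mod N before output; a faulty S is never released.
    if pow(s, e, N) != m % N:
        raise FaultDetected("signature verification failed")
    return s
```

A real implementation would also have to protect the comparison itself, since skipping it with a second fault would re-enable the attack.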
Since the discovery of the Bellcore attack, countermeasures have been proposed by the research community. In 1997, Shamir proposed an elegant countermeasure (described in “Shamir, A.: Method and apparatus for protecting public key schemes from timing and fault attacks”, U.S. Pat. No. 5,991,415, November 1999, also presented at the rump session of EUROCRYPT '97), which assumes that the private exponent d is known when generating an RSA signature in CRT mode. In practice, however, this parameter is rarely available. Secure CRT implementations of RSA were also proposed:
- in 2002 by Aumüller et al. (Aumüller, C., Bier, P., Fischer, W., Hofreiter, P., Seifert, J. P.: Fault attacks on RSA with CRT: Concrete results and practical countermeasures. In Kaliski Jr., B. S., Koç, Ç. K., Paar, C., eds.: Cryptographic Hardware and Embedded Systems—CHES 2002. Volume 2523 of Lecture Notes in Computer Science. (2002) 260-275),
- in 2003 by Blömer et al. (Blömer, J., Otto, M., Seifert, J. P.: A new CRT-RSA algorithm secure against Bellcore attacks. In: CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, New York, N.Y., USA, ACM (2003) 311-320),
- in 2005 by Joye and Ciet (Joye, M., Ciet, M.: Practical fault countermeasures for Chinese remaindering based RSA. In Breveglieri, L., Koren, I., eds.: 2nd Workshop on Fault Diagnosis and Tolerance in Cryptography—FDTC 2005. (2005)),
- in 2005 by Giraud (Giraud, C.: Fault resistant RSA implementation. In Breveglieri, L., Koren, I., eds.: 2nd Workshop on Fault Diagnosis and Tolerance in Cryptography—FDTC 2005. (2005) 142-151), and
- in 2007 by Kim and Quisquater (Kim, C. H., Quisquater, J. J.: How can we overcome both side channel analysis and fault attacks on RSA-CRT? In Breveglieri, L., Gueron, S., Koren, I., Naccache, D., Seifert, J. P., eds.: FDTC. (2007) 21-29).
These countermeasures are discussed in more detail below. All of them have a dramatic impact on execution time, memory consumption or personalization management.
The elegant countermeasure proposed by Shamir one year after the discovery of the Bellcore attack consists in computing $S^*_p=m^d \bmod pr$ and $S^*_q=m^d \bmod qr$ separately, for a small integer $r$, and checking the consistency of $S^*_p$ and $S^*_q$ by testing whether $S^*_p \equiv S^*_q \pmod r$. A more efficient variant suggests choosing $r$ prime and reducing $d$ modulo $(p-1)(r-1)$ and $(q-1)(r-1)$. However, requiring the straightforward-mode private exponent $d$ while generating an RSA signature in CRT mode is typically impractical for resource-constrained devices, since the key material is typically given in CRT format only (as discussed below). The parameter $d$ is most often not known, and it is often unacceptable to personalize $d$ for each constrained device. $d$ could be computed from $p$, $q$, $d_p$ and $d_q$, but as no key container is typically available to store it, $d$ would have to be recomputed for each RSA signature. This would lead to an unreasonable execution time overhead, since one would need to invert $(p-1)$ modulo $(q-1)$, as described in particular in Joye, M., Paillier, P.: GCD-free algorithms for computing modular inverses. In Kaliski Jr., B. S., Koç, Ç. K., Paar, C., eds.: CHES. (2003) 243-253. Moreover, the CRT recombination is not protected at all, since injecting a fault into $i_q$ during the recombination enables the gcd attack.
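The consistency test of Shamir's method can be sketched as follows, in its basic form with the full exponent $d$ (the variant reducing $d$ modulo $(p-1)(r-1)$ is not shown). The toy key, the fixed check modulus `r = 101`, the `FaultDetected` exception and the `fault_in_sp` flag are all hypothetical choices for illustration.

```python
# Toy key (insecure, illustration only).
p, q = 61, 53
e = 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # Shamir's method needs the full exponent d
iq = pow(q, -1, p)
r = 101                            # hypothetical small check modulus


class FaultDetected(Exception):
    pass


def sign_shamir(m: int, fault_in_sp: bool = False) -> int:
    sp_star = pow(m, d, p * r)     # S*p = m^d mod pr
    if fault_in_sp:
        sp_star = (sp_star + 1) % (p * r)   # simulated fault
    sq_star = pow(m, d, q * r)     # S*q = m^d mod qr
    # Without a fault, both values reduce to m^d mod r.
    if sp_star % r != sq_star % r:
        raise FaultDetected("S*p and S*q disagree modulo r")
    sp, sq = sp_star % p, sq_star % q
    # Recombination, left unprotected as in the original method.
    return sq + q * ((iq * (sp - sq)) % p)
```

The sketch makes the two drawbacks noted above visible: `d` must be available, and nothing checks the final recombination line.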
Other improvements of Shamir's method, which also protect the recombination, were proposed later. For example, in the above-mentioned reference, Aumüller et al. proposed in 2002 a careful implementation that also protects the CRT recombination. As opposed to Shamir's method, only $d_p$ and $d_q$ (and not $d$) are required. The algorithm is fully described in FIG. 2. Aumüller et al. use a small prime $t$, for which evaluating Euler's totient function is trivial ($\varphi(t)=t-1$); that is, the proposal uses the efficient variant of the method where the parameter $t$ is prime. The solution therefore performs well: compared to the naive CRT implementation of RSA, only two extra exponentiations modulo $t$ and a few modular reductions are required. However, it presents a major disadvantage linked to the way in which the random prime is selected. If it is fixed or picked at random from a fixed table, recovering this prime could expose new flaws. If it is different on each device, this would impact personalization management. If it is generated at random for each signature, this would lead to an unacceptable slowdown.
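FIG. 2 is not reproduced here, so the following is only a rough sketch in the spirit of this approach, not a transcription of Aumüller et al.'s algorithm: the half-size exponentiations are done modulo $pt$ and $qt$, the recombined $S$ is checked against both halves, and the two extra exponentiations modulo $t$ check that both halves agree on $m^{d_p d_q} \bmod t$. The toy key, the fixed prime `t = 101`, and the names `FaultDetected` and `fault_in_sp` are hypothetical.

```python
# Toy key (insecure, illustration only).
p, q = 61, 53
e = 17
N = p * q
dp = pow(e, -1, p - 1)
dq = pow(e, -1, q - 1)
iq = pow(q, -1, p)
t = 101                            # hypothetical small prime, phi(t) = t - 1


class FaultDetected(Exception):
    pass


def sign_aumuller_style(m: int, fault_in_sp: bool = False) -> int:
    spt = pow(m, dp, p * t)        # m^dp mod pt
    if fault_in_sp:
        spt = (spt + 1) % (p * t)  # simulated fault
    sqt = pow(m, dq, q * t)        # m^dq mod qt
    sp, sq = spt % p, sqt % q
    s = sq + q * ((iq * (sp - sq)) % p)
    # Recombination check: S must agree with both half-size results.
    if s % p != spt % p or s % q != sqt % q:
        raise FaultDetected("recombination check failed")
    # Two extra exponentiations mod t: both sides equal m^(dp*dq) mod t.
    # Exponents are reduced mod t - 1 (Fermat), valid because t is prime;
    # this toy key keeps dp, dq nonzero mod t - 1.
    lhs = pow(spt % t, dq % (t - 1), t)
    rhs = pow(sqt % t, dp % (t - 1), t)
    if lhs != rhs:
        raise FaultDetected("check modulo t failed")
    return s
```

With this toy key, `dq % 100 == 49` is coprime to 100, so any corruption of `spt % t` changes `lhs` and the simulated fault is always caught; with a real random prime $t$ such a fault would only be caught with high probability.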
Other solutions combining generalizations of Shamir's method with infective computation were proposed. The main idea is to infect the signature $S$ whenever a fault is induced, so that the gcd attack is no longer feasible on the faulty signature $S'$, i.e. $S'\not\equiv S \pmod p$ and $S'\not\equiv S \pmod q$. This concept was introduced in 2001 by Yen, Kim, Lim and Moon (Yen, S. M., Kim, S., Lim, S., Moon, S.: RSA speedup with residue number system immune against hardware fault cryptanalysis. In: ICISC '01: Proceedings of the 4th International Conference Seoul on Information Security and Cryptology, London, UK, Springer-Verlag (2002) 397-413). Later, in 2003, Blömer, Otto and Seifert (BOS) suggested a countermeasure based on infective computation (already mentioned above). Unfortunately, as with Shamir's original method, it requires the availability of $d$. Moreover, the parameters $t_1$ and $t_2$ required by the countermeasure have to satisfy quite strong properties; among other things, it is required that $\gcd(t_1,t_2)=\gcd(d,\varphi(t_1))=\gcd(d,\varphi(t_2))=1$, where $\varphi$ denotes Euler's totient function. $t_1$ and $t_2$ should normally be generated once with the RSA key and the same values used throughout the lifetime of the key, but they typically cannot be stored in a context with strong personalization constraints. They must therefore be generated at each signature, at a non-negligible cost. Compared to Aumüller et al.'s countermeasure, the BOS algorithm requires the generation of $t_1$ and $t_2$, two evaluations of the totient function $\varphi$ on $t_1$ and $t_2$, and two inversions. This is a real disadvantage in terms of simplicity and execution time.
Joye and Ciet also set out an elegant countermeasure based on infective computation (cf. reference above). Their generalization of Shamir's method is more efficient than BOS since, compared to Aumüller et al.'s countermeasure, one only needs to compute $\varphi(t_1)$ and $\varphi(t_2)$ for two random numbers $t_1$ and $t_2$. However, these evaluations are not negligible, as they require a full factorization of $t_1$ and $t_2$. As a consequence, Joye and Ciet's countermeasure is not satisfactory in terms of execution time.
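To see why evaluating $\varphi$ on a random number is costly while a prime $t$ is cheap, consider the following sketch, which computes Euler's totient from the factorization of $t$. Trial division is a hypothetical, deliberately simple choice of factoring method; its worst-case cost grows like $\sqrt{t}$, whereas for a known prime $t$ one simply has $\varphi(t)=t-1$.

```python
def euler_phi(t: int) -> int:
    """Euler's totient of t >= 1, via trial-division factorization of t."""
    result, n, f = t, t, 2
    while f * f <= n:
        if n % f == 0:
            while n % f == 0:      # strip every power of the prime factor f
                n //= f
            result -= result // f  # multiply result by (1 - 1/f)
        f += 1
    if n > 1:                      # one prime factor larger than sqrt(t) remains
        result -= result // n
    return result
```

For example, `euler_phi(36)` needs the factorization $36=2^2\cdot 3^2$ before returning 12, while `euler_phi(101)` has to try every candidate up to 10 only to conclude that 101 is prime and return 100.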
In 2007, Kim and Quisquater proposed a CRT implementation of RSA defeating fault attacks and all known side-channel attacks (see reference above), also based on a combination of Shamir's method and infective computation. However, their scheme requires, at each signature, either one inversion modulo $N$ or updating and storing three unusually formatted parameters of size $|N|$. Unfortunately, no key container for such parameters is typically available in the non-volatile memory of resource-constrained devices; these parameters must therefore typically be recomputed every time, which makes this solution hardly acceptable in terms of execution time.
In 2005, Giraud proposed an efficient way to protect RSA with CRT against fault attacks (see reference above). His countermeasure is based on the properties of the Montgomery ladder exponentiation algorithm, described in particular in Joye, M., Yen, S.: The Montgomery powering ladder. In Kaliski Jr., B. S., Koç, Ç. K., Paar, C., eds.: Cryptographic Hardware and Embedded Systems—CHES 2002. Volume 2523 of Lecture Notes in Computer Science. (2002) 291-302. Using this exponentiation algorithm, Giraud suggests computing successively $(m^{d_p}, m^{d_p-1}) \bmod p$ and $(m^{d_q}, m^{d_q-1}) \bmod q$. The Montgomery ladder infects both results whenever a fault is induced. The two recombined values $S$ and $S'=S'_q+q\cdot(i_q\cdot(S'_p-S'_q) \bmod p)$, with $S'_p=m^{d_p-1} \bmod p$ and $S'_q=m^{d_q-1} \bmod q$, are computed, and the final verification $S=m\cdot S' \bmod N$ is made. This solution is also SPA-safe. Unfortunately, its memory consumption is clearly prohibitive, since it requires storing $m$, $S_p$, $S_q$, $S'_p$ and $S'_q$ in RAM during the calculation of $S$. For large RSA key sizes, this countermeasure seems hardly feasible in portable devices with limited resources.
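The following sketch illustrates the idea on the hypothetical toy key used earlier. A Montgomery ladder run on exponent $k$ returns $(m^k, m^{k+1})$ and maintains the invariant $R_1=m\cdot R_0$; running it on $d_p-1$ and $d_q-1$ yields both $S'$ and $S$ halves, and the final test $S=m\cdot S' \bmod N$ follows the description above. The fault model (the `fault_in_sp` flag corrupting a single ladder output) and the names `montgomery_ladder`, `sign_giraud_style` and `FaultDetected` are assumptions for illustration, not Giraud's exact algorithm.

```python
# Toy key (insecure, illustration only).
p, q = 61, 53
e = 17
N = p * q
dp = pow(e, -1, p - 1)
dq = pow(e, -1, q - 1)
iq = pow(q, -1, p)


class FaultDetected(Exception):
    pass


def montgomery_ladder(m: int, k: int, n: int):
    """Return (m^k mod n, m^(k+1) mod n); invariant: r1 = m * r0 mod n."""
    r0, r1 = 1 % n, m % n
    for bit in bin(k)[2:]:         # scan exponent bits, most significant first
        if bit == "1":
            r0, r1 = (r0 * r1) % n, (r1 * r1) % n
        else:
            r0, r1 = (r0 * r0) % n, (r0 * r1) % n
    return r0, r1


def sign_giraud_style(m: int, fault_in_sp: bool = False) -> int:
    sp1, sp = montgomery_ladder(m, dp - 1, p)   # (m^(dp-1), m^dp) mod p
    sq1, sq = montgomery_ladder(m, dq - 1, q)   # (m^(dq-1), m^dq) mod q
    if fault_in_sp:
        sp = (sp + 1) % p          # simulated fault hitting one register only
    s = sq + q * ((iq * (sp - sq)) % p)         # S
    s1 = sq1 + q * ((iq * (sp1 - sq1)) % p)     # S'
    if (m * s1) % N != s:          # final verification S = m * S' mod N
        raise FaultDetected("S != m * S' mod N")
    return s
```

Keeping `sp`, `sq`, `sp1`, `sq1` and `m` alive at the same time is precisely the RAM cost criticized above.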
Over the past few years, several techniques for secure implementations have therefore been published, all of which are ill-suited to the constraints faced by certain embedded platforms.
Indeed, in constrained embedded architectures, one typically seeks to simultaneously optimize at least the following:
Execution time
The secure RSA-CRT signature computation has to be performed in reasonable time. Without giving concrete bounds, the time overhead added by the countermeasure should remain as small as possible compared to the whole RSA signature calculation. This is particularly important for micro-controllers running at a clock frequency of only a few megahertz.
Memory consumption
Countermeasures require extra RAM buffers to store security parameters. 2K RSA is generally supported as a standard functionality, and total memory consumption should preferably remain between 1 KB and 2 KB (kilobytes) for current devices, especially the less powerful ones (e.g. low-end smart cards).
Personalization management
For constrained devices such as smart cards, which are deployed by the millions and where each card is different, personalization is the task of loading the relevant (and typically different) information into each card (card holder name, bank account numbers, specific data, etc.). In many fields, some personalization aspects are standardized, either by official bodies or de facto. For example, the file system of a SIM card used in mobile telephony is highly standardized in order to guarantee an acceptable level of interoperability (almost any SIM card should work in almost any cell phone). Such standardization often concerns, among other things, the way in which cryptographic material is stored in the constrained device. Availability of input key parameters is therefore a very strict constraint. Quite often, for RSA operations, only the input message $m$ and the CRT decomposition comprising the key elements $p$, $q$, $d_p$, $d_q$, $i_q$ are known while performing an RSA signature, and no extra variable parameter can be stored in non-volatile memory if one wishes to remain compliant with standards. This constraint also stems from mass-production requirements: personalizing unusually formatted keys in the device is costly, so no customizable key container is typically available in non-volatile memory (e.g. EEPROM or Flash) to store anything other than the classical RSA-CRT key sets, an example of which is described in “Sun Microsystems Inc.: Java Card 2.2.2—application programming interface. Technical report (2006)”. Other types of key sets may be available in non-Java environments (e.g. proprietary OS, .NET OS, etc.), but they typically have the same kind of constraints.
Code size
On micro-controllers that have little storage space for executable code (typically ROM or Flash), code size is a major concern. The extra code size added by a countermeasure should remain as small as possible compared to the whole code size of the cryptographic operation (typically a signature) that it protects.
This shows that devising a CRT implementation of RSA that thwarts the Bellcore attack and meets the strong requirements of embedded systems remains a hard problem, which specialists have been trying to solve for more than ten years without success.
It is therefore an object of the invention to find a countermeasure that allows modular exponentiations (and in particular RSA signatures) to be computed securely, and that is adapted to resource-constrained devices. Such a countermeasure is of course also well suited to more powerful devices, since even when plenty of resources are available, one typically does not want to waste them.