1. Introduction
The simplest attack on a given cryptosystem is to exhaustively search for the key. There are many variants of this attack (known ciphertext, known cleartext, chosen cleartext, etc.), but they are all based on a procedure which tries the keys one by one until the correct key is encountered. If the key consists of $n$ random bits, the expected running time of this procedure is about $2^{n-1}$ trials. This attack can be easily foiled by using a large enough $n$ (e.g., $n > 100$).
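As a quick sanity check of the $2^{n-1}$ figure, the following minimal Python sketch runs a hypothetical `exhaustive_search` helper against every possible key of a toy key size and averages the number of trials. Trying keys in the order 0, 1, 2, ... gives an exact average of $(2^n+1)/2$, which is approximately $2^{n-1}$. The key-checking predicate stands in for "decrypt and test", which is an assumption of the sketch.

```python
def exhaustive_search(is_correct, n_bits):
    """Try keys 0, 1, 2, ... until is_correct(key) holds; return (key, trials)."""
    for trials, key in enumerate(range(2 ** n_bits), start=1):
        if is_correct(key):
            return key, trials
    raise ValueError("key not found")


def average_trials(n_bits):
    """Average number of trials over every possible secret n-bit key."""
    total = 0
    for secret in range(2 ** n_bits):
        # The predicate plays the role of "decrypt and check the result".
        _, trials = exhaustive_search(lambda k: k == secret, n_bits)
        total += trials
    return total / 2 ** n_bits


print(average_trials(8))  # 128.5, close to 2**7 = 128
```

Each extra key bit doubles this average, which is why a key of more than 100 bits puts the search out of reach.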
To attack cryptosystems with large keys, cryptanalysts try to find mathematical or statistical weaknesses which reduce the size of the search space (preferably to 1). Although many techniques and results are classified for national security reasons, it is safe to assume that it is increasingly difficult to find such weaknesses in modern schemes designed by experienced cryptographers and implemented on high speed microprocessors.
To successfully attack strong cryptosystems, the cryptanalyst must use indirect techniques. This is best done when the cryptanalyst is either in close physical proximity to the cryptographic device, or has it under his complete control. The cryptographic device is assumed to be a black box which contains a known algorithm and an unknown key. The cryptanalyst cannot open this box and read its key, but he can observe its behavior under various circumstances.
One of the best known examples of such an indirect attack is TEMPEST, which tries to deduce the key by analyzing electromagnetic radiation emanating from the black box during the computation of the ciphertext. Techniques for applying and preventing such attacks have been extensively studied for more than 50 years, and by now this is a well understood problem.
Two powerful indirect attacks were discovered and published recently. In December 1995, P. Kocher described a timing attack ("Cryptanalysis of Diffie-Hellman, RSA, DSS, and Other Systems Using Timing Attacks," technical report, 12/7/95), and in September 1996, D. Boneh, R. A. DeMillo and R. J. Lipton described a fault attack ("On the Importance of Checking Computations," technical report, 9/25/96; an extended version appears in the Proceedings of Eurocrypt 97, May 1997). Both attacks were originally designed for, and are most successful against, public key schemes based on number theoretic principles, such as RSA, but they were later extended to classical cryptosystems as well (e.g., by E. Biham and A. Shamir, "A New Cryptanalytic Attack on DES," technical report, 10/18/96; an extended version appears in the Proceedings of Crypto 97, August 1997).
Such attacks are particularly useful when the scheme is implemented on a smart card, which is distributed by a bank, computer network, cellular phone operator, or pay-TV broadcaster to its customers. Hackers do not usually have the financial and technical resources required to read the contents of the key registers inside the smart card, but they have complete control over the input/output, clock, reset, and power connections of the smart card. They can carefully measure the duration of the various operations, how much power they consume, what happens when the computation is interrupted or carried out under abnormal operating conditions, etc. Since the tests are carried out in the privacy of the customer's home, the card manufacturer cannot prevent them or even learn about their existence.
2. Timing Attacks
Timing attacks are based on the assumption that some of the basic operations carried out during the cryptographic calculation require a non-constant amount of time which depends on the actual values being operated upon. This implies that some information about these unknown intermediate values leaks out by measuring the length of the cryptographic computation. If these intermediate values are computed from known cleartext bits and unknown key bits by a known cryptographic algorithm, the attacker can try to use the leaked intermediate values to deduce the key.
The main difficulty in carrying out this attack is that the attacker knows only the total amount of time required to carry out the cryptographic computation, but not the timing of individual computational steps. Kocher's main contribution is in developing an efficient technique for handling this difficulty in many cases of practical interest.
For the sake of concreteness, we describe Kocher's attack on the RSA cryptosystem. The black box is assumed to contain a publicly known modulus $n$ and a secret exponent $d$. Given an input number $x$, the box performs the modular exponentiation $x^d \bmod n$ by using the standard square-and-multiply technique. The result (which can be the decryption of the ciphertext $x$, the signature of the message $x$, or the response to a random identification challenge $x$, depending on the application) is sent out as soon as it is produced, and thus the attacker can measure the total number of clock cycles taken by all the modular multiplications.
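For reference, here is a minimal Python sketch of the left-to-right square-and-multiply method, assuming the textbook variant that scans the exponent from its most significant bit. The function name `square_and_multiply` and the returned operation trace are illustrative additions: the trace records, per exponent bit, whether the box performed a square only ("S") or a square followed by a multiplication ("SM"), which is exactly the bit-dependent behavior a timing attacker exploits.

```python
def square_and_multiply(x, d, n):
    """Left-to-right binary exponentiation.

    Returns (x^d mod n, ops), where ops lists "S" or "SM" for each bit of d.
    """
    result = 1
    ops = []
    for bit in bin(d)[2:]:              # most significant bit first
        result = (result * result) % n  # always square
        if bit == "1":
            result = (result * x) % n   # multiply only when the bit is 1
            ops.append("SM")
        else:
            ops.append("S")
    return result, ops


# Toy RSA values (textbook example, far too small for real use): n = 61 * 53.
print(square_and_multiply(65, 17, 3233))  # (2790, ['SM', 'S', 'S', 'S', 'SM'])
```

Since each "SM" step does one more modular multiplication than an "S" step, the total running time depends on the bits of $d$ as well as on the data being multiplied.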
Standard implementations of modular multiplication require a non-constant amount of time, since they skip multiplication steps involving leading zeroes, and reduction steps when the result is smaller than the modulus. The attacker chooses a large number of random inputs $x$, and measures the actual timing distribution $T_0$ of the modular exponentiation operation carried out by the black box. He then computes for each $x$ (by computer simulation, using his knowledge of how the scheme is implemented) the precise timing of an initial square-only operation, and separately, the precise timing of an initial square-and-multiply operation. The result is a pair of timing distributions $T_1$ and $T_2$, which are not identical. All the cryptographic computations carried out in the black box use the same exponent $d$, and its first bit determines which one of the two simulated distributions $T_1$ and $T_2$ is the initial part of the experimentally measured $T_0$. The timing of the remaining steps of the computations can be assumed to be a random variable $R$, which is normally distributed and uncorrelated with either $T_1$ or $T_2$. Since $T_0$ is either $T_1 + R$ or $T_2 + R$, the attacker can decide which case is more likely by finding which one of the two distributions $T_0 - T_1$ and $T_0 - T_2$ has a lower variance: subtracting the correct hypothesis leaves only the variance of $R$, while subtracting the wrong one adds the variance of the mismatched initial step.
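The variance test can be sketched in a few lines of Python. This is a minimal illustration on synthetic, deliberately exaggerated timings, not measurements from any device; the helper name `choose_bit` and the convention that $T_1$ (square only) means bit 0 and $T_2$ (square and multiply) means bit 1 are assumptions of the sketch.

```python
import random
import statistics


def choose_bit(t0, t1, t2):
    """Variance test: return 0 if T0 - T1 has the lower variance
    (square-only hypothesis), otherwise 1 (square-and-multiply hypothesis)."""
    var_1 = statistics.pvariance([a - b for a, b in zip(t0, t1)])
    var_2 = statistics.pvariance([a - b for a, b in zip(t0, t2)])
    return 0 if var_1 < var_2 else 1


# Synthetic, hypothetical timings (one entry per random input x).
rng = random.Random(1)
samples = 2000
t1 = [rng.uniform(0, 100) for _ in range(samples)]  # simulated square-only step
t2 = [rng.uniform(0, 100) for _ in range(samples)]  # simulated square-and-multiply step
r = [rng.gauss(5000, 20) for _ in range(samples)]   # remaining steps R, normally distributed

t0_if_bit_1 = [a + b for a, b in zip(t2, r)]  # box really did square-and-multiply
t0_if_bit_0 = [a + b for a, b in zip(t1, r)]  # box really did square only

print(choose_bit(t0_if_bit_1, t1, t2))  # 1
print(choose_bit(t0_if_bit_0, t1, t2))  # 0
```

In a real attack the gap between the two variances is much smaller than in this toy data, which is why many timing samples are needed for each bit.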
After finding the first bit of the secret exponent $d$, the attacker knows the actual inputs to the second computational step, and thus he can apply the same technique (with properly modified experimental and simulated timing distributions $T'_0$, $T'_1$, and $T'_2$) to find the second bit of $d$. By repeating this procedure once for each bit of $d$ (about 1000 times for a typical exponent), he can compute all the bits of $d$, and thus break the RSA scheme.
A similar timing attack can be applied to any cryptographic scheme in which the black box raises all its inputs $x_1, x_2, \ldots$ to the same secret power $d$ modulo the same known $n$ (which can be either a prime or a composite number). For example, in one of the variants of the Diffie-Hellman key distribution scheme, all the users agree on a prime modulus $n$ and on a generator $g$ of the multiplicative group $\mathbb{Z}_n^*$.
Each user chooses a random secret exponent $d$, and computes $y = g^d \bmod n$ as his public key. To establish a common secret key with another user, the first user sends out his public key $y = g^d \bmod n$, and receives a similarly computed public key $x = g^e \bmod n$ from the other user. Their common cryptographic key is $z = g^{de} \bmod n$, which the first user computes by evaluating $x^d \bmod n$. When the first user communicates with several parties, he raises several known values $x_1, x_2, \ldots$ to the same secret power $d$ modulo the same known modulus $n$. By measuring the timing of sufficiently many such computations, the attacker can determine $d$ and thus find all the cryptographic keys $z_1, z_2, \ldots$ employed by that user.
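The key agreement described above can be sketched in Python as follows. The parameters are toy values chosen for illustration and are not from the text: $n = 23$ is prime and $g = 5$ generates $\mathbb{Z}_{23}^*$. The helper `make_keypair` is hypothetical. The last lines show the situation the attack targets: one user raising several known peer values to the same secret $d$.

```python
import secrets

# Toy parameters (hypothetical, far too small for real use):
# 23 is prime and 5 generates the multiplicative group Z_23^*.
n = 23
g = 5


def make_keypair():
    """Pick a random secret exponent and return (secret, g^secret mod n)."""
    secret = secrets.randbelow(n - 2) + 1  # secret exponent in [1, n-2]
    return secret, pow(g, secret, n)


d, y = make_keypair()  # first user: secret d, public key y = g^d mod n
e, x = make_keypair()  # second user: secret e, public key x = g^e mod n

z_first = pow(x, d, n)   # first user evaluates x^d mod n
z_second = pow(y, e, n)  # second user evaluates y^e mod n
assert z_first == z_second == pow(g, d * e, n)

# With several peers, the first user raises x_1, x_2, ... to the SAME d,
# which is the repeated computation a timing attacker observes.
peer_publics = [make_keypair()[1] for _ in range(3)]
shared_keys = [pow(x_i, d, n) for x_i in peer_publics]
```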
The timing attack has to be modified if the computation of $x^d \bmod n$ for a composite modulus $n = pq$ is carried out by computing $x^d \bmod p$ and $x^d \bmod q$ and combining the results by the Chinese Remainder Theorem (CRT). This is a common way of making the computation about 4 times faster when the factorization of $n$ is known. The problem for the attacker is that he does not know the secret factors $p$ and $q$ of the public modulus $n$, and thus cannot simulate the timing distributions $T_1$ and $T_2$. Kocher's solution is to concentrate on the first step of the CRT computation, in which the input $x$ is reduced modulo $p$. If $x$ is smaller than $p$, no modular reduction is required, and thus the computation is considerably faster than when $x$ is larger than or equal to $p$. The attacker thus presents to the black box a large number of inputs $x$ which are very close to each other, and uses the average time of such computations to decide whether these $x$'s are above or below $p$. A decision procedure for this question can be repeatedly used to find the precise value of $p$ by binary search.
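Both halves of this paragraph can be sketched in Python under stated assumptions. `crt_pow` computes $x^d \bmod n$ via the CRT. It reduces the exponents modulo $p-1$ and $q-1$, a standard optimization that assumes $d$ is an RSA exponent, and it recombines with Garner's formula. `find_p` is the binary search. Its `is_slow` oracle is hypothetical and idealized: it stands for the attacker's averaged-timing decision "is $x \ge p$?", which in practice is a noisy statistical test. The primes 61 and 53 are toy textbook values. `pow(q, -1, p)` needs Python 3.8 or later.

```python
def crt_pow(x, d, p, q):
    """Compute x^d mod (p*q) with the Chinese Remainder Theorem."""
    x1, x2 = x % p, x % q            # the input reduction that leaks timing
    y1 = pow(x1, d % (p - 1), p)     # exponent reduced by Fermat's little theorem
    y2 = pow(x2, d % (q - 1), q)
    # Garner recombination: y = y2 + q*h is congruent to y1 mod p, y2 mod q.
    h = ((y1 - y2) * pow(q, -1, p)) % p
    return y2 + q * h


def find_p(n, is_slow):
    """Binary search for p, given an idealized oracle is_slow(x) == (x >= p)."""
    lo, hi = 1, n - 1  # invariant: is_slow(lo) is False, is_slow(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(mid):
            hi = mid
        else:
            lo = mid
    return hi  # smallest input that needs a reduction, i.e. p


p, q, d = 61, 53, 2753  # toy RSA parameters
print(crt_pow(2790, d, p, q))               # 65
print(find_p(p * q, lambda v: v >= p))      # 61
```

Each oracle call halves the search interval, so roughly $\log_2 n$ timing decisions suffice in this idealized setting.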
Shortly after the discovery of this attack, researchers tried to develop implementations which are immune to it. The simplest idea is to make sure that all the cryptographic operations take exactly the same amount of time, regardless of the values of the cleartexts and keys. However, achieving this is surprisingly difficult for the following reasons:
(a) In many cases, the implementor wants to run the same algorithm in software on different (and perhaps unknown) machines. An implementation which is constant time on one microprocessor may be variable time on another microprocessor, or even on an enhanced version of the same microprocessor.
(b) On a multitasking machine, the running time may depend on the amount of free memory, the cache hit rate, the number of external interrupts, etc. This can change a constant time implementation under one set of circumstances into a variable time implementation under another set of circumstances.
(c) If the implementor tries to use a real time clock to force all the computations to take the same amount of time, he must slow all of them down to their worst cases. Since he cannot use any input-dependent optimization technique, the implementation is likely to be unacceptably slow.
The best protective technique proposed so far against Kocher's timing attack on modular exponentiation is to replace each input $x$ by a modified version $y = xr \bmod n$, where $r$ is a secret random number between 1 and $n-1$. To compute $x^d \bmod n$, the program computes $y^d \bmod n$ and $r^d \bmod n$, and then uses the multiplicative property of modular exponentiation ($y^d \equiv x^d r^d \pmod n$) to compute $x^d \bmod n$ as $y^d / r^d \bmod n$, i.e., $y^d \cdot (r^d)^{-1} \bmod n$. Since both $y$ and $r$ are unknown to the attacker, he cannot simulate these computations in order to find the successive bits of $d$ in the non-CRT computation, and cannot perform binary search in the CRT version of the computation. Unfortunately, this randomization technique doubles the expected running time of the computation.
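A minimal Python sketch of this randomization, following the description above: blind with $y = xr$, exponentiate both $y$ and $r$, then divide. The helper name `blinded_pow` is hypothetical, and the `gcd` retry loop is an added assumption that guarantees $r^d$ is invertible modulo $n$ (for a real RSA modulus a non-invertible $r$ is vanishingly unlikely). `pow(a, -1, n)` needs Python 3.8 or later.

```python
import math
import secrets


def blinded_pow(x, d, n):
    """Compute x^d mod n on a randomized input, as in the blinding scheme above."""
    while True:
        r = secrets.randbelow(n - 1) + 1  # secret random r in [1, n-1]
        if math.gcd(r, n) == 1:           # assumption: keep r invertible mod n
            break
    y = (x * r) % n                       # blinded input y = x*r mod n
    y_d = pow(y, d, n)                    # attacker cannot predict y, so cannot simulate this
    r_d = pow(r, d, n)                    # second exponentiation: the doubled running time
    return (y_d * pow(r_d, -1, n)) % n    # x^d = y^d / r^d mod n


# Toy RSA values (textbook example): n = 61 * 53, d = 2753.
print(blinded_pow(2790, 2753, 3233))  # 65
```

The two calls to `pow` with exponent `d` make the doubled running time mentioned above directly visible.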
3. Fault Attacks
Fault attacks try to introduce errors into the cryptographic computation, and to identify the key by analyzing the mathematical and statistical properties of the erroneously computed results. Among the many techniques suggested so far for introducing such errors are the use of ionizing radiation, unusual operating temperatures, power and clock glitches, and laser-based chip microsurgery. Some of the attacks are differential (i.e., they carry out both a correct and an erroneous computation with the same input and analyze their differences), while other attacks just use the erroneous results.
The original fault attack on public key cryptosystems was described by Boneh, DeMillo and Lipton, and required several cryptographic computations. We now describe an improved version of this attack, due to Arjen Lenstra, which requires a single faulty computation. We assume that the black box uses the RSA scheme to sign a given message $x$. The computation of $x^d \bmod n$ is carried out with the CRT method by first reducing $x$ modulo $p$ and $q$ to get $x_1$ and $x_2$, then computing $y_1 = x_1^d \bmod p$ and $y_2 = x_2^d \bmod q$, and finally combining $y_1$ and $y_2$ to get the signature $y \bmod n$ with the CRT method. We assume that a single error is introduced at a random time during this computation by applying mild physical stress to the black box. Without loss of generality, we can assume that the error was introduced during the computation of $x_1^d \bmod p$, and thus instead of getting the correct $y_1$, the box computed an erroneous $y'_1$. When $y'_1$ and $y_2$ are combined by the CRT method, the box computes an incorrect signature $y'$, which is provided to the attacker.
The main observation is that the attacker knows the signature verification exponent $e$, for which $y^e \equiv x \pmod n$. Due to the error, $y'^e - x$ is non-zero mod $p$, but zero mod $q$ (since $y'$ still agrees with the correct signature modulo $q$), and thus it is a multiple of $q$ which is not a multiple of $n$. The attacker can thus factor $n$ by computing $q = \gcd(y'^e - x \bmod n,\ n)$, which is an easy computation.
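The single-fault attack can be simulated end to end in a short Python sketch. The toy RSA parameters ($p = 61$, $q = 53$, $e = 17$, $d = 2753$) are textbook values, not from the text. The fault model (adding 1 to $y_1$) is a hypothetical stand-in for a physically induced error in the mod-$p$ half. The helpers `crt_sign` and `lenstra_factor` are illustrative names. Because $e$ is coprime to $p-1$, any change to $y_1$ changes $y'^e \bmod p$, so the gcd always reveals $q$ in this setting.

```python
import math

P, Q = 61, 53          # toy secret primes
N = P * Q              # public modulus
E, D = 17, 2753        # public verification and secret signing exponents


def crt_sign(x, fault=False):
    """CRT signature x^D mod N, optionally with a simulated error in the mod-P half."""
    y1 = pow(x % P, D % (P - 1), P)
    if fault:
        y1 = (y1 + 1) % P                 # hypothetical single fault: y1 becomes y'1
    y2 = pow(x % Q, D % (Q - 1), Q)
    h = ((y1 - y2) * pow(Q, -1, P)) % P   # Garner CRT recombination
    return y2 + Q * h


def lenstra_factor(x, y_faulty):
    """y'^E - x is 0 mod Q but not mod P, so its gcd with N is Q."""
    return math.gcd((pow(y_faulty, E, N) - x) % N, N)


x = 1234
print(pow(crt_sign(x), E, N) == x)                 # True: correct signature verifies
print(lenstra_factor(x, crt_sign(x, fault=True)))  # 53
```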
To protect cryptographic schemes against fault attacks, Boneh, DeMillo and Lipton recommend that each computation should be carried out twice (preferably by different algorithms). If any discrepancy is found between the two results, the box should not output anything. This provides strong protection from random faults (which are unlikely to affect the two computations in an identical way), but it slows down the computation by a factor of 2. Such a slowdown is particularly noticeable in smart card implementations of public key schemes, which are quite slow to begin with.
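The recommended double computation might look like the following minimal Python sketch. It assumes toy RSA parameters and uses two different algorithms: a CRT computation (with an optional simulated fault) and a plain `pow` call. The names `crt_sign` and `sign_checked`, and the use of `None` to mean "output nothing", are illustrative choices rather than part of any specified interface.

```python
P, Q = 61, 53          # toy secret primes
N = P * Q
D = 2753               # toy secret signing exponent


def crt_sign(x, fault=False):
    """CRT signature, optionally with a simulated error in the mod-P half."""
    y1 = pow(x % P, D % (P - 1), P)
    if fault:
        y1 = (y1 + 1) % P                 # hypothetical random fault
    y2 = pow(x % Q, D % (Q - 1), Q)
    h = ((y1 - y2) * pow(Q, -1, P)) % P
    return y2 + Q * h


def sign_checked(x, fault=False):
    """Compute the signature twice by different algorithms; release it only if they agree."""
    y_first = crt_sign(x, fault)   # first computation (CRT, possibly faulted)
    y_second = pow(x, D, N)        # second, independent computation (no CRT)
    if y_first != y_second:
        return None                # discrepancy: the box outputs nothing
    return y_first


print(sign_checked(1234))               # same value as pow(1234, 2753, 3233)
print(sign_checked(1234, fault=True))   # None
```

The faulty signature never leaves the box, so the gcd computation of the previous attack has nothing to work on. The price is the second full exponentiation.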