Cryptographic operations are used for a variety of processes such as data encryption and authentication. In a typical symmetric cryptographic process, a secret key is known to two or more participants, who use it to secure their communications. In systems using asymmetric or public-key cryptography, one party typically performs operations using a secret key, e.g., the so-called private key, while the other performs complementary operations using only non-secret parameters, e.g., the so-called public key. In both symmetric and asymmetric cryptosystems, the secret parameters must be kept confidential, since an attacker who compromises a key can decrypt communications, forge signatures, perform unauthorized transactions, impersonate users, or cause other problems.
Methods for securely managing cryptographic keys using physically secure, shielded rooms are known and are widely used. However, the known methods for protecting keys in cryptographic devices are often inadequate for many applications, such as those requiring a high degree of tamper resistance.
Attacks such as reverse-engineering of a ROM using microscopes, timing attack cryptanalysis, as described for example by P. Kocher in “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” Advances in Cryptology—CRYPTO '96, Springer-Verlag, pages 104-113, and error analysis as described for example by E. Biham and A. Shamir in “Differential Fault Analysis of Secret Key Cryptosystems,” Advances in Cryptology—CRYPTO '97, Springer-Verlag, 1997, pages 513-525, are known for analyzing cryptosystems.
Ciphers and algorithms believed to be cryptographically secure are known. For example, protocols using triple DES, i.e., a cipher constructed using three applications of the Data Encryption Standard using different keys, can resist cryptanalytic attacks, provided that attackers only have access to the standard inputs to and outputs from the protocol. However, even a product using an extremely strong cipher such as triple DES can be insecure if the keys are not managed securely. Smartcards commonly encode their internal data using a cryptographic technique such as the Data Encryption Standard (DES). A detailed description of DES is presented by Bruce Schneier in Applied Cryptography, 2nd edition, ISBN 0-471-11709-9, 1996, John Wiley & Sons, at pp. 265. The Federal Information Processing Standard (FIPS) description of DES is contained in FIPS publication 46-3, available on the Internet at http://csrc.nist.gov/fips/.
DES is a block cipher method using a 64-bit key (of which only 56 bits are actually used), which is very fast and has been widely adopted. Though DES can be cracked by a brute-force attack, i.e., simply testing all possible keys, triple DES is still considered very secure. For the purposes of the examples described hereinafter, it is sufficient to know that the DES algorithm performs 16 rounds, each of which effects lookups in eight separate translation tables called S-boxes. Other similar cryptographic techniques are also known in the art, including: triple DES, IDEA, SEAL, and RC4; public key (asymmetric) encryption and decryption using RSA and ElGamal; digital signatures using DSA, ElGamal, and RSA; and Diffie-Hellman key agreement protocols. Despite the theoretical strength and complexity of these cryptographic systems, power analysis techniques have been developed which allow their keys to be recovered much more quickly than by brute force.
Information on DES and other cryptographic algorithms can also be found in the Handbook of Applied Cryptography by Menezes et al. (CRC Press, Inc., 1997). The Data Encryption Standard (DES) is widely used as a cryptographic primitive for data encryption, pseudo-random number generation, MACs, and other cryptographic operations. The basic DES encryption algorithm uses a 56-bit key to transform a 64-bit plaintext block into a 64-bit ciphertext block. The corresponding decryption operation uses the same key to transform ciphertext blocks into their corresponding plaintexts.
To obtain a secret key from a cryptographic system, also referred to as cryptosystem, an attacker can exploit the fact that such a system leaks information. The attacker can try to gather data by observing a series of operations, perform statistical analysis on the observations, and use the results to determine the key. In a common situation, an attacker monitors a physical property, such as power consumption, of a secure token as it performs a cryptographic operation. The attacker collects a small amount of data related to the key each time the token is observed performing a cryptographic operation involving the key. The attacker increases the amount of information known about the key by collecting and statistically correlating or combining data from multiple observations of the token as it performs operations involving the key. In the case of a cryptosystem which is leaking information, such observations may contain signal information, i.e., information correlated usefully to the key. However, such observations also contain noise, i.e., information and error that hinder or are irrelevant to determination of the key. The quality of the information gained from these observations is characterized by a “signal to noise” or S/N ratio, which is a measure of the magnitude of the signal compared to the amount of noise. The number of operations that the attacker must analyze to recover the key depends on the measurement and analysis techniques, but is generally inversely proportional to the square of the S/N ratio. The constant of proportionality also depends upon the amount of confidence the attacker requires. For example, a relatively low confidence level may be acceptable to an attacker willing to do an optimized brute force search using statistical information about key bit values. Decreasing the signal by a factor of 15 and increasing the amount of measurement noise by a factor of 20 will reduce the signal-to-noise ratio by a factor of 300. 
This will generally mean that an attacker will require roughly 90,000 times as many observations to extract the same amount of information about the key. An attack requiring 1,000 observations to recover a key before the S/N reduction would now require on the order of 90 million observations to gain the same level of confidence in the recovered key.
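The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming only the inverse-square relationship stated in the text; the function name `observations_needed` and its parameter names are illustrative and are not taken from any cited source.

```python
def observations_needed(baseline_observations, signal_reduction, noise_increase):
    """Estimate the observations required after a countermeasure is applied.

    Assumes, as stated in the text, that the required number of observations
    is inversely proportional to the square of the signal-to-noise ratio.
    """
    # Dividing the signal by one factor and multiplying the noise by another
    # divides the S/N ratio by the product of the two factors.
    snr_reduction = signal_reduction * noise_increase
    return baseline_observations * snr_reduction ** 2


# Signal reduced 15-fold, noise increased 20-fold: S/N falls by a factor of 300.
print(observations_needed(1, 15, 20))      # multiplier on the attacker's workload
print(observations_needed(1_000, 15, 20))  # attack that originally needed 1,000 observations
```

With the figures from the text, the multiplier is 300 squared, i.e. 90,000, so a 1,000-observation attack grows to roughly 90 million observations.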
Examples of differential power analysis (DPA) being used to extract a DES key are presented by Paul Kocher, Joshua Jaffe, and Benjamin Jun, 1998, “Introduction to differential power analysis and related attacks”, available at http://www.cryptography.com/dpa/technical; or by Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan, 1999, in “Investigations of power analysis attacks on smart cards”, Usenix '99; see http://www.eecs.edu/-tmesserg/usenix99/html/paper.html; and also by Louis Goubin and Jacques Patarin, 1999, in “DES and differential power analysis: the “duplication” method”, Proceedings of CHES '99, Springer Lecture Notes in Computer Science, vol. 1717 (August 1999); http://www.cryptosoft.com/htmi/secpub.htm#goubin.
A principal objective is to make a cryptosystem that is difficult to attack successfully, for example by increasing the number of observations required by an attacker to compromise a key. By reducing the available signal size and/or increasing the amount of error, noise, and uncertainty in attackers' measurements, a system designer can make the so-called work function, i.e., the effort required to break a system, larger. Ideally, the number of samples required to gain any significant amount of useful key information should exceed the maximum number of transactions that can be performed using the key, exceed the number of transactions that can be performed by the device, e.g., before the key expires, or else be so large that monitoring attacks are of comparable or greater difficulty than brute force and other known attacks. For example, if attackers are limited to measurements with a signal-to-noise ratio across an entire transaction well below 1/1000 in a system programmed to self-destruct after one million operations, which is well beyond the expected operational life of most smartcards, the attacker would be unable to collect enough measurements to compromise the device. For physically large systems, effective physical shielding, physical isolation, and careful filtering of inputs and outputs can protect cryptographic devices from external monitoring attacks that involve analyzing power consumption, electromagnetic radiation, electrical activity within the device, etc., as well as protecting against physical attacks. However, these techniques are difficult to apply in constrained engineering environments. For example, physical constraints such as size and weight, cost requirements, and the need to conserve power can prevent the use of the known shielding techniques.
Keeping electronic information hidden from hostile parties is desirable in many environments, whether personal, business, government, or military. “Sealed platforms”, which are special kinds of electronic hardware devices, have been developed to satisfy this need. The term “platform” generally refers to a hardware/software environment capable of supporting computation including the execution of software programs. A “sealed” platform refers to a platform purposely built to frustrate reverse-engineering.
In contrast to traditional credit and debit cards, which store a small amount of information on a magnetic strip, sealed platforms such as smartcards may store and process a significantly larger quantity of data using microprocessors, random access memory (RAM), and read only memory (ROM). The sealed platforms are typically secured using cryptographic technology which is intended to maintain and manipulate secret parameters in open environments without revealing their values. Compromise of a secret key used to compute a digital signature could, for example, allow an attacker to forge the owner's digital signature and execute fraudulent transactions.
A sealed platform is intended to perform its function while protecting information and algorithms, such as performing digital signatures as part of a challenge-response protocol, authenticating commands or requests, and encrypting or decrypting arbitrary data. A smartcard used in a stored value system may, for example, digitally sign or compute parameters such as the smart card's serial number, balance, expiration date, transaction counter, currency, and transaction amount as part of a value transfer.
Power analysis is the process of gathering information about the data and algorithms embodied on a platform by means of the “power signature” of the platform. The “power signature” of a platform is its power consumption profile measured over time, while executing the software stored on that platform. The power consumed by a microprocessor, micro-controller or similar electronic device changes with the state of the electronic components in the device. Such devices generally represent data in terms of binary 1s and 0s, which are represented in the electronic devices as corresponding high or low voltage levels. For example, a value of 1 may be represented by +5 volts and a value of 0 by 0 volts.
Hence, the amount of power that a sealed platform consumes may be correlated with the number of binary 1s in a data word, at a given moment in time. It follows that the amount of current drawn by, and the electromagnetic radiation emanated from a sealed platform, may be correlated to the secrets being manipulated within it. Such signals can be measured and analyzed by attackers to recover secret keys. State transitions are also a major influence on the power consumption of a device performing a computation. As the value of a bit changes, transistor switches associated with that bit change state. Therefore, there is an increase in the amount of power consumed when the system is in transition. Attackers can non-invasively extract secret keys using external measurement and analysis of a device's power consumption, electromagnetic radiation, or processor cycle timing during performance of cryptographic operations. The current and voltage being supplied to the smartcard may be monitored while it is executing.
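The two leakage sources just described, the number of 1 bits held in a data word and the number of bits that change state, are commonly modelled as Hamming weight and Hamming distance. The following is a minimal sketch in Python of those two quantities; the function names are illustrative, and real devices leak in more complicated ways than this simple model suggests.

```python
def hamming_weight(word):
    """Number of 1 bits in a non-negative integer data word."""
    return bin(word).count("1")


def hamming_distance(old_word, new_word):
    """Number of bit positions that change when old_word is replaced by new_word."""
    return hamming_weight(old_word ^ new_word)


# Two bytes with the same number of 1 bits look alike under the weight model...
print(hamming_weight(0x0F), hamming_weight(0xF0))
# ...but overwriting one with the other flips every bit, so the transition
# model predicts a large power contribution.
print(hamming_distance(0x0F, 0xF0))
```

Under these assumptions, an attacker who can estimate either quantity from a power measurement learns something about the data being processed, even without seeing the data itself.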
In simple power analysis (SPA), the power signature for the execution of a given algorithm is used to determine information about the algorithm and its data. Generally, power data is gathered from many executions and averaged at each point in time in the profile.
For example, if SPA is used to attack a DES key space, and the attacker has access to the specific code, but not the particular DES key, a particular series of points in the power signature may indicate the number of 1s and 0s in each 8-bit byte of the DES key. This reduces the space of possible keys for an exhaustive all-possible-keys attack from 2^56 possible keys to 2^38 possible keys, if parity bits are stored for each byte of the key, making search time among possible keys about 2^18 times shorter.
Differential power analysis (DPA) is a form of power analysis in which information is extracted by gathering multiple power signatures and analyzing the differences between them. For certain kinds of data and algorithms exhibiting repetitious behavior, it is an extraordinarily effective method for penetrating secrets stored on sealed platforms. It can reveal information about the data resulting from computations, fetches from memory, and stores to memory; the data addresses in the memory of the sealed platform from which data are fetched or to which data are stored during execution; and the code addresses from which instructions are fetched during the execution of algorithms on the sealed platform. These capabilities render protection of sealed platforms against DPA attack both very important to security and very difficult to achieve on inexpensive sealed platforms. While SPA attacks use primarily visual inspection to identify relevant power fluctuations, DPA attacks use statistical analysis and error correction techniques to extract information correlated to secret keys. Hence, DPA is a much more powerful attack than SPA, and is much more difficult to prevent. One use for DPA is to extract cryptographic keys for encryptions or decryptions performed on a sealed platform. For the Data Encryption Standard (DES), DPA has proved extremely effective; low-cost smart cards performing DES have proven, in recent experience, to be highly vulnerable to DPA. Any form of encryption or decryption which is similar to DES would necessarily have similar vulnerabilities when implemented on low-cost smart cards or similar sealed platforms.
Implementation of a DPA attack to find a DES key involves two phases, namely data collection followed by data analysis. Data collection for DPA may be performed by sampling a device's power consumption during cryptographic operations as a function of time or number of clock cycles. For DPA, a number of cryptographic operations using the target key are observed. To perform such an attack on a smart card, one processes a large number (a thousand or more) of DES encryptions (or decryptions) on distinct plaintexts (or ciphertexts), recording for each the power profile; the input, chosen at random by the attacker; and the output, computed by the smartcard as the encrypted or decrypted value with the hidden key.
Each power profile is referred to as a sample. In each round of DES, the output of a given S-box is dependent on both the data to be encrypted (or decrypted) and the key. Since the attacker knows the input text, once he guesses the value of the key bits that were used to generate a particular power signature sample, he can determine whether a particular output bit of a given S-box is 1 or 0 for the particular data used in the sample. Each standard S-box has a 6-bit input and a 4-bit output. Typically, this analysis begins in round 1 or 16 since those are the rounds where the attacker knows either the exact inputs (for round 1) or outputs (for round 16) for the respective S-box. The attacker does not know the key, but because the DES algorithm only performs one S-box lookup at a time, it is only necessary to guess the six bits of the secret key that are relevant to the S-box being observed and corresponding to the power consumption at that time. As only 6 bits are relevant, it is only necessary to test 2^6 = 64 possible sequences of values for a given 6-bit portion of the 56-bit secret key. For each guess of the values of these six bits, one divides the samples into two groups: those in which the targeted output bit, that is, the one of the four output bits of the targeted S-box which is chosen as the target in the first round of the attack, is a 1 if the attacker's guess of the six key bits is correct (the 1-group), and those in which it is a 0 if the attacker's guess of the six key bits is correct (the 0-group). The power samples in each group are then averaged. On average, modulo minor asymmetries in DES, those portions of the averaged power profiles which are affected only by bits other than the particular output bit mentioned above should be similar, since those other bits should be 1 for about half of the samples in each group, and 0 for about half of the samples in each group.
However, those portions of the averaged power profiles which are affected by the above-mentioned output bit should show a distinct difference between the 1-group and the 0-group. The presence of such a difference, or multiple such differences, indicates that the guessed value of the six key bits was correct. Its absence, or the absence of such differences, shows that the guessed value of the six key bits was incorrect. This process of guessing at the value of the secret key, dividing the power signature samples into those which will yield a 1-output and those which will yield a 0-output (the 1-group and 0-group respectively), averaging the profiles, and seeking the above-mentioned distinct difference, is repeated until a guess is shown to be correct. One then has six bits of the key. The above guessing procedure is repeated for the other seven S-boxes. When all S-boxes have been treated in this way, one has obtained 48 out of the 56 key bits, leaving only eight bits undetermined. This means one need only search a remaining key space of 2^8 = 256 possible keys to find the balance of the correct secret key. It becomes apparent how little information the attacker needs to employ such an attack. The attacker does not have to know the specific code used to implement DES; the memory layout used for storing the S-boxes; where in the power profile the distinct difference or differences, if any, are expected to appear for a correct guess; how many such distinct differences are expected to appear in the power profile for a correct guess; or whether the chosen S-box output bits are normal or complemented, as flipping 1s and 0s will produce the same kind of distinct difference. DPA is only dependent on whether such a difference exists, not on the sign, i.e., + or −, of any given difference.
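The guess, partition, average, and compare procedure described above can be illustrated with a small simulation. The following is a minimal sketch in Python under several loudly stated assumptions: only DES S-box 1 is modelled; each "power profile" is reduced to a single number; the simulated leakage is simply the targeted output bit plus Gaussian noise; and the six data bits entering the S-box are taken as known to the attacker. The names `simulate_traces`, `difference_of_means`, and `recover_key_bits` are hypothetical helpers introduced for this illustration, not part of any cited work.

```python
import random

# DES S-box 1 as four rows of sixteen entries (as specified in FIPS 46-3).
SBOX1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox1(x):
    """Look up a 6-bit input: outer two bits pick the row, middle four the column."""
    row = ((x >> 4) & 0b10) | (x & 0b01)
    col = (x >> 1) & 0b1111
    return SBOX1[row][col]


def target_bit(data6, key6):
    """Most significant output bit of S-box 1 for known data bits and a key guess."""
    return (sbox1(data6 ^ key6) >> 3) & 1


def simulate_traces(secret_key6, count, noise_sigma, rng):
    """Produce (known data, measured power) pairs under the assumed leakage model."""
    traces = []
    for _ in range(count):
        data6 = rng.randrange(64)
        power = target_bit(data6, secret_key6) + rng.gauss(0.0, noise_sigma)
        traces.append((data6, power))
    return traces


def difference_of_means(traces, guess):
    """Average power of the 1-group minus average power of the 0-group."""
    sums, counts = [0.0, 0.0], [0, 0]
    for data6, power in traces:
        bit = target_bit(data6, guess)
        sums[bit] += power
        counts[bit] += 1
    if 0 in counts:
        return 0.0
    return sums[1] / counts[1] - sums[0] / counts[0]


def recover_key_bits(traces):
    """Pick the 6-bit guess with the largest difference, ignoring its sign."""
    return max(range(64), key=lambda g: abs(difference_of_means(traces, g)))


rng = random.Random(2024)  # fixed seed so the sketch is repeatable
SECRET_KEY6 = 0b101101     # the six key bits the simulated device hides
traces = simulate_traces(SECRET_KEY6, 6000, 0.5, rng)
recovered = recover_key_bits(traces)
print(f"recovered {recovered:06b}, secret {SECRET_KEY6:06b}")
```

Note that the analysis code never reads `SECRET_KEY6`; it sees only the known data bits and the noisy measurements. Taking the absolute value of the difference reflects the observation above that only the existence of a difference matters, not its sign.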
All an attacker really needs to know in order to mount a successful attack is that it is DES which is being attacked, and that the implementation of DES, at some point, employs a bit which corresponds to a specific output of the S-box, in such a way that its use will affect the power profile samples. The paucity of knowledge required to make a successful DPA attack which completely cracks a hidden DES key on a sealed platform clearly shows that DPA is a very effective means of penetrating a sealed platform. Only one specific form of DPA attack is described herein, but there are many related forms of DPA attacks which are also possible.
While the effects of a single transistor switching would be normally impossible to identify from direct observations of a device's power consumption, the statistical operations used in DPA are able to reliably identify extraordinarily small differences in power consumption.
Physical measures to protect sealed platforms against attack are known to include enclosing systems in physically durable enclosures, physical shielding of memory cells and data lines, physical isolation, and coating integrated circuits with special coatings that destroy the chip when removed. While such techniques may offer a degree of protection against physical damage and reverse engineering, these techniques do not protect against non-invasive power analysis methods. Some devices, such as those shielded to United States Government Tempest specifications, use large capacitors and other power regulation systems to minimize variations in power consumption, enclose devices in shielded cases to prevent electromagnetic radiation, and buffer inputs and outputs to hinder external monitoring. These techniques are often expensive or physically cumbersome, and are therefore inappropriate for many applications, such as smartcards, secure microprocessors, and other small, low-cost devices. Physical protection is generally inapplicable or insufficient due to reliance on external power sources, the physical impracticality of shielding, cost, and other characteristics imposed by a sealed platform's physical constraints such as size and weight.
In contrast to physical protection, smartcards may also be protected from a power analysis attack to an extent, at the software level, by representing data in a “Hamming-neutral” form. The Hamming weight of a bit string, such as a data word or byte, is the quantity of bits in the bit string with a value of 1. For example, 10100 will have a Hamming weight of 2, and 1111 will have a Hamming weight of 4. A set of “Hamming neutral” bit-strings is a set of bit-strings that all have the same number of 1s, for example, the set {011, 101, 110} is a Hamming-neutral set. If all of the data bytes manipulated by a software application have the same number of 1s, the power consumed by the device and the noise it emits will not vary as the device processes this data. For example, one could encode a bit string by replacing each “1” with a “10”, and each “0” with a “01”. All bit-strings would then have an equal number of 1s and 0s, and there would be no detectable power or noise variation between any pair of bit-strings.
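The encoding mentioned at the end of the preceding paragraph can be sketched directly. The following is a minimal illustration in Python, operating on bit strings written as text; the function names `encode_hamming_neutral` and `decode_hamming_neutral` are chosen here for illustration only.

```python
from itertools import product


def encode_hamming_neutral(bits):
    """Replace each '1' with '10' and each '0' with '01'."""
    if any(b not in "01" for b in bits):
        raise ValueError("input must contain only '0' and '1'")
    return "".join("10" if b == "1" else "01" for b in bits)


def decode_hamming_neutral(encoded):
    """Invert the encoding by reading the first bit of every pair."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded length must be even")
    pairs = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]
    if any(p not in ("10", "01") for p in pairs):
        raise ValueError("not a valid Hamming-neutral encoding")
    return "".join(p[0] for p in pairs)


# Every 4-bit string encodes to an 8-bit string containing exactly four 1s.
weights = {encode_hamming_neutral("".join(w)).count("1") for w in product("01", repeat=4)}
print(weights)
```

The cost of this particular encoding is a doubling of the storage and bus width needed for each value.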
This technique is known in the art of electrical signaling and hardware design, where it is referred to as power-balanced or differential signaling. The benefits of such circuits include: reduction in noise emissions or induction of cross-talk in other circuits; reduction in ground bounce, because power requirements are constant and the voltage of the ground bus therefore does not rise locally when a circuit switches from low to high; and independence from environmental noise, because both electrical lines in a differential pair are influenced by essentially the same level of environmental noise, so that there is theoretically no net difference detected at the receiving end. These techniques are commonly used in military, super-computer and industrial control applications.
Since a normal, unsealed platform is susceptible to attacks potentially more powerful than power analysis (PA), the use of PA in discovery of secret information is primarily directed towards sealed platforms, such as smartcards. However, a simulated power profile of execution can be generated on a simulator for any processor, so it is possible to analyze algorithms for execution on ordinary, unsealed platforms using PA. Hence, although the most urgent need for PA resistance is for use on sealed platforms, such as smartcards, PA resistance is applicable to a much wider variety of platforms. Improved security is therefore useful for such devices to be securely used in a broad range of applications in addition to traditional retail commerce, including parking meters, cellular and pay telephones, pay television, banking, Internet-based electronic commerce, storage of medical records, identification and security access. There is therefore a need for a method, apparatus and system to reduce the amount of useful information leaked to attackers without resulting in excessive overheads. Reducing leakage refers generally to reducing the leakage of any information that is potentially useful to an attacker trying to determine secret information.
In WO 01/61915 the vulnerability of a system is reduced by introducing randomness into the observable operation, thereby frustrating the correlation of output power emissions with any meaningful internal processing.
In U.S. Pat. No. 6,278,783 methods and apparatus are described for improving DES and other cryptographic protocols against external monitoring attacks by reducing the amount and signal-to-noise ratio of useful information leaked during processing. An improved DES implementation of the invention instead uses two 56-bit keys (K1 and K2) and two 64-bit plaintext messages (M1 and M2), each associated with a permutation (i.e., K1P, K2P and M1P, M2P) such that K1P {K1} XOR K2P {K2} equals the “standard” DES key K, and M1P {M1} XOR M2P {M2} equals the “standard” message. During operation of the device, the tables are preferably periodically updated, by introducing fresh entropy into the tables faster than information leaks out, so that attackers will not be able to obtain the table contents by analysis of measurements. The technique is implementable in cryptographic smartcards, tamper resistant chips, and secure processing systems of all kinds.
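The relationship K1P {K1} XOR K2P {K2} = K stated above can be made concrete with a short sketch. The following Python code is an illustration only, assuming that K1P and K2P are bit permutations and that keys are held as lists of bits; it is not the implementation of the cited patent, and the helper names (`split_key`, `recombine`, and so on) are hypothetical.

```python
import secrets

KEY_BITS = 56


def random_permutation(n, rng):
    """Return a random ordering of the bit positions 0..n-1."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def permute(bits, perm):
    """Output bit i is taken from input bit perm[i]."""
    return [bits[p] for p in perm]


def invert(perm):
    """Return the permutation that undoes perm."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse


def split_key(key_bits, rng):
    """Split key_bits into (K1, K1P, K2, K2P) with K1P{K1} XOR K2P{K2} == key."""
    k1p = random_permutation(len(key_bits), rng)
    k2p = random_permutation(len(key_bits), rng)
    mask = [rng.getrandbits(1) for _ in key_bits]      # fresh random share
    other = [k ^ m for k, m in zip(key_bits, mask)]    # key XOR mask
    k1 = permute(mask, invert(k1p))                    # so permute(k1, k1p) == mask
    k2 = permute(other, invert(k2p))                   # so permute(k2, k2p) == other
    return k1, k1p, k2, k2p


def recombine(k1, k1p, k2, k2p):
    """Recover the standard key; a protected device would avoid ever doing this."""
    return [a ^ b for a, b in zip(permute(k1, k1p), permute(k2, k2p))]


rng = secrets.SystemRandom()
key = [rng.getrandbits(1) for _ in range(KEY_BITS)]
shares = split_key(key, rng)
print(recombine(*shares) == key)
```

Because the mask and the permutations are drawn afresh on every split, the same standard key can be re-represented repeatedly, which is the sense in which fresh entropy can be introduced faster than information leaks out.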
WO 01/08012 describes an apparatus and a method for preventing information leakage attacks on a microelectronic assembly performing a cryptographic algorithm by transforming a first function, used by the cryptographic algorithm, into a second function, the method including the steps of receiving a masked input data having n number of bits that is masked with an input mask, wherein n is a first predetermined integer; processing the masked input data using a second function based on a predetermined masking scheme; producing a masked output data having m number of bits that is masked with an output mask, wherein m is a second predetermined integer.
In WO 00/02342 methods and apparatus for increasing the leak-resistance of cryptographic systems using an indexed key update technique are disclosed. In one embodiment, a cryptographic client device maintains a secret key value as part of its state. The client can update its secret value at any time, for example before each transaction, using an update process that makes partial information that might have previously leaked to attackers about the secret no longer usefully describe the new updated secret value. By repeatedly applying the update process, information leaking during cryptographic operations that is collected by attackers rapidly becomes obsolete. Thus, such a system can remain secure against attacks involving analysis of measurements of the device's power consumption, electromagnetic characteristics, or other information leaked during transactions. The present invention can be used in connection with a client and server using such a protocol. To perform a transaction with the client, the server obtains the client's current transaction counter. The server then performs a series of operations to determine the sequence of transformations needed to re-derive the correct session key from the client's initial secret value. These transformations are performed, and the result is used as a transaction session key.
WO 99/67909 proposes a leak minimization for smartcards and other cryptosystems using a reduction of the amount of useful information leaked during processing. This is accomplished by implementing critical operations using “branchless” or fixed execution path routines whereby the execution path does not vary in any manner that can reveal new information about the secret key during subsequent operations. More particularly, various embodiments of the invention include: implementing modular exponentiation without key-dependent conditional jumps; implementing modular exponentiation with fixed memory access patterns; implementing modular multiplication without using leak-prone multiplication-by-one operations; and implementing leak-minimizing multiplication and other operations for elliptic curve cryptosystems.
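The idea of a fixed execution path can be sketched for modular exponentiation. The following Python code is an illustration of the control-flow principle only, assuming a simple left-to-right square-and-multiply in which the multiplication is always performed and the result is chosen arithmetically rather than with a conditional jump. It is not the method of the cited application, the name `modexp_fixed_path` is hypothetical, and Python's integer arithmetic is not itself constant-time.

```python
def modexp_fixed_path(base, exponent, modulus, exponent_bits):
    """Compute base**exponent % modulus with the same operation sequence for every exponent.

    exponent_bits fixes the loop length, so the number of iterations does not
    depend on the secret exponent (which must satisfy exponent < 2**exponent_bits).
    """
    if not 0 <= exponent < (1 << exponent_bits):
        raise ValueError("exponent does not fit in exponent_bits")
    result = 1 % modulus
    base %= modulus
    for i in reversed(range(exponent_bits)):
        result = (result * result) % modulus
        product = (result * base) % modulus       # always computed, even if unused
        bit = (exponent >> i) & 1
        # Arithmetic selection: no branch is taken on the secret bit.
        result = bit * product + (1 - bit) * result
    return result


print(modexp_fixed_path(7, 0b1011, 13, 8) == pow(7, 0b1011, 13))
```

A conventional square-and-multiply routine skips the multiplication whenever an exponent bit is 0, which is exactly the kind of key-dependent variation that simple power analysis can observe.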
In WO 99/67766 methods and apparatus are disclosed for performing computations in which the representation of data, the number of system state transitions at each computational step, and the Hamming weights of all operands are independent of computation inputs, intermediate values, or results. Exemplary embodiments implemented using conventional leaky hardware elements such as electronic components, logic gates, etc. as well as software executing on conventional leaky microprocessors are described. Smartcards and other tamper-resistant devices of the invention provide improved resistance to cryptographic attacks involving external monitoring.
In WO 99/63696 methods and apparatus are disclosed for securing cryptosystems against external monitoring attacks by reducing the amount and signal to noise ratio of useful information leaked during processing. This is generally accomplished by incorporating unpredictable information into the cryptographic processing. Various embodiments of the invention use techniques such as reduction of signal to noise ratios, random noise generation, clock skipping, and introducing entropy into the order of processing operations or the execution path. The techniques may be implemented in hardware or software, may use a combination of digital and analog techniques, and may be deployed in a variety of cryptographic devices.