1. Field of the Invention
The present invention relates to Cryptographic Bus Architectures (CBA). More specifically, the present invention relates to a CBA that prevents an attacker from being able to correlate side channel information.
2. Description of Related Art
Modern cryptography uses the same basic ideas as traditional cryptography: transposition and substitution. Messages to be encrypted, known as plaintext, are transformed by a function that is parameterized by a key. The output of the encryption process, known as the ciphertext, is then transmitted. The received ciphertext is then decrypted, using a related function and key combination, back into plaintext.
One example where modern cryptography is used is in pay-TV conditional-access systems such as pay channels for cable and satellite television. Smart cards and/or security processors (containing secret keys) are used to decrypt the television signals. Attackers buy a cable or satellite receiver and then attack the smart card or security processor inside in order to determine the secret keys. Thus, it is generally assumed that the input and output information, i.e. the plaintext and ciphertext, is available to attackers, and that information about the secret keys is unavailable. The ciphertext is the information sent from the cable or satellite provider, and the plaintext is the decrypted television signal sent to the television. An attacker, as depicted in FIG. 1, may attack the smart card or security processor by looking for information related to the secret keys that may be leaked via EM radiation, power consumption, timing, etc. The leaked information, commonly referred to as side channel information, can then be used by attackers in order to determine the secret key used. One common technique for determining a secret key from leaked or side channel information is known as Differential Power Analysis (DPA). Unfortunately, there is no way to guarantee that power consumption, EM radiation, etc. will not leak information about the cryptographic processes being performed by a device, and an attacker may thus obtain information about the secret keys. Therefore, what are needed are defensive techniques that result in leaked information that is unusable by hackers using correlation techniques such as DPA.
The following discussion is background information regarding using DPA to determine the secret key in a smartcard. One skilled in the art will appreciate that this discussion is for illustrative purposes only, and that the present invention may be utilized to protect secret keys of a number of data encryption formats and from a number of hacking techniques in which side channel information is used in order to determine the secret keys.
First, in order to better understand how hacking techniques work, knowledge of common encryption/decryption systems is useful. A common type of cryptosystem uses a block cipher for encrypt and decrypt operations. A block cipher operates on a fixed number of input bits and encrypts or decrypts these bits into a fixed number of output bits. The encrypt and decrypt functions are often constructed using a simple function called a round function. The security of the cryptographic algorithm is achieved by repeatedly applying the round function a fixed number of times. Such a cipher is referred to as an iterative-block cipher. The number of times a block is addressed by a round function is determined, in part, by the secret key.
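By way of illustration only, the structure of an iterative-block cipher can be sketched as follows. The sketch assumes a toy 32-bit Feistel construction with a fixed count of 16 rounds; the round function, the key schedule, and all names (round_function, subkeys, encrypt, decrypt) are hypothetical and are not those of DES or of any standardized cipher.

```python
MASK16 = 0xFFFF  # each half of the 32-bit block is 16 bits wide


def round_function(half, subkey):
    """Toy nonlinear mixing of one half-block with a subkey (not the DES f function)."""
    x = (half ^ subkey) & MASK16
    x = (x * 0x9E37 + 0x79B9) & MASK16
    return ((x << 3) | (x >> 13)) & MASK16  # rotate left by 3 within 16 bits


def subkeys(key, rounds):
    """Hypothetical key schedule: rotate the 16-bit key left by the round index."""
    key &= MASK16
    return [((key << (r % 16)) | (key >> (16 - r % 16))) & MASK16
            for r in range(rounds)]


def _feistel(block, round_keys):
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in round_keys:
        # The same simple round function is applied once per round.
        left, right = right, left ^ round_function(right, k)
    return (right << 16) | left  # final swap of the two halves


def encrypt(block, key, rounds=16):
    return _feistel(block, subkeys(key, rounds))


def decrypt(block, key, rounds=16):
    # Decryption reuses the same structure with the subkeys in reverse order.
    return _feistel(block, list(reversed(subkeys(key, rounds))))


plaintext = 0x12345678
ciphertext = encrypt(plaintext, key=0xBEEF)
recovered = decrypt(ciphertext, key=0xBEEF)
```

In this sketch a subkey derived from the secret key enters every round, which is the property that side channel attacks exploit.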
The Data Encryption Standard (DES) defines a commonly known iterative-block cipher. DES is described in detail in ANSI X3.92, “American National Standard for Data Encryption Algorithm (DEA),” American National Standards Institute, 1981, which is incorporated by reference herein. One of the major components of the round function of DES is the so-called substitution box or S-box functions. The S-box functions are non-linear and are conventionally implemented using table lookups or Boolean logic gates. The secret key controls access to the S-box function.
A common implementation of the iterative-block ciphers uses the secret key each time a round function is calculated. When this secret key is accessed by a cryptographic device, information about this secret key is apt to be leaked outside the device and can be monitored by an attacker who is able to get close enough to the device to monitor it. In the case of smart cards, if the attacker has possession of the smart card, the attacker is close enough to the cryptographic device therein to use techniques such as DPA against the device. The information that is leaked is often very subtle and difficult to interpret. However, because this information is correlated to the actual keys within the device, an attacker can use statistical techniques, such as a DPA attack, to effectively amplify the information and breach the security of the cryptosystem.
Recently, it has been shown that Differential Power Analysis (DPA), which relies on side-channel information, can be utilized by attackers to gain information about secret keys. FIG. 2 is a simple lumped component model that is useful for understanding power dissipation measurements. However, one skilled in the art will understand that many other secure systems could be monitored in a similar manner as that shown in FIG. 2 for monitoring a smart card.
One way that power dissipated by a smartcard can be monitored at the ground pin of the smartcard is by using a small resistor (R1) in series between the Vss pin on the card and the true ground. Current moving through R1 creates a time varying voltage that can be sampled, perhaps by a digital oscilloscope. In a CMOS circuit, most power is dissipated when the circuit is clocked. This is known as dynamic power dissipation. Information useful to an attacker is leaked because the amount of current being drawn when the circuit is clocked is directly related to the change of state of CLOAD or the resulting current drawn by the other gates attached to CLOAD. On a microprocessor, each clock pulse causes many bit transitions to occur simultaneously. These changes can be observed via the digital oscilloscope.
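The measurement described above may be approximated, for illustration only, by the following sketch. It assumes that the current drawn on each clock is simply proportional to the number of bits that change state, which is a common simplification and not a statement about any particular circuit. The resistor value, the per-transition current, and the names bit_transitions and shunt_voltages are hypothetical.

```python
R1_OHMS = 10.0          # hypothetical value of the series resistor R1
AMPS_PER_TOGGLE = 1e-4  # hypothetical current drawn per bit transition


def bit_transitions(prev_state, next_state):
    """Count the bits that change between two successive register or bus states."""
    return bin(prev_state ^ next_state).count("1")


def shunt_voltages(states):
    """Return one voltage sample across R1 for each clocked state change."""
    volts = []
    for prev_state, next_state in zip(states, states[1:]):
        current = bit_transitions(prev_state, next_state) * AMPS_PER_TOGGLE
        volts.append(current * R1_OHMS)  # Ohm's law: V = I * R
    return volts


# Three clock edges: all eight bits flip, nothing flips, four bits flip.
bus_states = [0x00, 0xFF, 0xFF, 0x0F]
samples = shunt_voltages(bus_states)
```

Under this assumed model, the sampled voltage is largest when the most bits toggle, so the waveform seen on the oscilloscope depends on the data being processed.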
In a conventional implementation of a cryptographic algorithm, the leaked information is correlated to the secret data, thus enabling an attack. For example, Messerges et al., in “Investigations of Power Analysis on Smartcards”, Proceedings of USENIX Workshop on Smartcard Technology, May 1999, pp. 151-161, used actual results from monitoring smartcard power signals to further analyze DPA techniques for attacking DES. More recently, Manfred Aigner et al., in “Power Analysis Tutorial”, Institute for Applied Information Processing and Communication, University of Technology Graz, Austria, have presented a thorough tutorial for a DPA attack, including how to measure power consumption traces precisely and then divide the traces into two or more different sets with the aid of a selection function D. The power traces of each set are averaged and the result is a bias signal. The two bias signals are subtracted from each other. One input to the D function is six bits of the subkey. The attacker does not know these bits, but can use brute force and try all 2^6 (i.e., 64) possible values. For each guess, the attacker re-divides the power traces into different sets, re-calculates the averages and generates a different bias signal. If and only if the guess supplied to the D function is correct, noticeable peaks can be seen in the difference of the bias signals. Thus, the attacker can use the information and determine the secret key.
Typically, prior to the beginning of the 16th round in a given DES encryption operation, the algorithm will transform a plaintext message based on a secret key into a target binary bit R2[b], with a value of either 1 or 0. The final ciphertext is available after the 16th round as shown in FIG. 3(a). The DPA attacker is able to view this target bit R2[b], based on the above observable ciphertext, by using a selection function D as defined in Manfred Aigner et al. in “Power Analysis Tutorial” mentioned above. As is shown by FIG. 3(b), when selection function D(ci, Ks) computes R2[b] with a correct secret key within a given collection of m ciphertexts, those ciphertexts which produce the value of 1 (R2[b]=1) can be grouped into a single set S1, while those which produce the value of 0 (R2[b]=0) can be grouped into another set S0. (For a simplified, more detailed illustration, see FIG. 4(a).) As shown in FIG. 3(c), if a different secret key is guessed during these m selection operations, the set S1′ which produces the value of 1 will be different from the set S1, although there will be overlaps. (For a simplified, more detailed illustration, see FIG. 4(b).) In fact, statistically, about half of the ciphertexts in S1′ will be ciphertexts that belong to S0. These characteristics provide the DPA attacker with the opportunity to determine the secret key by a clever but roundabout approach.
Here is how a DPA attack works. During a DES transformation of a plaintext message into a corresponding target cryptographic cipher bit R2[b], DPA attempts through exhaustive guesses to arrive at the cipher's six secret key bits Ks, represented by 0 ≤ Ks < 2^6. In any one attempt, using the same large number of m ciphertexts, the resulting binary values of R2[b] will, as always, be either 1 or 0; however, the values will be correctly assigned for every ciphertext only if the key has been correctly guessed. The DPA attacker now groups all the ciphertexts which seemingly produce values of 1 (R2[b]=1) into a single set and all the other ciphertexts, which produce an apparent value of 0 (R2[b]=0), into another set. Since each ciphertext ci in each set has its own corresponding power trace wi, the attacker can now calculate the average of these power traces (i.e., waveforms) from each set and then compute the difference between the two waveform averages. (Another name for such an average is the bias signal.) The difference in these two bias signals is exploited by the attacker as follows.
DPA utilizes the statistical average of these two sets to determine whether the six key bits Ks for a given target bit have been guessed correctly in the attempted key Ks′. When the guessed key Ks′ is wrong, the two waveform averages will be nearly identical because about half of the ciphertexts in each set will be wrongly assigned. For example, as shown by FIG. 3(a), if the left hand side is meant to represent the set of R2[b]=1 (i.e., S1′), half of the ciphertexts will still have a power trace of ‘0’ (shown as the bottom half, which come from S0), thus making the set average equal to 0.5. Similarly, the right hand side is meant to represent the set of R2[b]=0 (i.e., S0′), and there, too, half of the ciphertexts will wrongly have the power trace of ‘1’ (which comes from S1), again averaging to 0.5. As a result, the difference between the two averages will be very small (almost ‘0’) and a trace of the difference will be essentially a flat line. However, if the guessed key Ks′ is the correct key Ks, then the power consumption trace of the set R2[b]=1 (i.e., the true S1) will be very different from that of the set R2[b]=0 (i.e., the true S0). Thus, as shown in FIG. 5(b), the difference will be large (almost ‘1’) because one set of ciphertexts (i.e., S1′=S1) would have the average power trace of ‘1’, but the other set (i.e., S0′=S0) would have the average power trace of ‘0’. To put it another way, the evidence of having discovered the correct key is a spike in the trace of the difference of the bias signals.
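The difference-of-averages procedure described above can be illustrated, under stated assumptions, by the following simulation. It is a sketch only: the device is modeled by a toy 4-bit substitution table rather than a DES S-box, the subkey is four bits (16 guesses) rather than six, each power trace has eight samples with the target bit leaking at one hypothetical sample index, and Gaussian noise stands in for all other activity. The names selection, simulate_traces, differential_trace and recover_key are hypothetical.

```python
import random

# Toy 4-bit substitution table used only for illustration (not a DES S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
KEY_BITS = 4    # toy subkey width
TRACE_LEN = 8   # samples per simulated power trace
LEAK_INDEX = 5  # hypothetical sample at which the target bit leaks


def selection(c, ks):
    """Selection function D(c, ks): the target bit predicted for key guess ks."""
    return (SBOX[c ^ ks] >> 3) & 1


def simulate_traces(secret, m, rng, noise=0.5):
    """Return m (ciphertext, trace) pairs from a simulated leaky device."""
    pairs = []
    for _ in range(m):
        c = rng.randrange(1 << KEY_BITS)
        w = [rng.gauss(0.0, noise) for _ in range(TRACE_LEN)]
        w[LEAK_INDEX] += selection(c, secret)  # the true target bit leaks here
        pairs.append((c, w))
    return pairs


def mean_trace(traces):
    """Average the traces sample by sample (the bias signal of one set)."""
    return [sum(t[j] for t in traces) / len(traces) for j in range(TRACE_LEN)]


def differential_trace(pairs, guess):
    """Split traces by the predicted bit and subtract the two bias signals."""
    s1 = [w for c, w in pairs if selection(c, guess) == 1]
    s0 = [w for c, w in pairs if selection(c, guess) == 0]
    return [a - b for a, b in zip(mean_trace(s1), mean_trace(s0))]


def recover_key(pairs):
    """Try every key guess and keep the one with the largest spike."""
    scores = {g: max(abs(v) for v in differential_trace(pairs, g))
              for g in range(1 << KEY_BITS)}
    return max(scores, key=scores.get), scores


rng = random.Random(1)
SECRET = 0b1011
pairs = simulate_traces(SECRET, 2000, rng)
best_guess, scores = recover_key(pairs)
```

In this toy model the differential trace for the correct guess shows a spike at the leaking sample. Some wrong guesses remain partially correlated with the true target bit, so their differential traces are smaller but not always perfectly flat.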
Of course, those skilled in the art will also appreciate that because the low-level instructions often manipulate several bits, a selection function can simultaneously select values of multiple bits rather than of just one bit R2[b].
In the prior art, certain techniques have been suggested to try to break the correlation between subsequent segment traces and thus foil such attacks. See, for example, U.S. Pat. Nos. 6,298,135 and 6,295,606 to Messerges, et al. However, these approaches have certain limitations, which are discussed below.
In U.S. Pat. No. 6,298,135 Messerges discloses using a randomized starting point in the set of target bits. For each different plaintext sample, the corresponding target bits are processed in a different order, and thus it becomes difficult for a DPA attacker to group related target bits from all the plaintexts of interest to perform statistical analyses associated with given target bit positions. However, this approach does not conceal the information leaked by different address bits and cannot prevent a malicious attacker from using this information to reorder the target bit into the correct bit position.
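For illustration only, the general idea of a randomized starting point can be sketched as follows. This is one simple reading of the idea and is not taken from the cited patent: the items are processed in a rotated order whose starting index is chosen at random for each run. The function name process_from_random_start and its parameters are hypothetical.

```python
import random


def process_from_random_start(items, work, rng):
    """Apply work() to every item, starting at a random index and wrapping around."""
    n = len(items)
    start = rng.randrange(n)
    order = [(start + i) % n for i in range(n)]  # rotated processing order
    results = [None] * n
    for idx in order:
        results[idx] = work(items[idx])  # result is stored at its true position
    return results, order


rng = random.Random(3)
target_bits = [1, 0, 1, 1, 0, 0, 1, 0]
results, order = process_from_random_start(target_bits, lambda b: b ^ 1, rng)
```

The results are the same for every starting point; only the time at which each position is processed changes from run to run.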
In U.S. Pat. No. 6,295,606 Messerges discloses another technique that uses a random mask to keep the message and key hidden while they are stored in memory and during the processing of the cryptographic algorithm. Since the mask is randomly changed, new S-boxes must be updated accordingly, and this takes time. This kind of masking operation has two disadvantages: it not only slows down the DES algorithm by a factor of three to five, but it also cannot prevent an attacker from gathering a 48-bit partial key from DES round 16, when the results must be unmasked to provide the correct output of the cipher. (DES round 16 is the last round in the DES encryption algorithm and its output is unmasked as the ciphertext output.) Thus, this approach becomes vulnerable to DPA after unmasking. With 48 bits now known at round 16, the remaining eight key bits needed to make up the 56-bit key can then be exhaustively searched by the attacker.
Therefore, a need exists for a way to prevent leakage attacks so that an attacker cannot gain information about the secret keys used in cryptographic devices. Further, what is needed is a computationally more efficient approach that will prevent an attacker from gaining even partial information that can be used to determine the keys. It should be apparent that a technique that foils the attack while adding only 25% to the computational load of the device is far superior to a design that adds 100% or more to the computational load of a cryptographic device.
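The cost of updating the S-boxes whenever the mask changes can be seen in the following sketch of generic Boolean (XOR) table masking. It is offered for illustration only and is not taken from the cited patent; the 4-bit table, the function name remask_sbox, and the mask variables are hypothetical.

```python
import random

# Toy 4-bit substitution table used only for illustration (not a DES S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def remask_sbox(sbox, mask_in, mask_out):
    """Build a table T such that T[x ^ mask_in] == sbox[x] ^ mask_out for all x."""
    table = [0] * len(sbox)
    for x, y in enumerate(sbox):
        table[x ^ mask_in] = y ^ mask_out
    return table


rng = random.Random(7)
mask_in, mask_out = rng.randrange(16), rng.randrange(16)

# The whole table must be rebuilt each time fresh masks are drawn.
masked_table = remask_sbox(SBOX, mask_in, mask_out)

x = 0b1001                     # sensitive value, never looked up directly
masked_x = x ^ mask_in         # only the masked value is processed
masked_y = masked_table[masked_x]
unmasked_y = masked_y ^ mask_out  # unmasking is needed only at the very end
```

Every entry of the table is rewritten for each new pair of masks, which is the source of the additional computation noted above, and the final unmasking step is the point at which the true value reappears.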