Simple Power Analysis (SPA) is a technique that involves directly interpreting power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.
Using SPA, modular squaring operations can be distinguished from modular multiplication operations by analyzing the different power consumption profiles that the two operations produce. In early cryptographic devices that used separate circuits for squaring and multiplication, the power consumption differences between these operations could be quite large. Even when the same circuit is used for squaring and multiplication, the power consumption profiles can differ significantly because modular squaring and modular multiplication differ in computational complexity. If modular squares can be differentiated from modular multiplications, secret keys may leak and the system may be compromised.
The difference in power profiles between squares and multiplications exists even when random inputs are submitted to a general multiplication circuit. (In this context “squaring” means exercising the circuit to multiply a parameter by itself.) An optimized squaring operation can be faster than a multiplication. But independent of any speed optimizations, the computational complexity of a square—measured by counting the number of transistors that switch during the operation—is lower when averaged over many random inputs than the average complexity of many multiplications with different random inputs. Therefore, if the same circuit performs the squaring and multiplication operations, the squaring and multiplication operations can often be distinguished from one another and exploited, if care is not taken to level the differences.
Many cryptographic algorithms, like RSA and Diffie-Hellman, involve performing modular exponentiation. To improve the speed of computation, methods have been devised to perform the exponentiation by squaring, often called “square-and-multiply” algorithms. Examples of square-and-multiply algorithms for modular exponentiation include left-to-right square and multiply; right-to-left square and multiply; k-ary exponentiation; the sliding window method; and the Montgomery powering ladder.
FIG. 1A shows a square-and-multiply algorithm where b is raised to an exponent 100111010110101, corresponding to a decimal value of 20149. The base is denoted by b, and A is an accumulator. After initialization by 1, the exponent can be built up cumulatively one bit at a time from the left to right as (1, 0, 0, 1, 1, 1, . . . )=(1, 2, 4, 9, 19, 39, . . . ). In other words, the exponent can be constructed using a series of steps, where each step depends on the bit in that step and the result from the previous step. If the bit is 0, the operation comprises squaring the previous result. If the bit is 1, the operation comprises squaring the previous result and multiplying the square with the base b. If no SPA or differential power analysis (DPA) countermeasures are used, then in the left-to-right and right-to-left square-and-multiply algorithms for exponentiation, an attacker who can differentiate squares from multiplies can determine the complete exponent being used.
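The following Python sketch illustrates the left-to-right square-and-multiply method of FIG. 1A, assuming the accumulator A is initialized to 1. The function names and the operation log are illustrative only. Because every M immediately follows the S of a 1 bit, an attacker who can tell squares from multiplies can read the exponent directly from the log, as the hypothetical `recover_exponent` helper shows.

```python
def square_and_multiply(b, d, n):
    """Left-to-right square-and-multiply computing b**d mod n.

    Returns the result, the S/M operation log, and the exponent prefix
    held by the accumulator after each bit (A = b**prefix).
    """
    acc, prefix = 1, 0                # A is initialized to 1
    ops, prefixes = [], []
    for bit in bin(d)[2:]:            # most significant bit first
        acc = acc * acc % n           # every bit costs a squaring
        prefix *= 2
        ops.append("S")
        if bit == "1":                # a 1 bit adds a multiplication by b
            acc = acc * b % n
            prefix += 1
            ops.append("M")
        prefixes.append(prefix)
    return acc, "".join(ops), prefixes


def recover_exponent(ops):
    """SPA view of the log: 'SM' is a 1 bit, a lone 'S' is a 0 bit."""
    return int(ops.replace("SM", "1").replace("S", "0"), 2)


result, ops, prefixes = square_and_multiply(7, 20149, 1000003)
print(prefixes[:6])                   # [1, 2, 4, 9, 19, 39]
print(recover_exponent(ops))          # 20149
```

The first six prefixes reproduce the cumulative values (1, 2, 4, 9, 19, 39) listed above.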
FIG. 1B illustrates a power trace of the modular operations in an exponentiation scheme in which a table of various powers of b is precomputed: b^0, b^1, b^2, b^3. (The value b^0 is equal to 1.) In this scheme, there are always two squares followed by a multiplication by one of the table entries. This square-square-multiply algorithm produces a very symmetrical power trace of two consecutive lows and one high (SSM SSM SSM SSM . . . ) in the power profile. (This is the k-ary exponentiation algorithm, with k, the maximum number of exponent bits that are processed per multiplication, equal to 2.) Since the pattern of squares and multiplies is always SSM, regardless of the bits of the exponent, distinguishing squares from multiplies is not sufficient to reveal the key. This allows the secret key to be hidden, and may protect the system against certain SPA attacks. However, an attacker who can distinguish one type of multiplication from another can still gain information about the key.
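A minimal sketch of the fixed-window (k-ary) scheme with k=2 and the table b^0 through b^3 follows. It assumes, as an illustrative choice, that the accumulator is initialized from the leading 2-bit window and that each later multiplication is logged by its table index. Under these assumptions the exponent 20149 produces multiplications by b^0, b^3, b^2, b^2, b^3, b^1, b^1, each preceded by exactly two squares, so the S/M pattern is the same for every exponent.

```python
def kary_exponentiation(b, d, n, k=2):
    """Fixed-window (k-ary) exponentiation of b**d mod n.

    Precomputes b**0 .. b**(2**k - 1); after the first window, every window
    costs k squarings followed by one table multiplication (SSM for k = 2).
    """
    table = [1]
    for _ in range(2**k - 1):
        table.append(table[-1] * b % n)       # table[w] = b**w mod n
    bits = bin(d)[2:]
    bits = "0" * (-len(bits) % k) + bits      # left-pad to a multiple of k
    windows = [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]
    acc = table[windows[0]]                   # initialize from the leading window
    ops = []
    for w in windows[1:]:
        for _ in range(k):
            acc = acc * acc % n
            ops.append("S")
        acc = acc * table[w] % n              # multiply even when w == 0 (b**0 = 1)
        ops.append(str(w))                    # log the table index
    return acc, "".join(ops)


result, ops = kary_exponentiation(7, 20149, 1000003)
print(ops)                                    # SS0SS3SS2SS2SS3SS1SS1
```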
Some methods omit the multiplication by 1, or use dummy multiplications by another value (discarding the result) in an effort to mask the power trace. Multiplying the previous result by 1 produces the same output as the previous result, and thus the output does not have to be discarded. Omitting the multiplication by 1 leaves a potentially detectable SPA characteristic. The extra step of discarding the output of a dummy operation might also be detectable by SPA. Even if the multiplication by 1 is not omitted, the operation has low computational complexity and does not require much computational power. As a result, an attacker may be able to identify multiplications by 1 anyway based on their power profiles.
In FIG. 1B, for example, an attacker may be able to detect when multiplications by 1 occur by analyzing the power trace, and thereby determine that the two exponent bits at those locations are zero. (Note that in FIG. 1B, for convenience, a sequence square-square-multiply-by-b^X is referred to as SSX. The sequence of operations includes multiplications by b^0, b^3, b^2, b^2, b^3, b^1, and b^1, and is therefore denoted as SS0SS3SS2SS2SS3SS1SS1.) An attacker who can identify the multiplications by 1 (that is, by b^0, corresponding to exponent bits 00) may not be able to decode the remaining non-00 exponent bits (e.g. 01, 10, or 11) using SPA, because the power profiles at those multiplication locations are uniform. Consequently, since about one 2-bit window in four of a random exponent is 00, the attacker may only be able to obtain approximately a quarter of the exponent bits using this approach, which may or may not be sufficient to break the security of the cryptosystem.
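As a rough sketch of this partial attack, the Python code below models an attacker who can single out multiplications by 1 but cannot tell the other table multiplications apart. The attacker model, the assumption that the leading window is consumed by initialization rather than a visible multiplication, and the helper names are all illustrative.

```python
def spa_view(ops):
    """Assumed attacker view: multiplications by 1 ('0') stand out, while
    multiplications by b^1, b^2 and b^3 look alike and are shown as 'M'."""
    return "".join(c if c in "S0" else "M" for c in ops)


def partial_exponent(view, k=2):
    """Recover the k-bit windows that are all zero; others stay '?'."""
    recovered = ["?" * k]             # leading window initializes A; not observed
    for c in view:
        if c == "0":                  # multiplication by 1: window bits are 00
            recovered.append("0" * k)
        elif c == "M":                # some other table entry: bits unknown
            recovered.append("?" * k)
    return "".join(recovered)


view = spa_view("SS0SS3SS2SS2SS3SS1SS1")
print(view)                           # SS0SSMSSMSSMSSMSSMSSM
print(partial_exponent(view))
```

For this sequence only the second window is recovered: the result is "??00" followed by twelve unknown bits.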
FIG. 1C illustrates the clustering of multiplications into sets based on slight differences in the power profiles for different multiplications. As stated earlier, an attacker may be able to detect the locations of the 00 exponent bits, but may not be able to determine the actual values of the non-00 bits. In other words, the attacker may not be able to distinguish whether a multiplication is by the base to the first power, second power, or third power. In practice, however, most devices have some leakage, and each type of multiplication may display a different characteristic.
For example, as shown in FIG. 1C, the power profile for multiplication operations for bits 11 (decimal value 3) may display a tiny spike at the front of a step. Similarly, the power profile for multiplication operations for bits 10 (decimal value 2) may display a tiny spike at the middle of a step, and the power profile for multiplication operations for bits 01 (decimal value 1) may display a tiny spike at the end of a step. If these tiny spike features can be observed in an individual power trace, an attacker may be able to classify these multiplications into three different sets (A, B, C) corresponding to b^1, b^2, b^3 (or simply “1”, “2”, “3”, although the correspondence may at first be unknown to the attacker). To further confirm the classifications, the attacker can repeat encryptions of the same message and average the power profiles over a number of exponentiations, for example over 1000 exponentiations, to observe these fine-scale differences between the multiplications. If the attacker succeeds in clustering the different multiplications into sets (A, B, C), it is relatively easy for the attacker to decipher the exponent key by performing a search. In the example of FIG. 1C, there are only 3! = 6 ways that (A, B, C) can map to (1, 2, 3), so the exponent key may potentially be deciphered using less than a 3-bit search (log2 6 ≈ 2.6 bits).
One countermeasure to the above problem is to mask the exponent and randomize the masking of the exponent in different computations, such that the sequence of operations may be entirely different in a subsequent computation. For example, if the first and last operations both belonged to a cluster A for the first exponent, then with the next exponent the first operation may correspond to a cluster D, while the last operation is in a different cluster, E. If the exponent is randomized from one computation to the next, an attacker must be able to perform the clustering successfully (and correct all errors) from a single power trace, which increases the difficulty of deciphering the exponent key. (Exponent randomizing methods in a group with order phi(N) are well known in the background art, and include such methods as using d′=d+k*phi(N) in place of d, splitting d into (a, b) such that a+b=d, or splitting d into (a, b) such that b=(d*a′) mod phi(N), where a′ is the inverse of a modulo phi(N), so that a*b=d mod phi(N).)
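The three exponent randomizations listed above can be sketched in Python on a textbook toy RSA key (p=61, q=53, e=17), which is far too small for real use. The helper name and the 64-bit size of the random multiplier k are assumptions. Each representation yields the same result as the unblinded exponent d while changing the sequence of operations an attacker would observe.

```python
import secrets


def blinded_exponents(d, phi, k_bits=64):
    """Three randomized stand-ins for the exponent d in a group of order phi."""
    k = secrets.randbits(k_bits)
    d_blind = d + k * phi                     # d' = d + k*phi(N)
    a = secrets.randbelow(d + 1)
    additive = (a, d - a)                     # a + b = d
    while True:
        a_m = secrets.randbelow(phi - 2) + 2
        try:
            a_inv = pow(a_m, -1, phi)         # a' = a^-1 mod phi(N), Python 3.8+
        except ValueError:                    # a_m shares a factor with phi; retry
            continue
        break
    multiplicative = (a_m, d * a_inv % phi)   # b = d*a' mod phi(N), so a*b = d
    return d_blind, additive, multiplicative


N, PHI, D = 3233, 3120, 2753                  # toy RSA: p=61, q=53, e=17
x = 65                                        # a message coprime to N
d_blind, (a, b), (a_m, b_m) = blinded_exponents(D, PHI)
target = pow(x, D, N)
print(pow(x, d_blind, N) == target)                   # True
print(pow(x, a, N) * pow(x, b, N) % N == target)      # True
print(pow(pow(x, a_m, N), b_m, N) == target)          # True
```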
FIG. 1D illustrates the application of the sliding window algorithm to the exponent 100111010110101 of FIG. 1B. Compared with the square-square-multiply exponentiation of FIG. 1B, the sliding window algorithm can reduce both the amount of pre-computation required and the average number of multiplications performed (excluding squarings). Thus, the sliding window algorithm is more efficient and requires fewer memory locations to store table entries.
As shown in FIG. 1D, the sliding window algorithm translates the sequence SS2 (i.e. square, square, multiply by b^2) into a different sequence S1S (square, multiply by b^1, square). The sequence S1S processes the bits 10 just as SS2 does: squaring the accumulator A, multiplying by b, and squaring again gives (A^2*b)^2 = A^4*b^2, the same result as SS2. By replacing all the SS2's with S1S's, the entry b^2 can be omitted from the table. Thus, the sliding window algorithm allows for one less table entry, with the resulting table having only entries (0, 1, 3). This reduction in memory can reduce the number of parts required for manufacturing the device and can provide cost benefits, especially if the manufacturing of the device is cost-sensitive.
FIG. 1D further shows another way to reduce the number of multiplications in the sliding window algorithm. As described above, the bits 0110, which correspond to SS1|SS2, can be processed as SS1|S1S. However, SS1|S1S still uses two multiplications (each by b^1). Using the sliding window algorithm, the two multiplications can be reduced to a single multiplication if the sequence SS1|S1S is translated to the sequence S|SS3|S, which has only one multiplication (by b^3). From the table, it is seen that the sequence S|SS3|S also corresponds to bits 0110. Therefore, in the sliding window algorithm, the exponent does not always have to be divided into fixed 2-bit blocks (hence the term “sliding”), and the number of multiplications can be reduced by scanning the exponent bit by bit from left to right and applying the translations described above.
FIG. 1E illustrates a way of decoding the exponent in the sliding window algorithm based on a power profile. As indicated in FIG. 1E, in the sliding window algorithm, there is a decision point at the first bit 1, and at every subsequent 1 bit that begins a new window. The multiplication step in the algorithm does not occur until the decision point is reached. Depending on the next bit in the exponent, the algorithm executes one of the following two operations. If the next bit after the decision point is a 0 (i.e. the 2-bit value is 10), the algorithm inserts an S1S (instead of an SS2, since the table no longer has an entry 2). If the next bit after the decision point is a 1 (i.e. the 2-bit value is 11), the algorithm inserts an SS3.
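The decision rule described above can be sketched as a left-to-right sliding window exponentiation that stores only b^1 and b^3 (b^0 is implicit in the accumulator's initial value of 1). The function name and the operation log format are illustrative assumptions, not a specific implementation.

```python
def sliding_window_exp(b, d, n):
    """Left-to-right sliding-window exponentiation with table {b^1, b^3}.

    At each decision point (a 1 bit) the next bit is inspected:
      '11' -> S, S, multiply by b^3   (logged as SS3)
      '10' -> S, multiply by b^1      (logged as S1; the 0 adds the next S)
    A 0 bit outside a window costs one squaring.
    """
    b1 = b % n
    b3 = b1 * b1 % n * b1 % n
    bits = bin(d)[2:]
    acc, ops, i = 1, [], 0
    while i < len(bits):
        if bits[i] == "0":
            acc = acc * acc % n
            ops.append("S")
            i += 1
        elif i + 1 < len(bits) and bits[i + 1] == "1":
            acc = acc * acc % n
            acc = acc * acc % n
            acc = acc * b3 % n
            ops.append("SS3")
            i += 2
        else:
            acc = acc * acc % n
            acc = acc * b1 % n
            ops.append("S1")
            i += 1
    return acc, "".join(ops)


result, ops = sliding_window_exp(7, 20149, 1000003)
print(ops)                                          # S1SSSS3S1SS1SSS3SS1SS1
print(sliding_window_exp(7, 0b10110, 1000003)[1])   # S1SSS3S
```

In the second example the bits 0110 that follow the leading 1 are processed as S|SS3|S, with a single multiplication by b^3.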
An attacker may typically see sequences of many squares in a power profile where a sliding window algorithm is used. With the simple binary algorithm, an attacker who can differentiate squares from multiplies can decode them to completely recover the exponent. With the sliding window algorithm, some multiplies correspond to 1 (multiplications by b^1), while others correspond to 3 (multiplications by b^3). Although this results in some ambiguity in decoding the exponent, an attacker still knows that every sequence SSM corresponds to a two-bit section of the exponent where the low-order bit is 1: i.e. the exponent bits are “?1”. Additionally, in any sequence of S's between M's, the attacker knows that all but the last two S's before an M must correspond to exponent bits that are 0. Together, these facts allow much of the exponent to be decoded. Furthermore, certain exponent bit patterns cause two M operations to occur with fewer than k squares between them. When this occurs, it reveals additional bits of the exponent. For example, when k=3, the sequence MSM can occur, which is not possible in the straight k-ary exponentiation algorithm. (In FIG. 1E this is characterized by high-low-high power in the power trace.) When this pattern occurs (for the sliding window algorithm with only 1 and 3 in the table), it can only mean that the exponent bits were ‘1110’. This fact may in turn allow the decoding of bits before and after the segment. A closer examination of the power profiles surrounding the MSM sequence in the example of FIG. 1E shows that the MSM sequence is part of a longer sequence SSM|SMS|SMS, which must correspond to 111010. In other words, the attacker is able to determine the values (3, 1, 1) at these locations. By analyzing the full power trace in view of the above MSM sequence and S..SS sequences, the attacker may be able to decode one-third or possibly two-thirds of the bits in the exponent.
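The decoding rules above can be sketched in Python for the sliding window with table {b^1, b^3}. The sketch assumes the attacker sees only an unlabelled S/M string and applies four rules: the last S before an M is a 1 bit, all but the last two S's are 0 bits, an MSM pattern means the previous window was '11', and a window '1' is always followed by a 0 bit. Unresolved bits are reported as '?'. Both helper names are hypothetical.

```python
def sliding_window_pattern(d):
    """S/M pattern seen by an SPA attacker for a sliding window with table {b^1, b^3}."""
    bits, out, i = bin(d)[2:], [], 0
    while i < len(bits):
        if bits[i] == "0":
            out.append("S")
            i += 1
        elif i + 1 < len(bits) and bits[i + 1] == "1":
            out.append("SSM")         # window 11: multiply by b^3
            i += 2
        else:
            out.append("SM")          # window 1: multiply by b^1
            i += 1
    return "".join(out)


def spa_decode(seen):
    """Partially decode a well-formed S/M pattern into '0', '1' or '?' bits."""
    bits = []
    run = 0                   # squares since the previous M
    q = None                  # position of the ambiguous bit of the previous window
    prev_is_one = False       # previous window known to be the single bit '1'
    for op in seen:
        if op == "S":
            run += 1
            continue
        if run == 1:
            # MSM: a lone S before M is a window '1'. The previous window
            # cannot have been '1' (a 0 bit, hence another S, would follow),
            # so it was '11' and its ambiguous bit is 1.
            if q is not None:
                bits[q] = "1"
            bits.append("1")
            q, prev_is_one = None, True
        else:
            bits += ["0"] * (run - 2)     # all but the last two S's are 0 bits
            if prev_is_one and run == 2:
                bits += ["0", "1"]        # a window '1' is followed by a 0 bit
                q, prev_is_one = None, True
            else:
                bits += ["?", "1"]        # last S before M is always a 1 bit
                q, prev_is_one = len(bits) - 2, False
        run = 0
    bits += ["0"] * run                   # trailing squares are 0 bits
    return "".join(bits)


print(spa_decode(sliding_window_pattern(0b111010)))   # 111010
print(spa_decode(sliding_window_pattern(20149)))      # 100111010?1?1?1
```

For the example exponent 20149 (100111010110101), this sketch recovers 12 of the 15 bits from the S/M pattern alone.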
If the attacker is able to decode at least half of the bits in the exponent, the attacker may be able to solve for the exponent analytically. In some cases, decoding one quarter of the bits—or even a few bits per exponentiation—may be sufficient to break the cryptosystem.
Furthermore, the attacker may be able to visually identify sets of 0's, 1's, and 3's by averaging the power profiles over thousands of exponentiations, and looking for characteristics at each MSM location (3, 1) and the remaining unknown multiplication locations, similar to the method discussed with reference to FIG. 1C. In this case, the attacker may, for example, determine that out of the identified MSM locations in the power trace, ten locations correspond to 3's, and five locations correspond to 1's. The attacker can then compare the known power profiles of 1's and 3's at these known MSM locations with the remaining unknown multiplications at other locations (for example, 200 multiplications may be unknown) along the power trace. If the attacker is able to cluster the multiplications by (0, 1, 3) into three sets, the attacker can then decode the exponent entirely.
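The template comparison described above can be sketched as nearest-template classification. Everything about the signal is an assumption made for illustration: the profiles are synthetic, a small spike whose position depends on the table entry stands in for the leakage, and averaging over 1000 exponentiations is modelled by shrinking the noise. Only the 1 and 3 classes are shown.

```python
import random


def mean_profile(profiles):
    """Sample-by-sample average of equal-length power profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def sq_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(profiles, templates):
    """Assign each averaged profile to the label of the closest template."""
    return [min(templates, key=lambda lbl: sq_distance(p, templates[lbl]))
            for p in profiles]


def synthetic_profile(label, rng, length=30, noise=0.3, n_avg=1000):
    """Hypothetical averaged profile: a small spike whose position depends on
    the table entry, plus noise reduced by averaging n_avg exponentiations."""
    spike_at = {"1": 24, "3": 4}[label]
    sigma = noise / n_avg ** 0.5
    return [(0.2 if t == spike_at else 0.0) + rng.gauss(0.0, sigma)
            for t in range(length)]


rng = random.Random(7)
# Templates from locations already decoded via MSM (5 ones, 10 threes).
templates = {
    "1": mean_profile([synthetic_profile("1", rng) for _ in range(5)]),
    "3": mean_profile([synthetic_profile("3", rng) for _ in range(10)]),
}
# 200 multiplications whose table entry is still unknown.
truth = [rng.choice("13") for _ in range(200)]
guesses = classify([synthetic_profile(lbl, rng) for lbl in truth], templates)
print(sum(g == t for g, t in zip(guesses, truth)), "of", len(truth))
```

Nearest-template matching is only one simple choice; correlation-based or statistical clustering would serve the same purpose.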
DPA and Higher Order DPA Attacks
Previous attempts have been made to foil SPA by masking the exponent value. Masking of intermediate values in modular exponentiation can help resist DPA attacks. For example, in typical blinded modular exponentiation, an input can be effectively masked or randomized by multiplying it by a mask that is unknown to the attacker. The masked or randomized input can later be unmasked at the end of the operation. Such masking may take advantage of modular inverses, such that (X*X^(-1)) mod N=1. For example, ((A*X^E)^D)*X^(-1) mod N is equal to A^D mod N, for exponents D and E where X^(ED)=X mod N, because (A*X^E)^D=A^D*X^(ED)=A^D*X mod N.
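A minimal Python sketch of this blinding identity, assuming a textbook toy RSA key pair (E, D) so that X^(ED)=X mod N holds. Python's built-in `pow` stands in for the device's exponentiation routine, which in practice would only ever see the masked value A*X^E. The function name is hypothetical.

```python
import math
import secrets


def blinded_exponentiation(a, d, e, n):
    """Compute a^d mod n on a masked input, then remove the mask.

    Uses I = X^e and O = X^-1 for a fresh random mask X, relying on
    X^(e*d) = X mod n (true for a matching RSA key pair e, d).
    """
    while True:
        x = secrets.randbelow(n - 2) + 2      # random mask in [2, n-1]
        if math.gcd(x, n) == 1:               # X must be invertible mod n
            break
    i_mask = pow(x, e, n)                     # I = X^E mod N
    o_mask = pow(x, -1, n)                    # O = X^-1 mod N (Python 3.8+)
    masked = a * i_mask % n                   # A*X^E: the value actually exponentiated
    return pow(masked, d, n) * o_mask % n     # (A*X^E)^D * X^-1 = A^D mod N


# Textbook toy RSA key (p=61, q=53): far too small for real use.
N, E, D = 3233, 17, 2753
print(blinded_exponentiation(65, D, E, N) == pow(65, D, N))   # True
```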
Different masks are typically used for different operations, but are not changed in the middle of a modular exponentiation. Between operations, a new mask is sometimes generated efficiently from a previous mask by using a modular squaring. (I.e., if I=X^E and O=X^(-1) are pre-computed modulo N and stored, a new set of masks I′ and O′ can be computed efficiently by squaring, with I′=I^2 mod N and O′=O^2 mod N.) However, designs in which the mask is updated only between exponentiations (and not within a single exponentiation) can be vulnerable to DPA and higher order DPA attacks in the form of cross-correlation attacks. These cross-correlation attacks are clustering attacks similar to the SPA clustering attacks described above, but employ statistical methods to identify the clusters. In contrast to a regular DPA attack, which targets a specific parameter at one point, higher order DPA attacks target the relationship(s) between parameters by using multiple power measurements at different locations in the trace to test the relationship(s). If the input parameters are the same at those locations, the measurements will show higher correlation than at locations where the parameters have no relationship (i.e. different parameters). In many cases, a correlation is detectable if even one parameter is shared between two operations, for example a first operation that multiplies A1 by B3 and a second operation that multiplies A2 by B3. A cross-correlation attack allows an attacker to test for this correlation between operations caused by shared use of a parameter.
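The mask update by squaring can be checked with a short Python sketch on a toy RSA modulus (illustrative values only; the helper name is hypothetical). If I=X^E and O=X^(-1), then I^2=(X^2)^E and O^2=(X^2)^(-1), so the squared pair is a valid mask pair for the mask X^2. This deterministic link between successive masks is the kind of relationship the doubling attack described below exploits.

```python
def update_masks(i_mask, o_mask, n):
    """Derive the next mask pair by squaring: I' = I^2, O' = O^2 (mod N).

    If I = X^E and O = X^-1, the new pair corresponds to the mask X^2.
    """
    return i_mask * i_mask % n, o_mask * o_mask % n


N, E = 3233, 17                  # toy RSA modulus and public exponent
x = 7                            # initial mask X, coprime to N
i_mask, o_mask = pow(x, E, N), pow(x, -1, N)
i_mask, o_mask = update_masks(i_mask, o_mask, N)
print(i_mask == pow(x * x % N, E, N), o_mask * x * x % N == 1)   # True True
```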
The doubling attack and the “Big Mac attack” are two types of cross-correlation attacks. The doubling attack is described in a paper authored by P. Fouque and F. Valette, titled “The Doubling Attack—Why Upwards is Better than Downwards,” CHES 2003, Lecture Notes in Computer Science, Volume 2779, pp. 269-280. The “Big Mac” attack is a higher order DPA attack, and is described in the paper authored by C. D. Walter, titled “Sliding Windows Succumbs to Big Mac Attack,” published in CHES 2001, Lecture Notes in Computer Science, Volume 2162, January 2001, pp. 286-299.
The doubling attack targets designs in which the masks are updated by squaring, and looks at the relationship between the j'th operation in the k'th trace and the (j−1)'th operation in the (k+1)'th trace. For exponentiation algorithms such as sliding window, the operations will share an input if and only if the j'th operation in the k'th trace is a square—and the correlation between variations in the power measurements is often higher in this case.
In the “Big Mac” attack, an attacker identifies all of the multiplications in a single trace, and attempts to identify clusters of operations that share a multiplicand. For example, in the SSM example of FIG. 1C, there are four types of multiplication: by 1, b^1, b^2, and b^3. If no obvious SPA characteristic has been found that allows the multiplications by 1 and clusters A, B, and C to be determined, an attacker may still be able to determine cluster classifications by mounting a cross-correlation attack.
The attack begins by dividing the trace into small segments, with each segment corresponding to a square or multiplication. The correlation between one multiplication and the next is calculated between the small segments corresponding to each operation. (A Big Mac attack can also work with many traces, especially if the exponent is not randomized.)
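The segment correlation step can be sketched in Python under a deliberately simple, hypothetical leakage model: each multiplication's segment is a fixed pattern determined by its multiplicand plus fresh noise. Segments that share a multiplicand then correlate strongly, and a greedy threshold rule groups them into clusters. The model, the threshold of 0.4, and the operand values are assumptions for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length segments."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def leakage_segment(operand, rng, length=1000, noise=0.3):
    """Hypothetical leakage model: a fixed pattern determined by the shared
    multiplicand (seeded by its value) plus fresh measurement noise."""
    pattern = random.Random(operand).choices([0.0, 1.0], k=length)
    return [p + rng.gauss(0.0, noise) for p in pattern]


def cluster(segments, threshold=0.4):
    """Greedy clustering: join the first cluster whose representative
    segment correlates above the threshold, else start a new cluster."""
    labels, reps = [], []
    for seg in segments:
        for idx, rep in enumerate(reps):
            if pearson(seg, rep) > threshold:
                labels.append(idx)
                break
        else:
            reps.append(seg)
            labels.append(len(reps) - 1)
    return labels


# Multiplicands of six multiplications in one trace (hypothetical values).
operands = [11, 22, 11, 33, 22, 11]
rng = random.Random(2024)
segments = [leakage_segment(op, rng) for op in operands]
print(cluster(segments))
```

Under this model, segments sharing a multiplicand have an expected correlation of about 0.74, while unrelated segments stay near 0.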
More generally, cross-correlation attacks can look for any relationship between operations. If the attacker can determine the relationship between the input to a particular square or multiplication and an input or output of some other operation, the attacker can then obtain information about the secret key and undermine the design's security. As another example, if the multiplication by 1 (in FIG. 1B) were replaced by a multiplication by another value (discarding the result), then a correlation may appear between the output of the operation before the dummy multiplication and the input of the operation after it. In general, an attacker can perform cross-correlation attacks by analyzing correlation relationships across different operations that share an input or output, or where the output of one operation is an input of the other. These relationships can be summarized in terms of which parameters are in common between the LHS (Left Hand Side), RHS (Right Hand Side), and OUT (output) parameters.
For example, if the same LHS (“L”) parameter is used in different multiplications but the RHS (“R”) parameters are different between or among those multiplications, an L-L relationship exists between those multiplications.
Conversely, if the same R parameter is used in different multiplications but the L parameters are different between or among those multiplications, an R-R relationship exists between those multiplications.
Furthermore, if the L parameter in one multiplication is the R parameter in another multiplication, then an L-R relationship exists between those multiplications.
A final category comprises relationships involving the output (“O”) of a multiplication, for example where the output of one multiplication is an input to another multiplication. These may correspond to an O-L (Output-LHS), O-R (Output-RHS), or O-O (Output-Output) relationship between those multiplications.
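The relationship categories above can be expressed as a small Python classifier over (L, R, O) triples. The `Mult` type and the symbolic integer values (exponents of an input i, as in the FIG. 1F example) are illustrative assumptions. Note that when the second operation is a square, an L-L relationship also implies an L-R relationship, because its L and R parameters are equal.

```python
from typing import NamedTuple


class Mult(NamedTuple):
    lhs: int   # left hand side parameter (L)
    rhs: int   # right hand side parameter (R)
    out: int   # output (O)


def relationships(m1, m2):
    """Parameter relationships from multiplication m1 to a later m2."""
    rel = set()
    if m1.lhs == m2.lhs and m1.rhs != m2.rhs:
        rel.add("L-L")
    if m1.rhs == m2.rhs and m1.lhs != m2.lhs:
        rel.add("R-R")
    if m1.lhs == m2.rhs or m1.rhs == m2.lhs:
        rel.add("L-R")
    if m1.out == m2.lhs:
        rel.add("O-L")
    if m1.out == m2.rhs:
        rel.add("O-R")
    if m1.out == m2.out:
        rel.add("O-O")
    return rel


# Values are symbolic stand-ins (exponents of an input i), not real operands.
print(sorted(relationships(Mult(2, 1, 3), Mult(3, 3, 6))))    # ['O-L', 'O-R']
print(sorted(relationships(Mult(6, 1, 7), Mult(6, 6, 12))))   # ['L-L', 'L-R']
```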
If a multiplier deterministically uses the above parameters in a particular manner, then feeding the same LHS parameters into two different multipliers will result in the two multipliers operating on these parameters in the same way when combined with the RHS parameter. As a result, if there is a power leak which reveals information about the LHS parameter, and if the leak can be expressed as H1(L), an attacker feeding the same LHS parameter into the multipliers will obtain the same H1(L) leak and observe the similarity in the leak.
Leakage functions commonly involve a function of the L, R, or O parameters. A typical leakage function may also leak the most significant bit of each word of L. For example, if L is a big integer represented using 32 words of 32 bits each, such a leak gives an attacker 32 bits of information about L. This leakage behaves like a hash function, because it compresses L to a constant output size of 32 bits. However, this hash function is not cryptographically secure, because an attacker can determine the exact values of the 32 bits, and many bits of L do not affect its output at all.
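A hypothetical leakage function of this kind can be modelled in a few lines of Python: take the most significant bit of each of the 32 words of a 1024-bit operand. Flipping any bits other than the word MSBs leaves the 32-bit output unchanged, which is why such leakage behaves like a weak, non-cryptographic hash and produces collisions for similar operands.

```python
def msb_leakage(value, words=32, word_bits=32):
    """Hypothetical leakage H1(L): the most significant bit of every word.

    A 32-word operand is compressed to a 32-bit value; bits other than the
    word MSBs never influence the output.
    """
    mask = (1 << word_bits) - 1
    leak = 0
    for w in range(words):
        word = (value >> (w * word_bits)) & mask
        leak |= (word >> (word_bits - 1)) << w   # MSB of word w -> leak bit w
    return leak


operand = (1 << 1023) | (1 << 31)     # MSBs of the top and bottom words set
print(hex(msb_leakage(operand)))      # 0x80000001
print(msb_leakage(operand ^ 0xFFFF) == msb_leakage(operand))   # True: a collision
```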
An attacker who knows these 32 bits of information about L may be able to detect immediately when the same L is fed into the leakage function in two different operations, because the two leaks collide. Collisions can also be detected for different values of L that are similar, because only the 32 leaked bits need to be the same for a collision to occur.
However, if an attacker is performing a modular exponentiation and submitting a random sequence of messages to compare values at different locations, the probability of triggering a collision for the L-L relationship is low unless the values are identical. This also applies for the R-R relationship. When an attacker observes a word (or a parameter) with 2 bytes that are zero in the same locations, the attacker can determine that the word/parameter is the same between the two cases, and can thus determine the bytes of R that are zero. However, there may be numerous operations in which the parameters are different and no leakage is triggered in those operations.
In an L-R relationship, by contrast, the two leakage functions differ from each other. In some cases, the leakage function for R is triggered only when the entire value of a byte is 0, while the leakage function for L is triggered only when the entire value of the byte is 0 and the high bit is 0. As such, in cases where the high bit is 1, the leakage function for L will not be triggered. An attacker may also observe R as a function of L, with the leakage function spreading the high bits of L over the range of the leakage of the bytes of R that occur in between multiplication locations. As a result, it is more difficult for an attacker to precisely exploit an L-R relationship.
Lastly, the O-L, O-R, and O-O relationships are significantly harder to exploit, although one way to exploit those relationships may be to transform the trace first before performing the correlation calculation. (The O-L and O-R correlations are particularly relevant, for example, when attacking the Montgomery Ladder exponentiation system.)
Whereas the leakage function H1(L) depends on the left hand side parameter, the leakage function H2(R) depends on the right hand side parameter. An attacker may be able to determine when a whole word is zero, and distinguish a zero from a non-zero. The attacker can also determine the bits of the higher order byte of the output, and may even be able to determine the entire value of the output.
FIG. 1F shows an exponentiation using the k-ary square-and-multiply-always algorithm, where the system is vulnerable to both a doubling attack and a clustering attack. In the example of FIG. 1F, dummy multiplications (whose results are discarded) are inserted between every pair of consecutive squares in an SMSSMSS . . . pattern, which results in an SMSMSMSMS . . . pattern.
As shown in FIG. 1F, the first squaring operation on input i begins from the leftmost bit and results in i^2, which is the product i*i. The next operation is a multiplication, in which i^2 is multiplied by i to yield i^3. The subsequent squaring operation on the output of the previous multiplication results in i^6 (given by i^3*i^3). The following operation is a dummy multiplication, corresponding to a blinded representation of the dummy multiplier 1. In the dummy multiplication, the output of the previous squaring operation (i^6) is multiplied by i to yield i^7. However, the output i^7 from this dummy multiplication is discarded. In other words, the output i^7 of the dummy multiplication does not become the input for the next squaring operation. Instead, the output of the previous squaring operation (i^6) is provided as input to the following squaring operation, which yields i^12 (given by i^6*i^6).
A cross-correlation attack in combination with a clustering attack may be performed in the example of FIG. 1F. Specifically, an attacker may perform a doubling attack by comparing an operation k+1 in a first trace with an operation k in a second trace, and analyzing the correlation in power consumption between the operation k+1 in the first trace and the operation k in the second trace. The attacker can then perform a clustering attack, as follows.
For example, with reference to FIG. 1F (where the numbers in parentheses denote exponents of i), the first multiplication operation comprises an L parameter (2) and an R parameter (1), and the second squaring operation comprises an L parameter (3) and an R parameter (3). The correlation from the first multiplication operation to the second squaring operation can be denoted as α, comprising an L-L correlation (2-3) and an R-R correlation (1-3). The L-L and R-R correlations with respect to α are not expected to be significant. Also, although there is an output-input correlation, this correlation is usually difficult to detect unless an attacker specifically attacks this correlation.
Next, the dummy multiplication operation comprises an L parameter (6) and an R parameter (1), and the third squaring operation comprises an L parameter (6) and an R parameter (6). The correlation from the dummy multiplication operation to the third squaring operation can be denoted as β, comprising an L-L correlation (6-6) and an R-R correlation (1-6). As stated previously, the output i^7 from the dummy multiplication is discarded. However, if the L-L correlation is significant, one would expect to observe a higher correlation in the case where the output of an operation is discarded (β) than in the case where the output is not discarded (α). Thus, an attacker may be able to successfully perform a cross-correlation attack and a clustering attack on the exponent in FIG. 1F, even though dummy multiplications have been inserted to create a symmetrical square-and-multiply-always pattern (SMSMSMSMS).
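The walk-through of FIG. 1F can be reproduced with a small Python sketch of binary square-and-multiply-always that tracks each operand symbolically as an exponent of the input i (a convenience for illustration, not real modular values). The helper names are hypothetical. The second helper shows the L-L link exploited above: a dummy multiplication's L parameter is always reused by the next square, whereas a real multiplication's is not.

```python
def multiply_always_trace(d):
    """Binary square-and-multiply-always, tracking operands as exponents of i.

    Records (kind, lhs, rhs, out) for each operation: 'S' square, 'M' real
    multiplication by i, 'D' dummy multiplication whose output is discarded.
    """
    acc = 1                                   # accumulator = i^1 for the leading 1 bit
    trace = []
    for bit in bin(d)[3:]:                    # remaining bits, left to right
        trace.append(("S", acc, acc, 2 * acc))
        acc *= 2
        if bit == "1":
            trace.append(("M", acc, 1, acc + 1))
            acc += 1
        else:
            trace.append(("D", acc, 1, acc + 1))   # i^(acc+1) is discarded
    return acc, trace


def lhs_shared_with_next_square(trace):
    """For each real or dummy multiplication, does the next square reuse its L?"""
    return [(kind, trace[j + 1][1] == lhs)
            for j, (kind, lhs, _, _) in enumerate(trace[:-1]) if kind in "MD"]


acc, trace = multiply_always_trace(0b1100)
print(acc)                                    # 12
print(trace[:5])
print(lhs_shared_with_next_square(trace))     # [('M', False), ('D', True)]
```

The first five operations reproduce the sequence i^2, i^3, i^6, i^7 (discarded), i^12 described for FIG. 1F.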
With reference to FIG. 1F, it is noted that if the dummy multiplication results are discarded, special circuitry is required to process the discarded data, and to control whether an output is sent to the accumulator or discarded. This processing can also be performed in software instead of special circuitry. Nevertheless, the software manipulations can be vulnerable to SPA attacks because, even though the sequence of squares and multiplies is the same, gaps can exist between operations during which the multipliers are not active. In those gaps, the processor is performing computations to determine which parameter to load (or may be copying a parameter into another location). As a result, the timing and power consumption of those gaps may leak significant information. In some instances, even the standard squares and multiplications can have significant SPA leakage, depending on the computations performed by the processor and the sequence of operations.