Embodiments of the present invention relate to an integrated circuit including a multiplication function configured to execute a multiplication operation of two binary words x and y in a plurality of steps of basic multiplication of components xi of word x by components yj of word y.
Embodiments of the present invention relate in particular to an integrated circuit including an external data processing function whose execution includes at least one conditional branching to either a first multiplication step of binary words or a second multiplication step of binary words. The conditional branching depends on private data of the integrated circuit.
Embodiments of the present invention relate in particular to a process and system for testing such an integrated circuit.
Embodiments of the present invention also relate to a process for protecting an integrated circuit of the above-mentioned type against a side channel analysis, and to a countermeasure allowing such an integrated circuit to pass a qualification or certification process including a test process according to embodiments of the invention.
Increasingly advanced secure processors are currently found in chip cards and in other embedded systems such as USB keys (flash drives), decoders and game consoles, and more generally in any Trusted Platform Module (TPM). These processors, in the form of integrated circuits, generally have 8-bit Complex Instruction Set Computer (CISC) cores or Reduced Instruction Set Computer (RISC) cores of 8, 16, or more bits, 32-bit processors being the most widespread at present. Some integrated circuits also include coprocessors dedicated to certain cryptographic calculations, notably arithmetic accelerators for asymmetric algorithms such as Rivest, Shamir and Adleman (RSA), the Digital Signature Algorithm (DSA), the Elliptic Curve Digital Signature Algorithm (ECDSA), or the like.
FIG. 1 shows, as an example, a secure integrated circuit CIC1 arranged on a portable support HD (handheld device), for example a plastic card or any other support. The integrated circuit includes a microprocessor MPC, an input/output circuit IOC (interface communication circuit), memories M1, M2, M3 linked to the microprocessor by a data and address bus and, optionally, a coprocessor CP1 for cryptographic calculations (arithmetic accelerator), and a random number generator RGEN. Memory M1 is a Random Access Memory (RAM) containing volatile application data. Memory M2 is a non-volatile memory, for example an EEPROM or Flash memory, containing application programs. Memory M3 is a Read Only Memory (ROM) containing the operating system of the microprocessor.
The interface communication circuit IOC can be of the contact type, for example according to the ISO/IEC 7816 standard; of the contactless type with inductive coupling, for example according to the ISO/IEC 14443A/B or ISO/IEC 15693 standards; of the contactless type functioning by electric coupling (UHF interface circuit); or of both the contact and contactless types (a so-called “combi” integrated circuit). The interface circuit IOC shown as an example in FIG. 1 is an inductive coupling contactless interface circuit equipped with an antenna coil AC1 to receive a magnetic field FLD. The field FLD is emitted by a card reader RD that is itself equipped with an antenna coil AC2. Circuit IOC includes means for receiving and decoding data DTr emitted by the reader RD and means for coding and emitting data DTx supplied by the microprocessor MPC. It may also include means for extracting from the magnetic field FLD a supply voltage Vcc and a clock signal CK of the integrated circuit.
In some embodiments, the integrated circuit CIC1 may be configured to execute encryption, decryption, or signature operations on messages m that are sent to it, by way of a cryptographic function based on modular exponentiation using a secret key d and a cryptographic modulus n, for example an RSA cryptographic function.
Overview Concerning Modular Exponentiation
The modular exponentiation function has the following mathematical expression:

$$m^d \bmod n$$

where m is an input datum, d an exponent, and n a divisor (the modulus). The modular exponentiation function therefore consists of calculating the remainder of the division of m to the power d by n.
Such a function is used by various cryptographic algorithms, such as the RSA algorithm, the DSA algorithm, Elliptic Curve Diffie-Hellman (ECDH), ECDSA, ElGamal, or the like. The data m is then a message to encrypt and the exponent d is a private key.
Such a function may be implemented using the following algorithm (modular exponentiation according to the Barrett method):
Exponentiation algorithm
Input: “m” and “n” are integers such that m < n; “d” is an exponent of v bits such that $d = (d_{v-1} d_{v-2} \ldots d_0)_2$
Output: $a = m^d \bmod n$
Step 1: a = 1
Step 2: Pre-calculations of the Barrett reduction
Step 3: for s from 1 to v do:
  (Step 3A) a = BRED(LIM(a,a), n)
  (Step 3B) if $d_{v-s} = 1$ then a = BRED(LIM(a,m), n)
Step 4: Return result a

where the message m and the modulus n are integers (for example of 1024 bits, 2048 bits, or more), d is the exponent of v bits expressed in base 2 ($d_{v-1}, d_{v-2}, \ldots, d_0$), “LIM” is the multiplication function of large integers (“Long Integer Multiplication”) and “BRED” is a reduction function according to the Barrett method (“Barrett REDuction”) applied to the result of the LIM multiplication.
In an integrated circuit such as that shown in FIG. 1, such a modular exponentiation algorithm may be executed by the microprocessor MPC or by the coprocessor CP1. Alternatively, some steps of the algorithm can be executed by the microprocessor while others are executed by the coprocessor, if the latter is merely an arithmetic accelerator. For example, the microprocessor may entrust the LIM multiplications of steps 3A and 3B to the coprocessor, or else the entire calculation may be entrusted to the coprocessor, depending on the case.
In addition, the LIM multiplication of a by a (Step 3A) or of a by m (Step 3B) is generally executed by the integrated circuit by means of a multiplication function of binary words x and y. This multiplication includes a plurality of steps of basic multiplication of components xi (ai) of word x by components yj (aj or mj) of word y (i and j being iteration variables), to obtain intermediate results that are combined to form the overall result of the multiplication.
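As a minimal sketch, the algorithm can be written as follows, assuming Python's built-in integers for the long integers and Python's `*` operator as a stand-in for LIM. The helper names `barrett_precompute`, `bred`, `lim` and `mod_exp` are illustrative, and the Barrett variant shown (base 2, with μ = ⌊2^(2k)/n⌋ for a k-bit modulus, as in the classical textbook formulation) is an assumption, since the text does not specify the reduction details.

```python
def barrett_precompute(n: int) -> tuple[int, int]:
    """Step 2: precompute k = bit length of n and mu = floor(2^(2k) / n)."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n
    return k, mu


def bred(x: int, n: int, k: int, mu: int) -> int:
    """Barrett reduction of x modulo n, valid for 0 <= x < 2^(2k)."""
    q = ((x >> (k - 1)) * mu) >> (k + 1)  # underestimates floor(x / n) by at most 2
    r = x - q * n
    while r >= n:  # at most two corrective subtractions
        r -= n
    return r


def lim(x: int, y: int) -> int:
    """Stand-in for the long integer multiplication LIM (Python big-int product)."""
    return x * y


def mod_exp(m: int, d: int, n: int) -> int:
    """Left-to-right square-and-multiply computing m^d mod n with Barrett reduction."""
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < n")
    a = 1                                  # Step 1
    k, mu = barrett_precompute(n)          # Step 2
    v = d.bit_length()
    for s in range(1, v + 1):              # Step 3
        a = bred(lim(a, a), n, k, mu)      # Step 3A: unconditional squaring
        if (d >> (v - s)) & 1:             # conditional branch on bit d_{v-s}
            a = bred(lim(a, m), n, k, mu)  # Step 3B: multiplication by m
    return a                               # Step 4
```

For instance, `mod_exp(4, 13, 497)` returns 445, the same value as `pow(4, 13, 497)`. The test on `(d >> (v - s)) & 1` is the data-dependent branch that the side channel analyses discussed in this document try to expose.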
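The decomposition of LIM into basic multiplications can be illustrated with the schoolbook method, sketched below under the assumption of w-bit components (w = 32 by default, a hypothetical word size) stored least significant first. This is one common way of combining the partial products, not necessarily the one used by a given coprocessor; the names `to_components` and `lim_schoolbook` are illustrative.

```python
def to_components(x: int, w: int, count: int) -> list[int]:
    """Split x into `count` w-bit components, least significant first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(count)]


def lim_schoolbook(x: int, y: int, w: int = 32) -> tuple[int, int]:
    """Multiply x by y through basic w-bit multiplications xi*yj.

    Returns the product and the number of basic multiplications performed.
    """
    mask = (1 << w) - 1
    nx = max(1, -(-x.bit_length() // w))  # number of components of x (ceiling)
    ny = max(1, -(-y.bit_length() // w))  # number of components of y (ceiling)
    xs, ys = to_components(x, w, nx), to_components(y, w, ny)
    res = [0] * (nx + ny)
    basic_mults = 0
    for i, xi in enumerate(xs):
        carry = 0
        for j, yj in enumerate(ys):
            t = xi * yj + res[i + j] + carry  # one basic multiplication xi*yj
            basic_mults += 1
            res[i + j] = t & mask            # keep the low w bits in place
            carry = t >> w                   # propagate the high w bits
        res[i + ny] = carry
    product = sum(r << (w * k) for k, r in enumerate(res))
    return product, basic_mults
```

With 8-bit components, `lim_schoolbook(0xFFFF, 0xFFFF, 8)` performs 2 × 2 = 4 basic multiplications and returns the product 0xFFFE0001.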
Overview of Side Channel Analysis
In order to verify the level of security offered by a secure integrated circuit to be commercialized, qualification or certification tests are performed at the industrial level. In particular, tests are performed to assess the robustness of the integrated circuit to side channel analyses aiming to discover the secret data of the integrated circuit.
The exponentiation algorithm is therefore subjected to such tests. More particularly, the side channel analysis of the modular exponentiation algorithm consists of deducing the value of the exponent bit by bit, by observing the “behavior” of the integrated circuit during the execution of step 3 of the algorithm, at each iteration of rank s of this step. This observation aims to determine whether the iteration in question includes step 3A only, or step 3A followed by step 3B.
In the first case, it can be deduced that the bit dv-s of the exponent is equal to 0. In the second case, it can be deduced that the bit dv-s is equal to 1. By proceeding step by step for each iteration from s=1 to s=v, the bits dv-s of the exponent can be inferred one after the other. For example, during the first iterations of the exponentiation algorithm, the sequence of operations:

LIM(a,a), LIM(a,m)

reveals that the first bit of the exponent is 1, whereas the sequence of operations:

LIM(a,a), LIM(a,a)

reveals that the first bit of the exponent is 0.

To discover the next exponent bit, the nature of the following operations must be determined. For example, if these operations are:

LIM(a,a), LIM(a,m), LIM(a,a), LIM(a,m)

or:

LIM(a,a), LIM(a,a), LIM(a,m)

the last two operations LIM(a,a), LIM(a,m) reveal that the second bit of the exponent is 1. Conversely, after the following operations:

LIM(a,a), LIM(a,m), LIM(a,a), LIM(a,a)

the third operation LIM(a,a) reveals that the second bit of the exponent is 0, because it is followed by LIM(a,a) and not by LIM(a,m).
Thus, in order to determine the exponent bits, it is necessary to resolve any uncertainty as to which conditional branch the integrated circuit takes as a function of these bits. Observing the current consumption of the integrated circuit generally makes it possible to resolve these uncertainties.
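The deductive method above can be expressed as a small decoder. In this sketch, which assumes that every operation has already been labelled correctly and that the complete sequence has been observed (a trailing LIM(a,a) in a truncated prefix would otherwise be ambiguous), 'S' stands for LIM(a,a) and 'M' for LIM(a,m); these labels and the function names are hypothetical.

```python
def operations_for_exponent(d: int) -> list[str]:
    """Sequence of LIM operations executed by step 3: 'S' = LIM(a,a), 'M' = LIM(a,m)."""
    v = d.bit_length()
    ops = []
    for s in range(1, v + 1):
        ops.append("S")                    # step 3A, always executed
        if (d >> (v - s)) & 1:
            ops.append("M")                # step 3B, only when d_{v-s} = 1
    return ops


def bits_from_operations(ops: list[str]) -> list[int]:
    """Recover d_{v-1}, d_{v-2}, ... from a complete, correctly labelled sequence."""
    bits = []
    i = 0
    while i < len(ops):
        if ops[i] != "S":
            raise ValueError(f"operation {i} should be LIM(a,a)")
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append(1)                 # LIM(a,a) followed by LIM(a,m)
            i += 2
        else:
            bits.append(0)                 # LIM(a,a) followed by LIM(a,a), or last op
            i += 1
    return bits
```

For example, `bits_from_operations(["S", "S", "M"])` returns `[0, 1]`, matching the second example above in which the first bit is 0 and the second bit is 1.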
Overview of Side Channel Analysis Based on the Observation of the Current Consumption
An electronic component generally includes thousands of logic gates that switch differently depending on the operations executed. The switching of the gates creates measurable current consumption variations of very short duration, for example of several nanoseconds. Notably, integrated circuits made in CMOS technology include logic gates made of pull-up PMOS transistors and pull-down NMOS transistors that have a very high input impedance on their control gate terminal. These transistors consume no current between their drain and source terminals except while switching, that is, when a logic node switches to 1 or to 0. Thus, the current consumption depends on the data manipulated by the microprocessor and by its various peripherals: memories, data circulating on the data or address bus, the cryptographic accelerator, and the like.
In particular, the large integer multiplication operation LIM has a characteristic current consumption signature that differs from that of ordinary logic operations. Moreover, LIM(a,a) differs from LIM(a,m) in that it computes a square (a^2) whereas LIM(a,m) computes the product of a by m, which may lead to two different current consumption signatures.
Conventional side channel test processes based on the observation of the current consumption use Simple Power Analysis (SPA), Differential Power Analysis (DPA), Correlation Power Analysis (CPA), or Big Mac analysis.
SPA-Based Test Processes
SPA was disclosed in P. C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology, CRYPTO '96, Lecture Notes in Computer Science, vol. 1109, pp. 104-113, Springer, 1996. SPA normally requires the acquisition of only a single current consumption curve. It aims to obtain information about the activity of the integrated circuit by observing the part of the consumption curve corresponding to a cryptographic calculation, since the current curve varies according to the operations executed and the data manipulated.
First of all, SPA makes it possible to identify the calculations performed and the algorithms implemented by the integrated circuit. A test system captures a general current consumption curve of the integrated circuit by measuring its current consumption. In the case of an integrated circuit executing a modular exponentiation, the consumption curves corresponding to the execution of LIM(a,a) and LIM(a,m) at each iteration of rank s of the algorithm can be distinguished within this general current consumption curve, as shown in FIG. 2. In this consumption curve, curves C0, C1, C2, . . . , Cs′, . . . can be distinguished.
Each consumption curve Cs′ consists of consumption points measured at a given sampling frequency. Each consumption curve corresponds to an sth iteration of step 3 of the exponentiation algorithm. The relation between the rank s′ of each consumption curve Cs′ and the number of times s that step 3 of the exponentiation algorithm has already been executed (including the execution corresponding to the curve Cs′ in question) is given by:

$$s' = s + H(d_{v-1}, d_{v-2}, \ldots, d_{v-s+1})$$

if the curve Cs′ corresponds to the execution of step 3A, or by:

$$s' = s + H(d_{v-1}, d_{v-2}, \ldots, d_{v-s+1}) + 1$$

if the curve Cs′ corresponds to the execution of step 3B.
The relation between s′ and s is therefore a function of the Hamming weight H(dv-1, dv-2 . . . dv-s+1) of the part of the exponent d already used during the preceding iterations of the exponentiation calculation. Since the Hamming weight is the number of bits equal to 1 in the part of the exponent considered, s′ is, for example, equal to s or to s+1 if the already used bits dv-1, dv-2 . . . dv-s+1 of the exponent are all equal to 0. As another example, s′ is equal to 2s−1 or to 2s if these bits are all equal to 1.
An “ideal” SPA-based test process would make it possible to determine whether each curve Cs′ relates to the calculation of LIM(a,a) or of LIM(a,m) merely by observing the shape of these curves. The values of the exponent bits could then be deduced by the method described above. However, to prevent such a leak of information (“leakage”), latest-generation secure integrated circuits are equipped with countermeasures that blur their current consumption.
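The relation between s′ and s can be checked numerically. The sketch below assumes that curves are numbered from 1 (the first execution of step 3A has rank 1) and reads the Hamming weight as covering the bits dv-1 . . . dv-s+1 used by the iterations preceding iteration s; `curve_rank` is a hypothetical helper name.

```python
def curve_rank(d: int, s: int, step: str) -> int:
    """Rank s' of the curve for the s-th execution of step 3 (1-based), step '3A' or '3B'."""
    v = d.bit_length()
    used = d >> (v - s + 1)                # bits d_{v-1} ... d_{v-s+1} used before iteration s
    hamming = bin(used).count("1")         # Hamming weight of the already used bits
    return s + hamming + (1 if step == "3B" else 0)
```

With d = 0b1111, `curve_rank(d, 3, "3A")` returns 5 and `curve_rank(d, 3, "3B")` returns 6, that is 2s−1 and 2s for s = 3.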
Thus, SPA-based test processes generally allow for the identification of the calculations performed and the algorithms implemented by an integrated circuit, and for the marking, on the general consumption curve of the integrated circuit, of the portion of the curve relative to the modular exponentiation calculation. However, they do not allow for the verification of hypotheses about the exact operation executed by the integrated circuit.
Processes based on statistical analysis techniques, such as DPA or CPA, were thus developed to identify the nature of operations during which the exponent is manipulated.
DPA-Based Test Processes
Disclosed in P. C. Kocher, J. Jaffe, and B. Jun, “Differential Power Analysis”, Advances in Cryptology, CRYPTO '99, Lecture Notes in Computer Science, vol. 1666, pp. 388-397, Springer, 1999, and closely studied since, DPA makes it possible to find the secret key of a cryptographic algorithm through the acquisition of numerous consumption curves. The most researched application of this technique to date concerns the DES algorithm, but the technique also applies to other encryption, decryption, or signature algorithms, and in particular to modular exponentiation.
DPA consists of a statistical classification of the current consumption curves to find the searched-for information. It is based on the premise that the consumption of a CMOS technology integrated circuit varies when a bit switches from 0 to 1 in a register or on a bus, and does not vary when a bit remains at 0, remains at 1, or switches from 1 to 0 (parasitic capacitance discharge of the MOS transistor). Alternatively, it may be considered that the consumption of a CMOS technology integrated circuit varies when a bit switches from 0 to 1 or switches from 1 to 0 and does not vary when a bit remains equal to 0 or remains equal to 1. This second hypothesis allows conventional functions “Hamming distance” or “Hamming weight” to be used to develop a consumption model that does not require the knowledge of the structure of the integrated circuit in order to be applicable.
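The two consumption hypotheses can be written as bit-counting functions of a register's previous state r and new state d. This is an illustrative sketch: the function names are hypothetical, and the actual leakage depends on the circuit.

```python
def rising_transitions(r: int, d: int) -> int:
    """First hypothesis: only bits switching from 0 (in r) to 1 (in d) consume current."""
    return bin(~r & d).count("1")


def hamming_distance(r: int, d: int) -> int:
    """Second hypothesis: any switching, 0 to 1 or 1 to 0, consumes the same amount."""
    return bin(r ^ d).count("1")


def hamming_weight(d: int) -> int:
    """Hamming distance from an all-zero reference state."""
    return bin(d).count("1")
```

For a transition from 0b1100 to 0b1010, the first hypothesis counts one consuming bit, while the Hamming distance counts two.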
DPA aims to amplify this consumption difference through statistical processing of numerous consumption curves, in order to bring out a correlation between the measured consumption curves and the formulated hypotheses.
During the acquisition phase of these consumption curves, a test system applies M random messages m0, m1, m2, . . . , mr, . . . , mM-1 to the integrated circuit so that the integrated circuit transforms each message by means of its cryptographic function (which is either implicit or requires sending an appropriate encryption command to the integrated circuit).
As shown in FIG. 3, M current consumption curves C(m0), C(m1), C(m2) . . . , C(mr), . . . , C(mM-1) are thus collected. Each of these consumption curves results from operations executed by the integrated circuit to transform the message by way of the modular exponentiation function, but may also result from other operations that the integrated circuit may execute at the same time.
Using SPA, consumption curves Cs′(m0), Cs′(m1), Cs′(m2), . . . , Cs′(mr), . . . , Cs′(mM-1) are distinguished within these consumption curves. They correspond to execution steps of the modular exponentiation algorithm. As indicated above, each curve of rank s′ corresponds to the sth execution of step 3 of the algorithm for one of the M messages, and involves one bit of the exponent d whose value is to be determined.
During a processing phase, the test system estimates the theoretical current consumption HW(dv-s, mr) of the integrated circuit at the calculation step in question. This estimation is done for at least one of the two possible values of the searched-for bit dv-s of the exponent. The test system is, for example, configured to estimate the theoretical consumption implied by the execution of the function LIM(a,m), for each value mr of the message m used during the acquisition. This theoretical consumption is, for example, estimated by calculating the Hamming weight of the expected result of the operation corresponding to the hypothesis in question.
On the basis of the current consumption estimation, the test system sorts the consumption curves into two groups G0 and G1:
- G0 = {curves Cs′(mr) that should correspond to a low consumption of the integrated circuit at the step s in question},
- G1 = {curves Cs′(mr) that should correspond to a high consumption of the integrated circuit at the step s in question}.
The test system then calculates the differences between the averages of the curves of the groups G0 and G1, to obtain a resulting curve, or statistical differential curve.
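The classification and difference-of-means steps can be sketched as follows, assuming each curve is a list of sampled points of equal length and that the estimates are split around their median (one possible threshold, chosen here for illustration). The function name is hypothetical.

```python
from statistics import fmean, median


def dpa_differential_curve(curves: list[list[float]], estimates: list[float]) -> list[float]:
    """Difference of means between curves with high and low estimated consumption."""
    threshold = median(estimates)
    g0 = [c for c, e in zip(curves, estimates) if e <= threshold]  # group G0: low consumption
    g1 = [c for c, e in zip(curves, estimates) if e > threshold]   # group G1: high consumption
    if not g0 or not g1:
        raise ValueError("estimates do not split the curves into two groups")
    mean0 = [fmean(points) for points in zip(*g0)]  # average curve of G0, point by point
    mean1 = [fmean(points) for points in zip(*g1)]  # average curve of G1, point by point
    return [m1 - m0 for m0, m1 in zip(mean0, mean1)]
```

A peak in the returned curve at the sample where the estimated operation takes place supports the hypothesis; a noise-like curve does not.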
If a consumption peak appears in the statistical differential curve at the location chosen for the current consumption estimation, the test system deduces that the hypothesis concerning the value of bit dv-s is correct. The operation executed by the modular exponentiation algorithm is then LIM(a,m). If no consumption peak appears, the difference of averages reveals no significant consumption difference (a signal comparable to noise is obtained), and the test system can either consider that the complementary hypothesis is verified (dv-s=0, the executed operation is LIM(a,a)), or proceed in a similar manner to verify this hypothesis.
DPA-based test processes have the drawback of being complicated to implement and of requiring the capture of a very large number of current consumption curves. Moreover, hardware countermeasures exist (such as clock jitter, the generation of background noise, or the like), which often require preliminary signal processing steps (synchronization, noise reduction, and the like) on the current consumption curves used for the acquisition. The number of current consumption curves to acquire in order to obtain reliable results also depends on the architecture of the integrated circuit studied, and may range from thousands to hundreds of thousands of curves.
CPA-Based Test Processes
CPA was disclosed in E. Brier, C. Clavier, and F. Olivier, “Correlation Power Analysis with a Leakage Model”, Cryptographic Hardware and Embedded Systems, CHES 2004, Lecture Notes in Computer Science, vol. 3156, pp. 16-29, Springer, 2004. The authors propose a linear current consumption model that assumes that the switching of a bit from 1 to 0 consumes the same amount of current as the switching of a bit from 0 to 1. They further propose to calculate a correlation coefficient between, on the one hand, the measured consumption points that form the captured consumption curves and, on the other hand, an estimated consumption value calculated from the linear consumption model and from a hypothesis as to which operation the integrated circuit executes.
FIGS. 4 and 5 show an example of CPA applied to the modular exponentiation algorithm. In this example, the test system seeks to determine whether, at the sth iteration of step 3 of the modular exponentiation algorithm, the operation executed after LIM(a,a) is again LIM(a,a) (that is, step 3A of the following iteration s+1) or LIM(a,m) (that is, step 3B of the iteration of rank s).
As shown in FIG. 4, the test system acquires M current consumption curves Cs′(mr) (Cs′(m0), Cs′(m1), . . . , Cs′(mr), . . . , Cs′(mM-1)) relating to the same iteration of the algorithm, each corresponding to a message mr (m0, m1, . . . , mr, . . . , mM-1) sent to the integrated circuit. Each curve Cs′(mr) includes E current consumption points W0, W1, W2, . . . , Wk, . . . , WE-1 forming a first subset of points. All the points of a given curve Cs′(mr) are associated with the same current consumption estimation.
To this end, the current consumption W is, for example, modeled as follows:

$$W = k_1 \cdot H(D \oplus R) + k_2$$

where “R” is a reference state of the calculation register of the integrated circuit, “D” is the value of the register at the end of the operation in question, k1 is a proportionality coefficient, and k2 represents the noise and/or the current consumed that is not linked to H(D⊕R). The function “H” gives the Hamming weight of D⊕R, that is, the Hamming distance between the values R and D of the register: the number of bits that differ between D and R (“⊕” designating the exclusive OR function).
According to a simplified approach, the reference value R of the register is chosen to be equal to 0, so that calculating the estimated current consumption point comes down to calculating the Hamming weight (number of bits at 1) of the result of the operation in question. This result is, for example, “a*m” for the hypothesis in question, so that the estimated consumption point HW is equal to H(a*m). The hypothesis about the executed operation, for example LIM(a,m), is therefore transformed into a current consumption estimation HW calculated by applying this linear consumption model.
As shown in FIG. 4, the test system then groups the current consumption points Wk forming each curve Cs′ into vertical transversal subsets VEk (VE0, VE1, VE2, . . . , VEk, . . . , VEE-1), each including the points Wk of the same rank k of each of the curves Cs′. Each vertical transversal subset VEk is shown by vertical dashed lines and contains a number of points equal to the number M of curves used for the analysis.
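The linear model and its simplified form can be sketched as follows. The default values k1 = 1 and k2 = 0 are assumptions (with k1 > 0 they amount to an affine change that leaves a correlation coefficient unchanged), and the sketch assumes that the value of a at the iteration in question is known for each message; the names are hypothetical.

```python
def estimated_consumption(d: int, r: int = 0, k1: float = 1.0, k2: float = 0.0) -> float:
    """Linear model W = k1 * H(D xor R) + k2; with R = 0 it reduces to k1 * H(D) + k2."""
    return k1 * bin(d ^ r).count("1") + k2


def cpa_estimates(a_values: list[int], messages: list[int]) -> list[float]:
    """Estimated points HW = H(a*m) under the hypothesis LIM(a,m), one per message."""
    return [estimated_consumption(a * m) for a, m in zip(a_values, messages)]
```

For example, with a = 3 and m = 5 the expected result is 15 = 0b1111, so the estimate is 4.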
An estimated current consumption point HWk is associated with each point Wk of a vertical transversal subset VEk. This estimated point corresponds to the estimation of the consumption associated with the curve Cs′(mr) to which the point belongs, calculated in the manner indicated above.
For each vertical transversal subset VEk, the test system then calculates a linear vertical correlation coefficient VCk between the points Wk of the considered subset and the estimated consumption points HWk that are associated therewith. This correlation coefficient is, for example, equal to the covariance between the measured consumption points Wk of subset VEk and the estimated consumption points HWk associated with these measured consumption points, divided by the product of the standard deviations of these two sets of points. Thus, a vertical correlation coefficient VCk corresponding to the evaluated hypothesis is associated with each vertical transversal subset VEk.
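A minimal sketch of the vertical correlation computation, assuming the curves are aligned lists of E points and the estimates HW are given in the same order as the curves. `pearson` and `vertical_correlation_curve` are hypothetical names; the coefficient is the covariance divided by the product of the standard deviations, as described above.

```python
import math
from statistics import fmean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Covariance divided by the product of standard deviations (0.0 if either is constant)."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def vertical_correlation_curve(curves: list[list[float]], estimates: list[float]) -> list[float]:
    """VC_k for each rank k: correlate the points W_k of all curves with their estimates HW."""
    return [pearson(list(column), estimates) for column in zip(*curves)]
```

Each column produced by `zip(*curves)` is one vertical transversal subset VEk, so the returned list plays the role of the vertical correlation curve.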
As shown in FIGS. 5A and 5B, the test system thereby obtains a set of vertical correlation coefficients VC0, VC1, . . . , VCk, . . . , VCE-1 forming either a vertical correlation curve VCC1 that invalidates the hypothesis or a vertical correlation curve VCC2 that confirms it. The curve VCC2 presents one or more noticeable correlation peaks (normalized covariance values close to +1 or −1), indicating that the hypothesis about the operation is correct, whereas the curve VCC1 presents no correlation peak. If the correlation curve VCC2 is obtained, the test system deduces that the integrated circuit was performing LIM(a,m) when the curves Cs′(m0) to Cs′(mM-1) were acquired, and therefore that the bit dv-s of the modular exponentiation exponent is equal to 1.
Big Mac-Based Test Processes
The Big Mac analysis was disclosed in C. D. Walter, “Sliding Windows Succumbs to Big Mac Attack”, Cryptographic Hardware and Embedded Systems, CHES 2001, Lecture Notes in Computer Science, vol. 2162, pp. 286-299, Springer, 2001; and C. D. Walter, “Longer Keys May Facilitate Side Channel Attacks”, Selected Areas in Cryptography, SAC 2003, Lecture Notes in Computer Science, vol. 3006, pp. 42-57, Springer, 2003. This analysis is based on the atomicity of the above-mentioned large integer multiplication, that is, on the fact that the execution of a multiplication of two large integers includes the execution of a plurality of basic multiplications xi*yj of the components xi and yj of the operands x and y being multiplied.
A Big Mac-based test process includes the steps of:
- combining the consumption sub-curves corresponding to the basic multiplications xi*yj for a fixed component xi and a variable index j;
- calculating the average value of the points of these sub-curves to obtain a resulting sub-curve that reflects the properties of xi more clearly than those of yj;
- forming a dictionary of such average sub-curves; and then
- identifying, by way of the dictionary, new sub-curves arising from subsequent multiplications, in order to deduce the values of the operands handled by those multiplication operations.
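The averaging and dictionary steps can be sketched as below. This is a simplified, hypothetical nearest-template matcher (squared Euclidean distance) that assumes the sub-curves have already been cut out and aligned; it is not Walter's exact procedure, and the labels and function names are illustrative.

```python
from statistics import fmean


def average_subcurve(subcurves: list[list[float]]) -> list[float]:
    """Average the sub-curves of xi*yj over j: features of xi remain, those of yj average out."""
    return [fmean(points) for points in zip(*subcurves)]


def build_dictionary(labelled: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Map each known operand label to the average of its sub-curves."""
    return {label: average_subcurve(subs) for label, subs in labelled.items()}


def identify(dictionary: dict[str, list[float]], subcurves: list[list[float]]) -> str:
    """Return the dictionary entry closest (squared Euclidean distance) to the averaged sub-curves."""
    target = average_subcurve(subcurves)
    return min(dictionary,
               key=lambda label: sum((t - u) ** 2 for t, u in zip(dictionary[label], target)))
```

Averaging before matching is what makes the comparison depend mainly on the fixed operand component rather than on the varying one.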
Summary of Known Test Processes
As has just been seen, test processes based on DPA and CPA require the acquisition of numerous current consumption curves. Even though CPA-based test processes are more efficient than DPA-based test processes, generally requiring only a hundred to several hundred consumption curves as opposed to thousands to hundreds of thousands of curves for DPA processes, the number of curves to acquire for a CPA-based test process cannot be considered negligible.
Additionally, DPA- or CPA-based test processes can be defeated by countermeasures that mask the message m and/or the exponent d with random words. Indeed, as seen above, the hypothesis concerning the consumption linked to LIM(a,m) requires knowledge of the message m to calculate the Hamming weight of the expected result. Masking the message with random data no longer allows an estimated consumption value to be associated with a measured consumption value to calculate the correlation coefficient.
Finally, a Big Mac-based test process is tricky to implement and requires good knowledge of the integrated circuit architecture in order to build a dictionary containing the models required for its implementation. The results obtained have been considered unsatisfactory, and the process does not appear to have known practical applications.
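As an illustration of such masking, the sketch below uses two common forms, which are assumptions here since the text does not specify them: adding a random multiple of n to the message, and adding a random multiple of φ(n) = (p−1)(q−1) to an RSA exponent. Both leave m^d mod n unchanged (the exponent form for m coprime to n, by Euler's theorem), while changing the operands actually processed, so that H(a*m) can no longer be predicted from m. The function names and the 32-bit mask size are hypothetical.

```python
import random


def mask_message(m: int, n: int, rng: random.Random, bits: int = 32) -> int:
    """Additive message masking: m' = m + r*n is congruent to m modulo n but has unpredictable bits."""
    return m + rng.getrandbits(bits) * n


def mask_exponent(d: int, phi: int, rng: random.Random, bits: int = 32) -> int:
    """Exponent masking: d' = d + r*phi(n) gives the same result for m coprime to n."""
    return d + rng.getrandbits(bits) * phi
```

With the textbook RSA parameters n = 3233 = 61 × 53, φ(n) = 3120 and d = 2753, `pow(mask_message(65, 3233, rng), 2753, 3233)` equals `pow(65, 2753, 3233)` for any `rng = random.Random(...)`.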