1. Field of the Invention
The present invention relates to an instruction fetch, and more particularly, to an instruction fetch that requires security such as encryption.
2. Description of Related Art
In today's advanced information society, personal information and money-related information are stored in semiconductor devices such as the IC card of a credit card or of electronic money, and thus a technology (tamper resistance) for protecting the information from being leaked or tampered with has become extremely important.
In the encryption currently in use, the algorithm per se is published and its security has been sufficiently studied. However, the security of the algorithm when it is implemented as hardware or software depends on the know-how of each company that implements it, and has not been sufficiently studied. For that reason, there are methods of acquiring security information by exploiting weaknesses of the implementation. Among attacks against the implementation, the side channel attack has drawn attention in recent years and has been actively reported at various academic conferences.
The side channel attack is an attack that tries to obtain internal secret information not from the original communication channel but from side channel information such as power consumption, electromagnetic waves, or processing time during processing. One of the side channel attacks is a timing attack. The timing attack is a method of deriving the secret information by paying attention to the fact that the processing time changes according to a calculation value. Other attacking methods include electric power analyzing methods such as a simple power analysis (SPA), which discriminates the information from the waveform of the power consumption, and a differential power analysis (DPA), which discriminates a difference in the calculation contents by statistically processing a difference in the power consumption.
The use of those methods makes it possible to derive the secret information from a difference in the processing time or a difference in the power consumption, and thus a method of eliminating those differences becomes extremely important. For example, in the case of using the Chinese remainder theorem (CRT) in order to increase the processing speed of RSA encryption, the calculation is as follows.
X = C mod Q
where C is a numeric representation of an encrypted text, and Q is a numeric value (prime number) representative of a secret key. In the case of RSA encryption of 1024 bits, C is 1024 bits and Q is 512 bits, which are very large numeric values.
In this calculation, the calculation per se is unnecessary when the value of C is smaller than Q, and the calculation is executed to obtain a remainder of C/Q when the value of C is larger than Q. That is, the relationship between C and Q can be inferred from the processing time: C>Q holds when the processing time is longer, whereas C<Q holds when the processing time is shorter. As a result, when the calculation is executed a number of times while C is changed, the value of Q, which is the secret information, can eventually be found out.
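The leak described above can be sketched in C with small integers. This is only an illustrative model and not an actual RSA implementation: the function names `reduce_leaky` and `recover_q`, the step costs (1 for the comparison, 10 for the division), and the use of a binary search over C are all assumptions introduced here. The comparison is written as `c >= q` so that the case C=Q is also reduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduces c modulo q, skipping the division when c is
 * already smaller than q. *steps stands in for the processing time
 * (assumed costs: 1 unit for the comparison, 10 units for the division). */
uint64_t reduce_leaky(uint64_t c, uint64_t q, unsigned *steps)
{
    assert(q != 0);
    *steps = 1;                 /* cost of the comparison */
    if (c >= q) {
        *steps += 10;           /* cost of the remainder calculation */
        return c % q;
    }
    return c;                   /* no calculation needed */
}

/* An observer who sees only the step count learns whether c >= q.
 * Repeating the measurement while changing c narrows down q.
 * Precondition: lo < q_secret <= hi. */
uint64_t recover_q(uint64_t q_secret, uint64_t lo, uint64_t hi)
{
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        unsigned steps;
        (void)reduce_leaky(mid, q_secret, &steps);
        if (steps > 1)
            hi = mid;           /* slow path observed: mid >= q */
        else
            lo = mid;           /* fast path observed: mid < q */
    }
    return hi;
}
```

In this sketch, `recover_q` never reads `q_secret` directly; it uses only the observed step count, which plays the role of the measured processing time.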
This is caused not by the weakness of the algorithm per se of the RSA encryption, but by the weakness of the implementing method, which causes a difference in the processing time by the conditional branch which appears when implementing the encryption. As described above, even with the encryption that is said to be secure, there is a possibility that the secret information may leak if an inappropriate implementation is done.
The above problem can be solved by an algorithm that requires no conditional branching, but such an algorithm is impossible in many cases. In those cases, the weakness is eliminated by making the processing time in the case of branching equal to that in the case of no branching. For instance, an example of a program in which the times are simply made equal to each other is as follows:
If (C>Q)
{a remainder of C/Q is obtained←power consumption is larger} else {a dummy time is gained←power consumption is smaller}
As shown above, even when the processing times are made equal, the difference in the power consumption can be analyzed to obtain the same information as the difference in the processing time.
Accordingly, the conditions required as a countermeasure (tamper resistance) against the timing attack are considered as follows.
(1) Branching is eliminated. When branching is necessarily required, the execution time is made equal to each other regardless of branching.
(2) The power consumption is made equal to each other regardless of branching.
As a general countermeasure example in the process of obtaining the above remainder, a dummy calculation that obtains the remainder of C/Q is added even in the case of C<Q, as shown below, so that the above conditions are satisfied. As a result, it is presumed that, in the algorithm, the calculation time and the power consumption can be made substantially equal between C<Q and C>Q.
If (C>Q) {a remainder of C/Q is obtained} else {a remainder of C/Q is obtained, but the result is discarded (dummy calculation)}
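The dummy-calculation countermeasure above can be sketched in C as follows. This is a minimal illustration with small integers; the function name `reduce_with_dummy`, the `divisions` counter, and the `volatile` scratch variable (used here so that a compiler does not remove the dummy work) are assumptions introduced for the example. The comparison is written as `c >= q` so that the case C=Q is also reduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the remainder of c/q is calculated on both paths,
 * but the result is discarded when c is already smaller than q.
 * *divisions counts the remainder calculations actually performed. */
uint64_t reduce_with_dummy(uint64_t c, uint64_t q, unsigned *divisions)
{
    assert(q != 0);
    volatile uint64_t discard;  /* volatile: keep the dummy calculation */
    uint64_t result;

    *divisions = 0;
    if (c >= q) {
        result = c % q;         /* real calculation */
        (*divisions)++;
    } else {
        discard = c % q;        /* dummy calculation, result discarded */
        (void)discard;
        (*divisions)++;
        result = c;
    }
    return result;
}
```

Both paths perform one remainder calculation, so the amount of work is balanced at the source level. The `if` statement itself remains in the code, however, and how it is translated into machine instructions is outside the control of this sketch.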
However, in the case where the algorithm is actually implemented in a semiconductor device, the IF statement of a high level language such as C language is generally compiled into a compare instruction and a branch instruction of an assembly language. Accordingly, there occurs a shift of the execution timing due to the branch instruction, or a difference in the power consumption between when branching is conducted and when branching is not conducted. That is, for tamper resistance against the timing attack and DPA, it is important to take into consideration how the algorithm is actually implemented as software or hardware.
FIG. 9 shows an example of a circuit configuration (information processor) required for the pipeline operation. An information processor 9 shown in FIG. 9 includes a storage device 100 that stores program therein, and a central processing unit (CPU) 500 that executes program that is stored in the storage device 100. The CPU 500 includes an instruction fetch circuit 200, an instruction decode circuit 300, and an instruction processing circuit 400. In the figure, bold arrows show flows of an address value or an instruction code.
The storage device 100 stores the program (plural instruction codes) that is executed by the CPU 500 so as to associate the respective instruction codes with the addresses at which they are stored, inputs an instruction fetch address from the instruction fetch circuit 200, and outputs the instruction code to the instruction fetch circuit 200 based on the input address. The storage device 100 is provided for program storage, and is not limited to a specific storage device. For example, the storage device 100 can be formed of a non-rewritable read only memory (ROM), a rewritable nonvolatile storage device such as a flash memory, or a RAM that requires the program to be loaded after a power supply turns on.
The instruction fetch circuit 200 determines (selects) an address (fetch address) at which an instruction code to be subsequently executed is stored, and reads the instruction code that is stored at the determined fetch address from the storage device 100. More specifically, the instruction fetch circuit 200 outputs the selected fetch address to the storage device 100, inputs the instruction code that is output from the storage device 100 based on the output address, and stores the input instruction code in a register (instruction queue 220). Hereinafter, in the specification of the present invention, the instruction code is merely called “instruction,” and the fetch of the instruction code is merely called “instruction fetch.”
The instruction decode circuit 300 decodes the instruction code that is output from the instruction fetch circuit 200.
The instruction processing circuit 400 executes the instruction that is decoded by the instruction decode circuit 300.
Hereinafter, the operation will be described based on the presence or absence of the branch instruction.
In the case where the instructions other than the branch instruction are continuously executed, an increment circuit 204 calculates an address of a subsequent instruction. Also, both an absolute branch signal 210 and a relative branch signal 211 are inactive. For that reason, an address select circuit 202 selects an output of the increment circuit 204. An address holding circuit 201 updates an address (fetch address) that is held by the output selected by the address select circuit 202. In this situation, the instruction select circuit 230 selects an output of the storage device 100 as a successive instruction, and stores the output in the instruction queue 220. Also, in the case where the branch instruction is detected by the instruction decode circuit 300, and no branch occurs, both of the absolute branch signal 210 and the relative branch signal 211 are inactive. For that reason, as in the case of other than the branch instruction, the address select circuit 202 selects an output of the increment circuit 204, and updates the address holding circuit 201.
Subsequently, in the case where the absolute branch instruction is detected as an unconditional branch instruction by the instruction decode circuit 300, the absolute branch signal 210 becomes active, which indicates that the absolute branch is conducted. As a result, the address select circuit 202 selects the address (absolute address) that is supplied by the branch address signal 206.
The address holding circuit 201 updates the address that it holds with the address selected by the address select circuit 202. Also, the instruction select circuit 230 selects an output of a NOP instruction code generating circuit 231. The instruction queue 220 discards the stored instruction, holds the output of the NOP instruction code generating circuit 231 which is selected by the instruction select circuit 230, and outputs the held output to the instruction decode circuit 300. The instruction decode circuit 300 inputs the instruction that is held by the instruction queue 220. The instruction select circuit 230 selects the output of the NOP instruction code generating circuit 231 in the case where either the absolute branch signal 210 or the relative branch signal 211 is active. Accordingly, in the case where the absolute branch instruction is selected, a NOP instruction is inserted in the instruction sequence because the absolute branch signal 210 becomes active.
Subsequently, when the relative branch instruction is detected by the instruction decode circuit 300, and the branch occurs, the relative branch signal 211 becomes active, which indicates that the relative branch is conducted. An address adder 205 adds a value (relative address) that is supplied by the branch address signal 206 and a value of the present address signal 203 that outputs the present address that is held by the address holding circuit 201 to calculate the branched address. The address select circuit 202 selects an address that is calculated by the address adder 205. The address holding circuit 201 updates the address that is held by the address selected by the address select circuit 202. Also, the instruction select circuit 230 selects the output of the NOP instruction code generating circuit 231. The instruction queue 220 discards the stored instruction, holds the output of the NOP instruction code generating circuit 231 that is selected by the instruction select circuit 230, and outputs the held output to the instruction decode circuit 300. The instruction decode circuit 300 inputs the instruction that is held by the instruction queue 220. In this way, when the relative branch instruction is selected, the NOP instruction is inserted in the instruction sequence because the relative branch instruction signal 211 becomes active.
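The selection behavior of the instruction fetch circuit 200 described above can be summarized as a small software model in C. This is a sketch under stated assumptions, not a description of the actual circuit: the 16-bit address width, the increment of 1 per instruction, the NOP encoding `NOP_CODE`, and all type and function names are hypothetical. The comments map each part to the reference numerals used in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NOP_CODE 0x00u          /* hypothetical NOP encoding (circuit 231) */

typedef struct {
    uint16_t fetch_addr;        /* address holding circuit 201 */
} fetch_state;

typedef struct {
    bool absolute_branch;       /* absolute branch signal 210 */
    bool relative_branch;       /* relative branch signal 211 */
    uint16_t branch_addr;       /* branch address signal 206 */
} decode_signals;

/* Models the address select circuit 202. */
uint16_t select_next_addr(const fetch_state *s, const decode_signals *d)
{
    /* Assumption: the two branch signals are never active together. */
    assert(!(d->absolute_branch && d->relative_branch));

    if (d->absolute_branch)
        return d->branch_addr;                              /* absolute address */
    if (d->relative_branch)
        return (uint16_t)(s->fetch_addr + d->branch_addr);  /* address adder 205 */
    return (uint16_t)(s->fetch_addr + 1);                   /* increment circuit 204 */
}

/* Models the instruction select circuit 230: a NOP replaces the fetched
 * instruction whenever either branch signal is active. */
uint8_t select_instruction(uint8_t from_storage, const decode_signals *d)
{
    return (d->absolute_branch || d->relative_branch) ? NOP_CODE : from_storage;
}
```

In this model, a not-taken conditional branch is indistinguishable from any other non-branch instruction, because both branch signals stay inactive and the incremented address is selected.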
An example in which the branch instruction (IF statement) is described in the program that is executed by the hardware shown in FIG. 9 is shown in FIG. 10. A left side of FIG. 10 shows a case in which the branch instruction is described in C language, and a right side of FIG. 10 shows an assembly language source that is obtained by compiling the C language. In this example, for simplification, an instruction that is executed when the conditions are true and an instruction that is executed when the conditions are false are compiled in one instruction.
FIG. 11 is a flowchart showing an operation when the instructions of the program shown in FIG. 10 are executed. Each instruction of the program at the right side of FIG. 10 is noted in the corresponding process of FIG. 11. First, cmp compares X and Y (S11). Then, jne determines the presence or absence of the branch based on the result of comparing X with Y (S12). When X=Y (not branched in S12), an instruction A that has been already subjected to the instruction fetch is executed (S13). Then, unconditional branch is conducted by jbr (S14), and the branched address is subjected to the instruction fetch (S15). On the other hand, when X≠Y (branched in S12), the instruction is fetched in order to acquire the instruction of the branched address (LABEL1) (S16), and an instruction B is executed (S17). Then, after the respective processes subsequent to the conditional branch instruction have been completed, a subsequent instruction is executed (S18).
FIG. 12 shows the pipeline operation in the case where the program shown in FIG. 10 is executed by the hardware shown in FIG. 9. A case of branching and a case of no branching are shown with their execution times aligned. An upper stage of FIG. 12 shows the operation when no branch is conducted, and a lower stage of FIG. 12 shows the operation when a branch is conducted. As for the respective stages of the pipeline operation, IF denotes an instruction fetch, ID an instruction decode, EX the execution of a decoded instruction, MEM a memory access, and WB the write of the execution results in the register. The instruction fetch means that the instruction code is fetched into the instruction queue 220, and the instruction code is fetched from the storage device 100 shown in FIG. 9 or from the NOP instruction code generating circuit 231 through the instruction select circuit 230.
The upper stage of FIG. 12 is a case in which no conditional branch occurs, and a subsequent instruction A is executed immediately after the conditional branch instruction jne is executed, thereafter branched to the address of the subsequent instruction by the absolute branch instruction jbr, and the subsequent instruction is executed. The operation timing chart of hardware in this situation is shown in FIG. 13. The number at the left end of FIG. 13 (210 or 201) corresponds to the number of structural element shown in FIG. 9. Also, a time T indicated by the lowest stage corresponds to the time shown in FIG. 12.
When it is assumed that the initial value of the address holding circuit 201 is 0x0000 address, cmp instruction (compare instruction) is taken in the instruction queue 220. Because this instruction is not a branch instruction, the address select circuit 202 selects a value obtained by incrementing the value of the present address signal 203 by the increment circuit 204. At time T=1, the address holding circuit 201 is updated by a value selected by the address select circuit 202. The same operation is conducted till time T=2. When the jbr instruction is detected by the instruction decode circuit 300 at time T=3, the absolute branch signal 210 becomes active at time T=4, and the following operation is conducted.
The instruction select circuit 230 selects the output of the NOP instruction code generating circuit 231 (nop instruction). Also, the instruction queue 220 discards the subsequent instruction that has been already fetched and stored, holds the output (nop instruction) that has been selected by the instruction select circuit 230, and supplies the held output to the instruction decode circuit 300. The instruction decode circuit 300 executes the nop instruction that has been supplied from the instruction queue 220. In this situation, the address adder 205 adds a value of the present address signal 203 that has been output from the address holding circuit 201 to a value of the branch address signal 206. At T=5, the address holding circuit 201 is updated by the output value of the address adder 205 which is selected by the address select circuit 202. In the example of FIG. 13, because no branch occurs in the conditional branch instruction of time T=1, the instruction A is executed at time T=2 immediately after T=1.
On the other hand, the lower stage of FIG. 12 shows a case in which the conditional branch occurs, and a phase that again fetches the branched instruction is inserted because the branch occurs in the conditional branch instruction jne, and thereafter the instruction B and the subsequent instruction are executed. The operation timing chart at that time is shown in FIG. 14. The arrangement of FIG. 14 is the same as that of FIG. 13.
In the example of the lower stage of FIG. 12, because branch occurs in the conditional branch instruction of time T=1, the absolute branch signal becomes active at T=2 as in the case where branch occurs in the above-mentioned jbr instruction. In the instruction queue 220, the instruction A that has been already fetched and stored is discarded and changed to the nop instruction. In this situation, the address adder 205 adds a value of the present address signal 203 that has been output from the address holding circuit 201 and a value of the branch address signal 206 together. At time T=3, the address holding circuit 201 is updated by the output value of the address adder 205 which is selected by the address select circuit 202, and the instruction B of the branched address is fetched.
Also, JP 2003-502905 A discloses an example using a part of the encryption calculation. In the originally required calculation, when a value of the bit included in a key is “1,” the result of the Montgomery multiplication is stored in a portion to be originally stored. When the value of the bit included in the key is “0,” nothing is conducted, and therefore a time difference in the calculation occurs.
Under the above circumstances, in JP 2003-502905 A, when the value of a bit included in the key is “1,” the result of the Montgomery multiplication is stored in the portion where it is originally to be stored, and if not, a dummy calculation in which the result of the Montgomery multiplication is stored (discarded) in a temporary region is added, in an attempt to eliminate the time difference between a case where the value of the bit is “1” and a case where the value of the bit is “0.” Further, JP 2006-11723 A discloses a technology related to a method of controlling the branch for subjecting the instruction to pipeline processing.
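The dummy-store idea attributed above to JP 2003-502905 A can be illustrated with a small C sketch. The code is not taken from that publication: an ordinary modular multiplication (`mulmod`) stands in for the Montgomery multiplication, the operands are limited to 32 bits, and the names `modexp_dummy_store`, `scratch`, and `mults` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Plain modular multiplication, used here in place of the Montgomery
 * multiplication (assumption made to keep the sketch short). */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n)
{
    return (uint32_t)(((uint64_t)a * b) % n);
}

/* Left-to-right square-and-multiply in which the multiplication is always
 * performed. The product is stored in the real destination when the key
 * bit is 1 and in a scratch location (discarded) when the key bit is 0.
 * *mults counts the multiplications performed. */
uint32_t modexp_dummy_store(uint32_t base, uint32_t key, uint32_t n,
                            unsigned *mults)
{
    assert(n > 1);
    uint32_t acc = 1;
    volatile uint32_t scratch = 0;  /* temporary region for dummy stores */

    *mults = 0;
    for (int i = 31; i >= 0; i--) {
        acc = mulmod(acc, acc, n);                    /* square */
        uint32_t prod = mulmod(acc, base % n, n);     /* always multiply */
        (*mults)++;
        if ((key >> i) & 1u)
            acc = prod;             /* key bit 1: keep the product */
        else
            scratch = prod;         /* key bit 0: dummy store */
    }
    (void)scratch;
    return acc;
}
```

The number of multiplications no longer depends on the key bits in this sketch, but the selection between the two stores is still written as an `if` statement.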
However, the prior art suffers from a problem on processing and a problem on the circuit as described below. As is apparent from FIGS. 12, 13, and 14, in the prior art, when no branch occurs in the branch instruction of time T=1, the subsequent instruction is executed at time T=5, and when branch occurs, the subsequent instruction is executed at time T=4. As described above, timing at which the subsequent instruction is executed is different depending on the presence or absence of branch, and there occurs a problem on the processing in that there is the possibility that the operation is analyzed from the timing difference.
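The timing difference stated above can be reproduced with a simple slot-counting model in C. The instruction sequence (cmp, jne, instruction A, jbr, instruction B, subsequent instruction) is inferred from the description of FIGS. 10 and 11 and is an assumption, as are the names used in the code. The model also assumes that every instruction occupies one time slot and that every taken branch costs one additional slot for the inserted nop instruction.

```c
#include <assert.h>
#include <stdbool.h>

enum op { OP_CMP, OP_JNE, OP_A, OP_JBR, OP_B, OP_NEXT };

/* Program layout inferred from the description of FIG. 10 (assumption). */
static const enum op program[] = {
    OP_CMP,   /* 0: cmp X, Y                 */
    OP_JNE,   /* 1: jne LABEL1               */
    OP_A,     /* 2: instruction A            */
    OP_JBR,   /* 3: jbr LABEL2               */
    OP_B,     /* 4: LABEL1: instruction B    */
    OP_NEXT   /* 5: LABEL2: subsequent instr */
};
enum { LABEL1 = 4, LABEL2 = 5, PROGRAM_LEN = 6 };

/* Returns the time slot T at which the subsequent instruction is reached,
 * counting from T=0 for the cmp instruction. */
int slot_of_next(bool x_equals_y)
{
    int pc = 0;
    int t = 0;

    for (;;) {
        assert(pc >= 0 && pc < PROGRAM_LEN);
        enum op cur = program[pc];
        if (cur == OP_NEXT)
            return t;

        bool taken = false;
        int target = 0;
        if (cur == OP_JNE && !x_equals_y) {
            taken = true;           /* conditional branch taken when X != Y */
            target = LABEL1;
        } else if (cur == OP_JBR) {
            taken = true;           /* unconditional branch */
            target = LABEL2;
        }

        if (taken) {
            pc = target;
            t += 2;                 /* the branch itself plus one nop slot */
        } else {
            pc += 1;
            t += 1;
        }
    }
}
```

Under these assumptions, `slot_of_next(true)` (no branch at jne) gives 5 and `slot_of_next(false)` (branch at jne) gives 4, which agrees with the times T=5 and T=4 stated above.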
Also, as is apparent from FIG. 12, after the conditional branch instruction jne is conducted, the instructions that are executed simultaneously are different between a case where branch is conducted and a case where no branch is conducted. Therefore, the operating circuits are also different from each other. For that reason, there occurs the problem on the circuit in that the power consumption is different between those cases.
There is a possibility that the secret information is decrypted by conducting the side channel attack by using the side channel information such as the shift of timing or the difference in the power consumption. In JP 2006-11723 A, the difference in timing and the difference in the power consumption are not considered because the optimization and high speed of the pipeline operation are intended. On the other hand, in JP 2003-502905 A, the dummy calculation or the dummy memory storage is conducted for the purpose of eliminating the difference in the timing and power consumption. However, the conditional statement (IF statement) required in this situation is compiled into the conditional branch instruction by a compiler, and the occurrence of a difference between the case where branch is conducted and the case where no branch is conducted is not considered.
As described above, in implementing encryption algorithm or the like in a computer, it is difficult to align the timing at which the instruction is executed regardless of the presence or absence of branch in the case of including the conditional branch instruction.