Various implementation methods have been considered for MISTY1, which is one of the common key cryptographic methods (for details, see, for example, Matsui Mitsuru, “Block Encryption Algorithm MISTY”, IEICE Technical Report, Vol. 96, No. 167, ISEC96-11, Jul. 22, 1996).
FIG. 1 depicts one example of a configuration for a common key cryptographic method such as MISTY1. The common key cryptographic method relating to MISTY1 includes a round processing and an expanded key generation processing. As depicted in FIG. 1, in the expanded key generation processing, plural expanded keys (in FIG. 1, expanded keys 0, 1, . . . N) are generated from an input secret key. The generated expanded keys are used in the encryption processing (also called the “round processing”). In the encryption processing, text data (i.e. data to be encrypted) is divided into blocks of a predetermined bit length (i.e. block length), and the round processing is carried out for each of the blocks to generate the encrypted text. At decryption, the inverse calculation of the encryption processing is carried out.
The common key cryptographic method MISTY1 is an algorithm in which the length of the secret key is 128 bits and the block length of the encryption is 64 bits.
FIG. 2 depicts a configuration of the round processor in MISTY1. As depicted in FIG. 2, in the round processor to convert text data P (64 bits) into encrypted text data C (64 bits), an FL function is executed 10 times and an FO function is executed 8 times.
The i-th FO function has a configuration as depicted in FIG. 3. KOi1, KOi2, KOi3 and KOi4 (which are respectively 16 bits) are inputted into the FO function. These are four of K1 to K8, which are generated by dividing, by 16 bits, the 128-bit secret key. Which of K1 to K8 is selected is determined according to the algorithm specification based on the round value i (i.e. a value “i” of FOi).
In addition, in the i-th FO function, the FI function is executed three times. KIi1 is inputted into an FIi1 function, KIi2 is inputted into an FIi2 function, and KIi3 is inputted into an FIi3 function. KIi1 to KIi3 are 16-bit values and are three of K′1 to K′8, which are generated by an expanded key generation algorithm. Which of K′1 to K′8 is selected is determined according to the algorithm specification based on the round value i (i.e. a value “i” of FOi).
FIG. 4 depicts a configuration of the j-th FI function in the i-th FO function. In the FI function, the upper 9 bits of the 16-bit input are inputted to a non-linear function S9 (a function that scrambles the input data according to a predetermined algorithm (a repetition of logical computations) and outputs the scrambled data). The output of the function S9 and a value in which two “0” bits are added as the upper 2 bits to the lower 7 bits of the 16-bit input (denoted “0-extension”) are exclusively ORed to generate data “a”. In addition, the lower 7 bits of the 16-bit input are inputted to a non-linear function S7 (likewise a function that scrambles the input data according to a predetermined algorithm and outputs the scrambled data), and the output of the function S7 and a value in which the upper 2 bits of the data “a” are removed (denoted “truncate”) are exclusively ORed to generate data “b”. Furthermore, the data “b” and KIijL (i.e. the upper 7 bits of KIij) are exclusively ORed to generate data “c”. Moreover, the data “a” and KIijR (i.e. the lower 9 bits of KIij) are exclusively ORed to generate data “d”. The data “d” is inputted into the non-linear function S9 again, and the output of the function S9 and a value in which two “0” bits are added as the upper 2 bits to the data “c” are exclusively ORed to generate data “e”. Finally, the data “c” is arranged in the upper 7 bits and the data “e” is arranged in the lower 9 bits, and they are concatenated to obtain the 16-bit output.
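The data flow of FIG. 4 described above can be sketched in Python as follows. This is only an illustrative sketch: the names fi, S7 and S9 are chosen here for illustration, and the two substitution tables are simple placeholder bijections, not the actual S7 and S9 values of MISTY1, which have to be taken from the algorithm specification.

```python
# PLACEHOLDER substitution tables: simple bijections, NOT the real MISTY1 S7/S9.
S7 = [(5 * x + 3) % 128 for x in range(128)]
S9 = [(7 * x + 11) % 512 for x in range(512)]


def fi(x, ki, s7=S7, s9=S9):
    """FI function as described for FIG. 4: 16-bit input x, 16-bit key ki."""
    hi9 = (x >> 7) & 0x1FF    # upper 9 bits of the input
    lo7 = x & 0x7F            # lower 7 bits of the input
    ki_l = (ki >> 9) & 0x7F   # KIijL: upper 7 bits of KIij
    ki_r = ki & 0x1FF         # KIijR: lower 9 bits of KIij

    a = s9[hi9] ^ lo7         # 0-extension of lo7 is implicit for integers
    b = s7[lo7] ^ (a & 0x7F)  # "truncate": drop the upper 2 bits of a
    c = b ^ ki_l              # 7-bit value
    d = a ^ ki_r              # 9-bit value
    e = s9[d] ^ c             # c is 0-extended to 9 bits
    return (c << 9) | e       # c in the upper 7 bits, e in the lower 9 bits
```

Passing the real tables through the s7 and s9 parameters would leave the data flow unchanged.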
Next, FIG. 5 depicts a configuration of an expanded key generator of MISTY1. In the expanded key generator, the 128-bit secret key is divided into 16-bit units, starting from the most significant bit, to generate K1 to K8. As depicted in FIG. 5, the expanded key K′8 is generated by the FI function using K8 as an input and K1 as KIij. The expanded key K′7 is generated by the FI function using K7 as an input and K8 as KIij. The expanded key K′6 is generated by the FI function using K6 as an input and K7 as KIij. The expanded key K′5 is generated by the FI function using K5 as an input and K6 as KIij. The expanded key K′4 is generated by the FI function using K4 as an input and K5 as KIij. The expanded key K′3 is generated by the FI function using K3 as an input and K4 as KIij. The expanded key K′2 is generated by the FI function using K2 as an input and K3 as KIij. The expanded key K′1 is generated by the FI function using K1 as an input and K2 as KIij.
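The expanded key generation described above can be sketched as follows. The function name expand_key and the convention of passing the FI function as a parameter are assumptions made for this illustration; the pairing of inputs follows the description of FIG. 5, and the byte order follows the statement that K1 is taken from the most significant bits of the secret key.

```python
def expand_key(secret_key: bytes, fi):
    """Return (K, K') as two lists of eight 16-bit words.

    Index 0 holds K1 / K'1 and index 7 holds K8 / K'8.
    `fi` is any callable fi(input16, ki16) -> 16-bit value.
    """
    if len(secret_key) != 16:
        raise ValueError("the secret key must be 128 bits (16 bytes)")
    # K1 comes from the most significant 16 bits, K8 from the least significant.
    k = [int.from_bytes(secret_key[2 * i:2 * i + 2], "big") for i in range(8)]
    # K'i uses Ki as the input and K(i+1) as KIij; K'8 wraps around to K1.
    k_prime = [fi(k[i], k[(i + 1) % 8]) for i in range(8)]
    return k, k_prime
```

For example, calling expand_key(key, fi) with an implementation of the FI function of FIG. 4 yields the eight expanded keys K′1 to K′8 in k_prime.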
Thus, when MISTY1 is implemented by software or hardware, the implementation method of the FI function is one of the problems. This is because the FI function is used in both the round processor and the expanded key generator, and if the FI function can be executed efficiently, the performance of MISTY1 is greatly improved.
Some conventional implementation methods of the FI function are described in Japanese Patent No. 3917357.
FIGS. 6 and 7 depict a first implementation example disclosed in the aforementioned Japanese patent. In this implementation example, after the algorithm in FIG. 4 is equivalently converted into the algorithm depicted in FIG. 6, a processing 1001 of the non-linear function S9, a processing 1003 of the non-linear function S7 and a processing 1005 including the non-linear function S9 are tabulated. Note that the processing 1001 is different from the processing 1005. As a result, as depicted in FIG. 7, the processing 1001 is replaced with a table T1, the processing 1003 is replaced with a table T4 and the processing 1005 is replaced with a table T5. These tables are stored in a Read Only Memory (ROM), and are referenced as necessary.
Incidentally, as an example, for the FI function using K′1, KIijR and KIijL′ are generated as follows:
KIijR=K′1 & 0x1FF
tmpk1=K′1 & 0xFE00
tmpk2=KIijR & 0x7F
tmpk3=tmpk2<<9
tmpk4=tmpk3+tmpk1
tmpk5=tmpk4>>9
KIijL′=tmpk5+tmpk4
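The steps listed above can be traced with the following sketch. It assumes that “+” in the expressions denotes an exclusive OR, which is the reading under which the two 7-bit fields do not carry into each other; the function name split_ki is hypothetical.

```python
def split_ki(k_prime):
    """Derive (KIijR, KIijL') from a 16-bit expanded key, step by step."""
    ki_r = k_prime & 0x1FF        # KIijR: lower 9 bits
    tmpk1 = k_prime & 0xFE00      # upper 7 bits, left in place
    tmpk2 = ki_r & 0x7F           # lower 7 bits of KIijR
    tmpk3 = tmpk2 << 9            # moved to the upper 7 bit positions
    tmpk4 = tmpk3 ^ tmpk1         # "+" read as exclusive OR (assumption)
    tmpk5 = tmpk4 >> 9            # same 7-bit value in the lower positions
    ki_l_prime = tmpk5 ^ tmpk4    # the 7-bit value appears in both positions
    return ki_r, ki_l_prime


# Worked example for K' = 0xABCD:
#   KIijR = 0x1CD, tmpk1 = 0xAA00, tmpk2 = 0x4D, tmpk3 = 0x9A00,
#   tmpk4 = 0x3000, tmpk5 = 0x18, KIijL' = 0x3018
```

Under this reading, KIijL′ carries the exclusive OR of the upper 7 bits of K′1 and the lower 7 bits of KIijR, placed both in bits 9 to 15 and in bits 0 to 6.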
The tables T1, T4 and T5 are defined as follows:
T1(X)=S9(X)
T5(X)=((X&0x7F)<<9)+(X&0x7F)+S9(X)
T4(X)=(S7(X)<<9)+S7(X)
Incidentally, X represents an input, and a table entry is generated for all possible X values.
“<<9” means shifting to the left by 9 bits, “>>9” means shifting to the right by 9 bits, and “X&0x7F” means extracting the lower 7 bits of X.
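The following sketch builds the tables T1, T4 and T5 from the definitions above and checks that a table-driven FI function gives the same result as the FI function of FIG. 4. Several points are assumptions of this sketch: “+” is read as an exclusive OR, the substitution tables are placeholder bijections rather than the real S7 and S9, and the order of the operations is one arrangement consistent with the table definitions, since FIG. 7 is not reproduced here.

```python
# PLACEHOLDER substitution tables (bijections), NOT the real MISTY1 S7/S9.
S7 = [(5 * x + 3) % 128 for x in range(128)]
S9 = [(7 * x + 11) % 512 for x in range(512)]

# Tables as defined in the text, with "+" read as exclusive OR (assumption).
T1 = [S9[x] for x in range(512)]
T5 = [((x & 0x7F) << 9) ^ (x & 0x7F) ^ S9[x] for x in range(512)]
T4 = [(S7[x] << 9) ^ S7[x] for x in range(128)]


def fi_reference(x, ki):
    """Direct FI function (FIG. 4), used only to check the table version."""
    hi9, lo7 = x >> 7, x & 0x7F
    a = S9[hi9] ^ lo7
    c = S7[lo7] ^ (a & 0x7F) ^ (ki >> 9)
    e = S9[a ^ (ki & 0x1FF)] ^ c
    return (c << 9) | e


def key_parts(ki):
    """KIijR and KIijL' derived from a 16-bit KIij."""
    ki_r = ki & 0x1FF
    l7 = (ki >> 9) ^ (ki_r & 0x7F)
    return ki_r, (l7 << 9) | l7


def fi_tables(x, ki_r, ki_l_prime):
    """Table-driven FI: three table lookups and four exclusive ORs."""
    hi9, lo7 = x >> 7, x & 0x7F
    d = T1[hi9] ^ lo7 ^ ki_r
    return T4[lo7] ^ T5[d] ^ ki_l_prime
```

The lower 7 bits of the keyed value that T5 copies into both halves of its output are cancelled by the matching bits folded into KIijL′, which is why the preprocessing of the key is needed in this arrangement.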
In such an implementation method, the size of the table T1 is 1 KB, the size of the table T4 is 256B, the size of the table T5 is 1 KB, and a total of 2304B of ROM is used. No Random Access Memory (RAM) is used.
In addition, in this implementation example, 9 cycles are required for one FI function, and 24 FI functions are used. Therefore, a total of 216 cycles are required for the entire round processing.
On the other hand, in the expanded key generation processing, each of the 8 FI functions requires 7 cycles for the preprocessing of the data corresponding to KIijR and KIijL′, 9 cycles for the FI function itself, and 7 cycles for the processing to generate, from K′i, KIijR and KIijL′ for the round processing. Therefore, 184 cycles (=(7+9+7)*8) are required for the entire expanded key generation processing.
Here, the processing time for the round processing is calculated as “(the number of cycles for one FI function)*24”. Incidentally, the round processing also requires cycles for processing other than the FI functions, namely the FL functions and the exclusive ORs (XOR) with the expanded keys in the FO functions. However, because the number of cycles required for them is small and their latency is low, they are excluded from the estimate of the processing time.
Furthermore, FIGS. 8 and 9 depict a second implementation example disclosed in the aforementioned patent publication. As depicted in FIG. 8, in the first implementation example depicted in FIG. 7, the exclusive OR with KIijR and a portion 1101 of the table T5 are tabulated. Namely, as depicted in FIG. 9, a table T5j is introduced.
However, KIijR is data generated based on the expanded key K′i, and its value is determined only when the user inputs the secret key. Therefore, the table T5j cannot be calculated before the user inputs the secret key, and is generated after the input of the secret key. Namely, the table T5j cannot be held in ROM, and RAM is used instead.
The tables T1 and T4 are the same as the aforementioned tables, and are stored in ROM after calculation is carried out in advance for all possible values of X. On the other hand, the table T5j is prepared according to the following expression. Because it depends on the secret key, it is calculated for all possible input patterns after the user inputs the secret key, and is then stored into RAM.
T5j(X)=(((X+KIijR)&0x7F)<<9)+((X+KIijR)&0x7F)+S9(X+KIijR)
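The generation of the key-dependent table can be sketched as follows. The sketch assumes that T5j(X) is the table T5 evaluated at X exclusive-ORed with KIijR, so that the non-linear function S9 is also applied to the keyed value, which is what equivalence with the FI function of FIG. 4 requires. “+” is again read as an exclusive OR, the substitution tables are placeholders, and the function names are hypothetical.

```python
# PLACEHOLDER substitution tables (bijections), NOT the real MISTY1 S7/S9.
S7 = [(5 * x + 3) % 128 for x in range(128)]
S9 = [(7 * x + 11) % 512 for x in range(512)]

# Key-independent tables, which could be fixed in ROM.
T1 = [S9[x] for x in range(512)]
T4 = [(S7[x] << 9) ^ S7[x] for x in range(128)]
T5 = [((x & 0x7F) << 9) ^ (x & 0x7F) ^ S9[x] for x in range(512)]


def make_t5j(ki_r):
    """Key-dependent table (RAM): T5 with the exclusive OR by KIijR folded in."""
    return [T5[x ^ ki_r] for x in range(512)]


def fi_reference(x, ki):
    """Direct FI function (FIG. 4), used only to check the table version."""
    hi9, lo7 = x >> 7, x & 0x7F
    a = S9[hi9] ^ lo7
    c = S7[lo7] ^ (a & 0x7F) ^ (ki >> 9)
    e = S9[a ^ (ki & 0x1FF)] ^ c
    return (c << 9) | e


def fi_second(x, t5j, ki_l_prime):
    """FI with T5j: the exclusive OR with KIijR no longer appears at run time."""
    hi9, lo7 = x >> 7, x & 0x7F
    return T4[lo7] ^ t5j[T1[hi9] ^ lo7] ^ ki_l_prime
```

Building the table touches all 512 entries for each key, which illustrates why the saving of one operation per FI function is paid for during the expanded key generation processing.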
In such an implementation method, the size of the table T1 is 1 KB, the size of the table T4 is 256B, and a total of 1280B of ROM is used. In addition, because the table T5j is held in RAM, 1 KB of RAM is used.
In this implementation method, 8 cycles are required for one FI function, and because 24 FI functions exist, 192 cycles are required for the entire round processing.
On the other hand, in the expanded key generation processing, the generation of the table T5j is carried out simultaneously. 1536 cycles or more are required for the generation of this table, and when the cycles required for the other portions of the expanded key generation processing are added, 1600 cycles or more are required for the entire processing.
Furthermore, FIGS. 10 and 11 depict a third implementation example disclosed in the aforementioned patent publication. As described in FIG. 10, in the second implementation example depicted in FIG. 9, the exclusive OR with KIijL′ and a portion 1201 of the table T4 are tabulated. Namely, as depicted in FIG. 11, a table T4j is introduced.
However, KIijL′ is data generated based on the expanded key K′i, and the value of KIijL′ is identified after the user inputs the secret key into the cryptographic apparatus. Therefore, it is impossible to calculate the table T4j before the user inputs the secret key, and the table T4j is prepared after the input of the secret key. Namely, the table T4j cannot be held on ROM, and is held on RAM.
The table T1 is the same as the aforementioned table, and all of its possible values are calculated in advance and recorded in ROM. As described above, the table T5j is held in RAM. Furthermore, the data stored in the table T4j is calculated using the following expression. Because it depends on the secret key, it is calculated for all possible input patterns after the user inputs the secret key, and the table T4j is then held in RAM.
T4j(X)=(S7(X)<<9)+S7(X)+KIijL′
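With both key-dependent tables in RAM, the FI function reduces to table lookups combined by exclusive ORs, as the following sketch illustrates. It assumes that T4j(X) is T4(X) exclusive-ORed with KIijL′ and that T5j(X) is T5 evaluated at X exclusive-ORed with KIijR. The substitution tables are placeholders, not the real S7 and S9, and the function names are hypothetical.

```python
# PLACEHOLDER substitution tables (bijections), NOT the real MISTY1 S7/S9.
S7 = [(5 * x + 3) % 128 for x in range(128)]
S9 = [(7 * x + 11) % 512 for x in range(512)]

T1 = [S9[x] for x in range(512)]   # key-independent, could be fixed in ROM


def make_ram_tables(ki):
    """Build (T4j, T5j) for one 16-bit KIij; both depend on the secret key."""
    ki_r = ki & 0x1FF
    l7 = (ki >> 9) ^ (ki_r & 0x7F)
    ki_l_prime = (l7 << 9) | l7
    t4j = [(S7[x] << 9) ^ S7[x] ^ ki_l_prime for x in range(128)]
    t5j = []
    for x in range(512):
        d = x ^ ki_r               # exclusive OR with KIijR folded into the table
        t5j.append(((d & 0x7F) << 9) ^ (d & 0x7F) ^ S9[d])
    return t4j, t5j


def fi_reference(x, ki):
    """Direct FI function (FIG. 4), used only to check the table version."""
    hi9, lo7 = x >> 7, x & 0x7F
    a = S9[hi9] ^ lo7
    c = S7[lo7] ^ (a & 0x7F) ^ (ki >> 9)
    e = S9[a ^ (ki & 0x1FF)] ^ c
    return (c << 9) | e


def fi_third(x, t4j, t5j):
    """FI with both key-dependent tables: no key word is handled at run time."""
    hi9, lo7 = x >> 7, x & 0x7F
    return t4j[lo7] ^ t5j[T1[hi9] ^ lo7]
```

With 2-byte entries, the two RAM tables hold 128 and 512 entries respectively, i.e. 256B and 1024B.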
In such an implementation method, the size of the table T1 is 1 KB, at least 128B of ROM is used for the preprocessing of the table T4j, and the total size of ROM is 1152B or more. On the other hand, the tables T4j and T5j are held in RAM, and their total size is 1280B.
Furthermore, in this implementation example, 7 cycles are required for one FI function, and 24 FI functions exist. Therefore, 168 cycles are required for the round processing.
On the other hand, in the expanded key generation processing, the generation of the tables T4j and T5j is carried out simultaneously. 1920 cycles or more are required for the generation of these tables, and when the other expanded key generation processing is included, 2000 cycles or more are required.
Furthermore, a paper (Nakajima Junko and Matsui Mitsuru, “Fast Implementation of MISTY in Software (II)”, SCIS98-9.1B) discloses another implementation method. This method is explained by using FIG. 12. In this implementation method, the FI function depicted in FIG. 4 is equivalently converted into a form depicted in FIG. 12. In the example of FIG. 12, an upper portion including the non-linear functions S9 and S7 is converted into a table T7, and a lower portion other than the exclusive OR with KIij, which includes the non-linear function S9, is converted into a table T8.
In such an implementation example, the size of the table T7 is 131072B, the size of the table T8 is 131072B and the total table size is 262144B. Incidentally, RAM is not used.
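A two-table arrangement of this kind can be sketched as follows. Because FIG. 12 is not reproduced here, the exact split is an assumption: T7 is taken to map the 16-bit input to the data “b” and “a” placed side by side, and T8 is taken to map the keyed value to the final output. The substitution tables are placeholders, not the real S7 and S9.

```python
# PLACEHOLDER substitution tables (bijections), NOT the real MISTY1 S7/S9.
S7 = [(5 * x + 3) % 128 for x in range(128)]
S9 = [(7 * x + 11) % 512 for x in range(512)]


def _upper(x):
    """Upper portion: S9 and S7 applied to the input, giving (b << 9) | a."""
    hi9, lo7 = x >> 7, x & 0x7F
    a = S9[hi9] ^ lo7
    b = S7[lo7] ^ (a & 0x7F)
    return (b << 9) | a


def _lower(y):
    """Lower portion: second S9 applied to the keyed value (c << 9) | d."""
    c, d = y >> 9, y & 0x1FF
    return (c << 9) | (S9[d] ^ c)


# Each table has 65536 entries of 16 bits.
T7 = [_upper(x) for x in range(1 << 16)]
T8 = [_lower(y) for y in range(1 << 16)]


def fi_reference(x, ki):
    """Direct FI function (FIG. 4), used only to check the table version."""
    hi9, lo7 = x >> 7, x & 0x7F
    a = S9[hi9] ^ lo7
    c = S7[lo7] ^ (a & 0x7F) ^ (ki >> 9)
    e = S9[a ^ (ki & 0x1FF)] ^ c
    return (c << 9) | e


def fi_large(x, ki):
    """Lookup, exclusive OR with the unmodified KIij, lookup."""
    return T8[T7[x] ^ ki]
```

In this arrangement KIij needs no preprocessing, at the cost of two tables of 65536 two-byte entries each.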
In such an implementation example, 3 cycles are required for one FI function, and because 24 FI functions exist, 72 cycles are required for the round processing. Because KIij is used as it is, 24 cycles are required for the expanded key generation processing, due to 8 FI functions.
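The figures quoted for the four implementation examples can be collected and checked with a short calculation. The dictionary layout and the labels below are illustrative only; values stated as “or more” in the text are recorded as lower bounds.

```python
FI_CALLS_PER_BLOCK = 24  # 8 FO functions, each executing the FI function 3 times

# Figures quoted in the text (sizes in bytes, cycle counts per processing).
IMPLEMENTATIONS = {
    "first (T1, T4, T5 in ROM)":   {"rom": 2304, "ram": 0, "fi_cycles": 9, "key_cycles": 184},
    "second (T5j in RAM)":         {"rom": 1280, "ram": 1024, "fi_cycles": 8, "key_cycles": 1600},  # 1600 or more
    "third (T4j, T5j in RAM)":     {"rom": 1152, "ram": 1280, "fi_cycles": 7, "key_cycles": 2000},  # ROM and cycles: or more
    "two large tables (T7, T8)":   {"rom": 262144, "ram": 0, "fi_cycles": 3, "key_cycles": 24},
}


def round_cycles(fi_cycles, fi_calls=FI_CALLS_PER_BLOCK):
    """Estimated cycles of the round processing, counting only FI functions."""
    return fi_cycles * fi_calls


def summarize():
    """Return (name, ROM, RAM, round cycles, key generation cycles) per example."""
    return [
        (name, p["rom"], p["ram"], round_cycles(p["fi_cycles"]), p["key_cycles"])
        for name, p in IMPLEMENTATIONS.items()
    ]


if __name__ == "__main__":
    for name, rom, ram, rnd, key in summarize():
        print(f"{name:28s} ROM={rom:6d}B RAM={ram:4d}B round={rnd:3d} keygen>={key}")
```

The summary makes the trade-off visible: each reduction in the cycles of the round processing is obtained either by moving tables into RAM and lengthening the expanded key generation processing, or by enlarging the tables in ROM.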
When MISTY1 is implemented in an embedded device, it is desired that the consumed capacity of RAM is small, the size of ROM is small and the processing speed is high. In particular, it is desired that the consumed capacity of RAM is as small as possible, and a method that stores a precomputed calculation table in RAM is not suitable for the embedded device environment. In addition, it is desired that the size of ROM is as small as possible. However, when the tables stored in ROM are reduced, the processing speed drops sharply and becomes insufficient.