In a block cipher that encrypts input data by executing a data conversion process on the input data one block at a time, in a hash function, or the like, a high data scrambling capability for input data is desired. For example, input data is divided into data blocks of a fixed size such as bytes, and various operations such as linear conversion and nonlinear conversion are repeatedly executed so that the data is scrambled while the byte data blocks affect one another.
For example, the AES (Advanced Encryption Standard) algorithm, known as a U.S. encryption standard, is an algorithm for scrambling data by dividing input data into byte data blocks, arranging the byte data blocks in a square or rectangular matrix, and repeating various processes on a one-row-at-a-time basis or on a one-column-at-a-time basis, more specifically, a nonlinear conversion process and a linear conversion process.
A specific example will be described referring to FIG. 1. In the case where data to be subjected to a conversion process is 8×16=128-bit data, as illustrated in FIG. 1(a), a square matrix is configured of byte data blocks a1, a2, . . . , a16 as one-byte data blocks containing 8 bits, and data conversion is performed by repeating various operation processes on data blocks such as:
an operation on a one-row-at-a-time basis, for example, an operation process on each row such as (a1, a2, a3, a4), or
an operation on a one-column-at-a-time basis, for example, an operation process on each column such as (a1, a5, a9, a13),
more specifically various processes such as a nonlinear conversion process, a linear conversion process, a shift process and an exclusive OR operation with a key.
As illustrated in FIG. 1(a), it is known that when a process on a one-row-at-a-time basis or on a one-column-at-a-time basis is executed on a square matrix in which byte data blocks are arranged, efficient scrambling is achievable. However, a square matrix containing one-byte data blocks is allowed to be configured only in the case where input data to be subjected to a conversion process is data of a specific number of bits such as data of 8×16=128 bits illustrated in FIG. 1(a). More specifically, the square matrix is configured only in the following case:
the case of the number of bits = 8×n² bits (where n is a natural number),
in byte terms,
the case of the number of bytes = n² bytes (where n is a natural number).
128 bits are equal to 8×4² bits, and as illustrated in FIG. 1(a), 128 bits can configure a square matrix containing 4×4=16 one-byte data blocks.
However, in the case where the data to be subjected to conversion is, for example, 256 bits, 256 = 8×32 holds and 32 is not a square number. That is, 256 cannot be represented as 8×n² bits, so it is impossible to configure a square matrix containing byte data blocks.
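The condition above can be checked mechanically. The following is a minimal sketch in Python; the function name square_side is hypothetical and is introduced only for illustration.

```python
import math

def square_side(num_bits):
    """Return n if num_bits == 8 * n**2 for a natural number n, else None."""
    if num_bits <= 0 or num_bits % 8 != 0:
        return None
    num_bytes = num_bits // 8
    n = math.isqrt(num_bytes)  # integer square root
    return n if n * n == num_bytes else None

print(square_side(128))  # 4: 16 bytes fit a 4x4 square matrix
print(square_side(256))  # None: 32 bytes is not a square number
```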
In such a case, as illustrated in FIG. 1(b), 32 byte data blocks a1, a2, a3, . . . , a32 containing 8 bits each are arranged in a rectangular matrix with an aspect ratio of 1:2, and scrambling is executed by repeating a process on a one-row-at-a-time basis or a process on a one-column-at-a-time basis on the rectangular matrix. However, even if scrambling is performed on the rectangular matrix illustrated in FIG. 1(b) in the same steps as those in the case of the square matrix, the time and effort for operations increase while the scrambling capability is not improved.
Referring to FIG. 2 and later drawings, scrambling process examples in the cases of a square matrix (a square state) containing byte data blocks and a rectangular matrix (a rectangular state) containing byte data blocks will be described below.
(A) Processing Example on Square Matrix (Square State)
Referring to FIG. 2 and later drawings, a scrambling process in a data conversion process on data of 128 bits will be described below. The data of 128 bits are divided into one-byte (8-bit) data blocks. Herein, 16 one-byte data blocks are indicated by a1 to a16, respectively.
As illustrated by a square matrix of data (a square state) 11 in FIG. 2, 16 one-byte data blocks [a1 to a16] are stored in a 4×4 matrix. Hereinafter, data stored in the square matrix is called a square state.
In the AES block cipher algorithm, a plurality of operations on the square state are defined, and encryption is achieved by repeatedly applying the defined operations. The operations defined in the AES include the following four kinds illustrated in FIG. 2.
(1) Nonlinear Conversion Process (SUB)
An operation of updating a value by subjecting each one-byte data block to nonlinear conversion S(x) on a one-byte-at-a-time basis,
where, as illustrated in FIG. 2(1), a relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: b_i = S(a_i), i = 1, 2, . . . , 16.
For example, in the AES cipher, the operation corresponds to nonlinear conversion using an S-box.
(2) Shift Process (SHIFT)
A process of subjecting each row to a rotation shift operation. Shift amounts vary from one row to another, and in the case of the AES, as illustrated in FIG. 2(2), one-byte data blocks in a first row are not rotationally shifted, and one-byte data blocks in a second row, one-byte data blocks in a third row and one-byte data blocks in a fourth row are rotationally shifted toward the right by one one-byte data block, two one-byte data blocks, and three one-byte data blocks, respectively.
(3) Linear Conversion Process (MAT)
An operation of updating a value by an operation on a 4×4 matrix [M] assuming that four one-byte data blocks in each column are considered as a vector.
A relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: t(b_i, b_{i+4}, b_{i+8}, b_{i+12}) = M t(a_i, a_{i+4}, a_{i+8}, a_{i+12}), i = 1, 2, 3, 4
In addition, t( ) indicates transposition, that is, interchanging rows and columns, so that each row vector in the expression is treated as a column vector. That is, the above-described expression means the following.
\[
\begin{pmatrix} b_i \\ b_{i+4} \\ b_{i+8} \\ b_{i+12} \end{pmatrix}
= M \begin{pmatrix} a_i \\ a_{i+4} \\ a_{i+8} \\ a_{i+12} \end{pmatrix}
\qquad \text{[Mathematical Expression 1]}
\]
(4) Key Application Operation Process (KADD)
An operation of performing an exclusive OR operation between each one-byte data block and a round key [ki] outputted from a key schedule section.
A relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: b_i = a_i (XOR) k_i, i = 1, 2, . . . , 16
In addition, in the above-described expression, (XOR) indicates an exclusive OR operation.
A combination of the above-described operations (1) to (4), executed in a predetermined sequence, configures one round operation. The round operation is repeatedly executed on input data to produce and output the output data, for example, encrypted data. As illustrated in FIG. 3, the round operation is configured of, for example, a combination of data conversion processes which are executed in order of (1) the nonlinear conversion process (SUB)→(2) the shift process (SHIFT)→(3) the linear conversion process (MAT)→(4) the key application operation process (KADD), and the round operation is repeatedly executed a plurality of times to convert input data into output data, that is, encrypted data.
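The four operations and their combination into one round can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the reference implementation: the state is stored row-major as a list of 16 bytes, the matrix M is assumed to be the AES MixColumns matrix over GF(2^8) with the AES reduction polynomial, the S-box is supplied by the caller, and the rotation direction follows the description of FIG. 2(2) in this text. All function names are hypothetical.

```python
ROWS, COLS = 4, 4

# ASSUMPTION: the AES MixColumns matrix is used as M (the text does not fix M).
M = [[2, 3, 1, 1],
     [1, 2, 3, 1],
     [1, 1, 2, 3],
     [3, 1, 1, 2]]

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def sub(state, sbox):
    """(1) SUB: b_i = S(a_i) for each byte; sbox is a 256-entry table."""
    return [sbox[x] for x in state]

def shift(state):
    """(2) SHIFT: rotate row r to the right by r one-byte data blocks."""
    out = []
    for r in range(ROWS):
        row = state[COLS * r: COLS * (r + 1)]
        k = r % COLS
        out.extend(row[-k:] + row[:-k] if k else row)
    return out

def mat(state):
    """(3) MAT: multiply each column, taken as a vector, by M."""
    out = [0] * (ROWS * COLS)
    for c in range(COLS):
        column = [state[COLS * r + c] for r in range(ROWS)]
        for r in range(ROWS):
            acc = 0
            for j in range(ROWS):
                acc ^= gf_mul(M[r][j], column[j])
            out[COLS * r + c] = acc
    return out

def kadd(state, round_key):
    """(4) KADD: b_i = a_i XOR k_i for each byte."""
    return [x ^ k for x, k in zip(state, round_key)]

def round_op(state, sbox, round_key):
    """One round in the order SUB -> SHIFT -> MAT -> KADD."""
    return kadd(mat(shift(sub(state, sbox))), round_key)
```

A caller would pass a real S-box table and round keys from a key schedule; neither is shown here.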
FIG. 4 is an illustration for describing a data scrambling example in the case where first to third rounds (R1 to R3) of the round operation configured of data conversion processes, which are executed in order of (1) the nonlinear conversion process (SUB)→(2) the shift process (SHIFT)→(3) the linear conversion process (MAT)→(4) the key application operation process (KADD), are executed on a square state.
FIG. 4 indicates which one-byte data blocks included in the square state a one-byte data block 31 at the top left corner of a square state 21 in an initial state affects by (1) the nonlinear conversion process (SUB), (2) the shift process (SHIFT), (3) the linear conversion process (MAT) and (4) the key application operation process (KADD) in each round operation. That is, FIG. 4 illustrates how an influence of constituent bits of the one-byte data block 31 in the square state is diffused to other one-byte data blocks.
Refer to the one-byte data block 31 (marked with black) at the top left of the square state 21 in an initial state of input data. The one-byte data block 31 does not affect operation results of other one-byte data blocks in the square state until the nonlinear conversion process (SUB) and the shift process (SHIFT) in the first round (R1).
However, when the linear conversion process (MAT) in the first round is completed, the one-byte data block 31 affects four one-byte data blocks included in a leftmost column of the square state. It is said that this state is a state where the influence of constituent bits of the one-byte data block 31 at the top left is diffused to four one-byte data blocks included in the leftmost column of the square state.
After that, the influence is not diffused any further by the key application operation (KADD) in the first round or the nonlinear conversion process (SUB) in the second round. However, the four affected one-byte data blocks, which are vertically aligned, are spread laterally by the next shift process (SHIFT), so that each column includes one one-byte data block affected by the one-byte data block 31.
Then, the influence is diffused to all 16 one-byte data blocks configuring the square state by the linear conversion process (MAT) immediately after the shift process.
In this case, by the processes of two rounds of the round operation, one one-byte data block affects all one-byte data blocks configuring the square state. In addition, in FIG. 4, the influence of the one-byte data block 31 at the top left is described as an example, but a one-byte data block at any position of the square state affects other one-byte data blocks in the same manner; that is, the influence of any one-byte data block is diffused to all other one-byte data blocks in two rounds. Fast and extensive diffusion indicates high data scrambling capability, and is used as an element in evaluating the concealment of encrypted data or the efficiency of the process.
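The diffusion described with reference to FIG. 4 can be traced with a small model. The sketch below only tracks which positions are affected, assuming that every output byte of MAT depends on all four input bytes of its column (true when M has no zero entries); the function name diffusion_counts is hypothetical.

```python
def diffusion_counts(rows, cols, shifts, start=(0, 0), max_rounds=10):
    """Return the number of affected bytes after each round, stopping once all are affected."""
    affected = {start}
    counts = []
    for _ in range(max_rounds):
        # SUB acts on each byte independently, so the affected set is unchanged.
        # SHIFT rotates row r to the right by shifts[r] positions.
        affected = {(r, (c + shifts[r]) % cols) for r, c in affected}
        # MAT mixes each column: every byte of a touched column becomes affected.
        touched = {c for _, c in affected}
        affected = {(r, c) for r in range(rows) for c in touched}
        # KADD XORs a key byte into each byte, so the affected set is unchanged.
        counts.append(len(affected))
        if len(affected) == rows * cols:
            break
    return counts

# Square state: shift amounts 0, 1, 2, 3 as described for FIG. 2(2).
print(diffusion_counts(4, 4, (0, 1, 2, 3)))  # [4, 16]
```

The result shows 4 affected bytes after the first round and all 16 after the second round, in line with the description above.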
In an example illustrated in FIG. 4, it takes two rounds for one one-byte data block to affect all one-byte data blocks configuring the square state. Operation cost of affecting the whole square state is estimated. It takes two rounds to affect the whole square state, so two nonlinear conversion processes (SUB), two shift processes (SHIFT), two linear conversion processes (MAT) and two key application operation processes (KADD) are necessary.
As an indicator, the hardware gate count necessary for an operation is considered to represent its essential complexity. In this case, the shift process (SHIFT) is achievable by circuit wiring alone and does not need to pass through any gate, so the operation cost for the shift process (SHIFT) is considered to be 0.
Therefore, in the square state illustrated in FIG. 4, it may be estimated that the operation cost necessary for two rounds of the round operation until one one-byte data block affects all one-byte data blocks configuring the square state is as follows: 2SUB + 2MAT + 2KADD
In addition, to execute these operation processes, a logical circuit, a processing program or the like is used, and a necessary arithmetic circuit or processing speed depends on the configuration of the logical circuit, the processing program or the like. Therefore, it is difficult to evaluate absolute efficiency, but the number of gates in a logical circuit necessary for the above-described operations may be used as an evaluation indicator.
As a logical circuit implementation example, the number of gates necessary for each operation corresponds to the following number of gates:
SUB operation = approximately 3,200 to 4,800 gates
MAT operation = approximately 800 to 1,200 gates
KADD operation = approximately 320 gates
Therefore, in the example illustrated in FIG. 4, it may be determined that the operation cost necessary for two rounds of the round operation until the one one-byte data block affects all one-byte data blocks configuring the square state is the following calculation cost: 2SUB + 2MAT + 2KADD = approximately 9,000 to 13,000 gates = 9K gates to 13K gates
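The arithmetic behind this estimate can be reproduced as follows. This is a minimal sketch that uses only the approximate gate counts quoted in the text; the names GATES and diffusion_cost are hypothetical.

```python
# Approximate gate counts per operation, taken from the text as (low, high).
GATES = {"SUB": (3200, 4800), "MAT": (800, 1200), "KADD": (320, 320)}

def diffusion_cost(rounds):
    """Total (low, high) gate count for the given number of rounds; SHIFT counts as 0."""
    low = rounds * sum(lo for lo, _ in GATES.values())
    high = rounds * sum(hi for _, hi in GATES.values())
    return low, high

print(diffusion_cost(2))  # (8640, 12640), i.e. roughly 9K to 13K gates
```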
Lower calculation cost allows downsizing of a necessary circuit size for a device executing cryptographic processing, hash processing or the like, and high-speed processing.
(B) A Processing Example on a Rectangular Matrix (a Rectangular State)
Next, referring to FIG. 5 and later drawings, a scrambling process in a data conversion process on data of 256 bits will be described below. Hereinafter, a scrambling example in conversion processes in an algorithm [Rijndael] having a similar design principle to the AES will be described.
The data of 256 bits is divided into one-byte (8-bit) data blocks. Herein, 32 one-byte data blocks are indicated by a1 to a32, respectively. As illustrated by a rectangular matrix of data (a rectangular state) 51 in FIG. 5, 32 one-byte data blocks [a1 to a32] are stored in a 4×8 matrix. Hereinafter, data stored in the rectangular matrix is called a rectangular state.
In the algorithm [Rijndael], the nonlinear conversion process (SUB), the shift process (SHIFT), the linear conversion process (MAT) and the key application operation process (KADD), which are used in the square state and were previously described referring to FIGS. 2 to 4, are extended so as to apply to the rectangular state, and the extended operations are defined as below.
The operations defined in the [Rijndael] algorithm include the following four kinds illustrated in FIG. 5.
(1) Nonlinear Conversion Process (W-SUB)
An operation of updating a value by subjecting each one-byte data block to nonlinear conversion S(x) on a one-byte-at-a-time basis,
where, as illustrated in FIG. 5(1), a relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: b_i = S(a_i), i = 1, 2, . . . , 32.
(2) Shift Process (W-SHIFT)
A process of subjecting each row to a rotation shift operation. Shift amounts vary from one row to another, and in the case of the Rijndael, as illustrated in FIG. 5(2), one-byte data blocks in a first row are not rotationally shifted, and one-byte data blocks in a second row, one-byte data blocks in a third row and one-byte data blocks in a fourth row are rotationally shifted toward the right by one one-byte data block, three one-byte data blocks, and four one-byte data blocks, respectively.
(3) Linear Conversion Process (W-MAT)
An operation of updating a value by an operation on a 4×4 matrix [M] assuming that four one-byte data blocks in each column are considered as a vector.
A relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: t(b_i, b_{i+8}, b_{i+16}, b_{i+24}) = M t(a_i, a_{i+8}, a_{i+16}, a_{i+24}), i = 1, 2, 3, . . . , 8
In addition, t( ) indicates transposition, that is, interchanging rows and columns, so that each row vector in the expression is treated as a column vector.
(4) Key Application Operation Process (W-KADD)
An operation of performing an exclusive OR operation between each one-byte data block and a round key [ki] outputted from a key schedule section.
A relationship between a one-byte output b_i after the conversion process and a one-byte input a_i is: b_i = a_i (XOR) k_i, i = 1, 2, . . . , 32
In addition, in the above-described expression, (XOR) indicates an exclusive OR operation.
A combination of the above-described operations (1) to (4) which are executed in a predetermined sequence configures one round operation. The round operation is repeatedly executed on input data to produce output data, for example, encrypted data, and then output the data. As illustrated in FIG. 6, the round operation is configured of, for example, a combination of data conversion processes which are executed in order of (1) the nonlinear conversion process (W-SUB)→(2) the shift process (W-SHIFT)→(3) the linear conversion process (W-MAT)→(4) the key application operation process (W-KADD), and the round operation is repeatedly executed a plurality of times to convert input data into output data, that is, encrypted data.
FIG. 7 is an illustration for describing a data scrambling example in the case where first to third rounds (R1 to R3) of the round operation configured of data conversion processes, which are executed in order of (1) the nonlinear conversion process (W-SUB)→(2) the shift process (W-SHIFT)→(3) the linear conversion process (W-MAT)→(4) the key application operation process (W-KADD), are executed on a rectangular state.
Refer to the one-byte data block 71 (marked with black) at the top left of a rectangular state 61 in an initial state of input data. As in the case of the above-described square state, it is apparent that after two rounds, the one-byte data block 71 affects 16 one-byte data blocks. Moreover, it is apparent that an affected range is expanded by the shift process (W-SHIFT) in the third round, and the one-byte data block 71 affects all 32 one-byte data blocks by the linear conversion process (W-MAT) in the third round immediately after the shift process (W-SHIFT).
In this case, by the processes of three rounds of the round operation, one one-byte data block affects all one-byte data blocks configuring the rectangular state. In addition, in FIG. 7, the influence of the one-byte data block 71 at the top left is described as an example, but any one-byte data block at an arbitrary position of the rectangular state affects other one-byte data blocks in the same manner, and an influence of a one-byte data block affects all other one-byte data blocks in three rounds, that is, the influence of a one-byte data block is diffused to all other one-byte data in three rounds.
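The claim that any starting position needs three rounds in the rectangular state can be checked by tracking affected columns only. The sketch below assumes, as a simplification, that W-MAT makes every byte of a touched column affected, so after the first round exactly one full column is affected; the name rounds_to_full_diffusion is hypothetical.

```python
def rounds_to_full_diffusion(cols, shifts, start_row, start_col):
    """Count rounds until every column, and hence every byte, is affected."""
    # Round 1: the shift moves the start byte, then the column mix fills its column.
    columns = {(start_col + shifts[start_row]) % cols}
    for rounds in range(1, 2 * cols + 1):
        if len(columns) == cols:
            return rounds
        # Next round: a fully affected column c spreads to column c + s for
        # each row shift s, and the column mix then fills those columns.
        columns = {(c + s) % cols for c in columns for s in shifts}
    raise ValueError("these shift amounts do not reach every column")

W_SHIFTS = (0, 1, 3, 4)  # shift amounts described for FIG. 5(2)
all_starts = {rounds_to_full_diffusion(8, W_SHIFTS, r, c)
              for r in range(4) for c in range(8)}
print(all_starts)  # {3}
```

Every one of the 32 starting positions yields three rounds, whereas the same model with four columns and shift amounts 0, 1, 2, 3 yields two.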
Next, as in the previous example of the square state (refer to FIG. 4), the operation cost in terms of the number of gates is calculated. In the example illustrated in FIG. 7, three rounds of the round operation are necessary until one one-byte data block affects all one-byte data blocks configuring the rectangular state, so the W-SUB, W-SHIFT, W-MAT and W-KADD operations are each executed three times. In addition, as described above, the operation cost of the shift process (W-SHIFT) may be considered 0; therefore, it can be estimated that the operation cost in this case is as follows: 3(W-SUB) + 3(W-MAT) + 3(W-KADD).
The operation costs of W-SUB, W-MAT and W-KADD are twice as high as the operation costs of SUB, MAT and KADD, respectively, because the rectangular state contains twice as many one-byte data blocks as the square state. Therefore, in the rectangular state, the operation cost necessary for three rounds of the round operation until the one one-byte data block affects all one-byte data blocks configuring the rectangular state is approximately 26K gates to 38K gates, by calculation based on the numbers of gates described above.
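As a worked check of this figure, using only the approximate gate counts quoted earlier for SUB, MAT and KADD:

```latex
\begin{align*}
3(\text{W-SUB}) + 3(\text{W-MAT}) + 3(\text{W-KADD})
  &= 3 \cdot 2 \cdot (\text{SUB} + \text{MAT} + \text{KADD}) \\
\text{lower estimate} &: 6 \times (3{,}200 + 800 + 320) = 25{,}920 \approx 26\text{K gates} \\
\text{upper estimate} &: 6 \times (4{,}800 + 1{,}200 + 320) = 37{,}920 \approx 38\text{K gates}
\end{align*}
```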
As described above, in the case where one-byte data blocks are arranged in a square matrix to perform a round operation, as described referring to FIGS. 2 to 4, scrambling is achievable with relatively low calculation cost. However, in a process using a rectangular matrix as described referring to FIGS. 5 to 7, which is designed for an input/output of 256 bits that cannot be arranged in a square matrix, an issue of an increase in calculation cost arises.