The present invention relates to a method and apparatus for efficient implementation of data permutation and division processing in the field of cryptography and a recording medium with a data permutation/division program recorded thereon.
Data encryption is intended to conceal data. Data encryption techniques fall into two classes: the common key cryptosystem and the public key cryptosystem.
The public key cryptosystem uses different keys for data encryption and for decryption; usually, the encryption key is made public and the decryption key is held by a user in secrecy. It is believed that the decryption key cannot be derived from the encryption key within a practical amount of time even with modern mathematical theories and the computing power of the present-day computer.
On the other hand, the common key cryptosystem uses the same key for data encryption and decryption. To implement a fast and secure common key cipher, there is proposed a block encipherment scheme that divides data to be enciphered into blocks of an appropriate length and enciphers them one by one. Many of the block ciphers have a structure called a Feistel network. With this structure, an input of 2n bits is divided into right and left pieces of n-bit data, a function f is applied to the right n-bit data, its output is exclusive ORed with the left n-bit data, the right and left pieces of data are swapped, and the same operation is repeated. This structure is shown in "Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, p. 347, 1996."
The common key cryptosystem is smaller in computational complexity than the public key cryptosystem, and the amount of data that can be encrypted per unit time in the former cryptosystem is tens to hundreds of times larger than in the latter cryptosystem. For this reason, there are tendencies to use the common key cryptosystem when fast encryption processing is necessary.
The common key cryptosystem is required to have security against cryptanalysis as well as the above-mentioned high-speed performance. In recent years there have been proposed several methods of cryptanalysis for common key encryption algorithms. It is necessary, therefore, that a newly developed common key encryption algorithm be secure against such cryptanalysis methods. These cryptanalysis methods are described, for example, in "Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, pp. 285-293, 1996."
There have also been studied schemes that would not allow easy application of the cryptanalysis methods, and it can be expected that such preventive schemes will increase the security of the common key encryption algorithm. According to one of such preventive schemes, a value of some kind available from an encryption key is exclusive ORed with input and output data so as to protect the input and output data for the basic encryption algorithm from an attacker. This scheme is described in "Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, pp. 366-367, 1996." Many of the common key encryption algorithms proposed in recent years are designed using this scheme.
With the above scheme, the input data exclusive ORed with the value of some kind available from the encryption key is used as input data of the basic encryption algorithm. In the case of using the afore-mentioned Feistel network, the input data needs to be divided into right and left data. Some of the recently developed common key encryption algorithms are intended to provide increased security not simply by dividing the input data into right and left but by first permuting the input data and then dividing it into right and left. An example of such algorithms is the E2 cipher (Masayuki KANDA, et al., "A New 128-bit Block Cipher E2," Technical Report of IEICE, ISEC98-12 (hereinafter referred to simply as literature E2)). In the E2 algorithm, a permutation processing called a BP function is defined and then the input data is divided into right and left for input into the Feistel network.
FIG. 1 depicts a basic configuration of an E2 cryptographic device, in which the key scheduling part is omitted for brevity. The E2 cryptographic device is made up of an initial transformation part 10, twelve round processing stages RND1 to RND12, and a final transformation part 30. The size of each key is, for instance, 128 bits. The initial transformation part 10 comprises: an XOR operation part 11 that exclusive ORs an input plaintext M of, for example, 128 bits with a subkey k13; a multiplication part 12 that calculates the product of the output from the XOR operation part 11 and a subkey k14; and a byte permutation part (hereinafter referred to as a BP function part) 13 that performs byte permutation of the multiplied output from the multiplication part 12. To increase the operation efficiency on a computer whose CPU has a computation size of, for example, 32 bits, the operation is carried out on each of the four 32-bit subblocks into which the 128-bit data is divided.
The initial transformation part (hereinafter referred to as an IT function part) 10 performs the following operation for an input X=M using the subkeys k13 and k14.
$$A = \mathrm{IT}(X, k_{13}, k_{14}) \qquad (1)$$
More specifically, letting
$$X = (x_1, x_2, x_3, x_4)$$
$$Y = (y_1, y_2, y_3, y_4)$$
$$Z = (z_1, z_2, z_3, z_4)$$
the following operation is performed by the XOR operation part 11 and the multiplication part 12.
$$Z = (X \oplus k_{13}) \otimes k_{14} = Y \otimes k_{14} \qquad (2)$$
In the above, if k14=(K1, K2, K3, K4), the multiplication Y⊗k14 by the multiplication part 12 is performed as follows:
$$z_i = y_i \cdot (K_i \vee 1_{(\mathrm{hex})}) \bmod 2^{32} \quad \text{for } i=1,2,3,4 \qquad (3)$$
The operation symbol a∨b represents the bitwise OR of a and b. Setting
$$(z_i^{(1)}, z_i^{(2)}, z_i^{(3)}, z_i^{(4)}) = z_i \quad \text{for } i=1,2,3,4 \qquad (4)$$
$$Z' = (z_1', z_2', z_3', z_4')$$
the operation processing of the BP function part 13 is expressed by the following equation:
$$z_i' = (z_i^{(1)}, z_{i+1}^{(2)}, z_{i+2}^{(3)}, z_{i+3}^{(4)}), \quad i=1,2,3,4 \qquad (5)$$
where
$$z_{i+4}^{(j)} = z_i^{(j)}, \quad j=1,2,3,4 \qquad (6)$$
where i represents the subblock number for each 32 bits and j the data number of each byte in the subblock. In FIG. 3 there are shown permutations expressed by Eqs. (5) and (6). The four bytes of each piece of data z1, z2, z3 and z4 are distributed to four different output data groups.
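As a minimal sketch of the byte permutation described above, the following Python fragment rearranges four 32-bit subblocks byte by byte. The function names `split_bytes`, `join_bytes` and `bp` are hypothetical, and the assumption that byte (1) is the most significant byte of each subblock is taken from the mask layout of Eq. (16); it is not stated explicitly for Eqs. (5) and (6).

```python
def split_bytes(word):
    """Split a 32-bit word into its bytes (1)..(4), most significant first (assumed order)."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def join_bytes(byte_list):
    """Reassemble four bytes, most significant first, into a 32-bit word."""
    word = 0
    for b in byte_list:
        word = (word << 8) | b
    return word


def bp(z):
    """Byte permutation: output word i takes byte (j) from input word i+j-1 (indices wrap)."""
    rows = [split_bytes(w) for w in z]
    return [join_bytes([rows[(i + j) % 4][j] for j in range(4)]) for i in range(4)]


# Example input whose bytes are numbered 0x00..0x0F, so the movement of each byte is visible.
z = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
print([f"{w:08x}" for w in bp(z)])
```

Each output word collects one byte from each of the four input words, which is the distribution property noted for FIG. 3.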
The output from the byte permutation part (that is, the BP function part) 13 is divided into right data R0 and left data L0, which are provided to the round processing stage RND1. The i-th round processing stage $RND_i$ performs substitution-permutation processing of right data $R_{i-1}$ in a round function part 22 by using a subkey $k_i$, and provides the substitution result to an XOR operation part 21, wherein it is exclusive ORed with left data $L_{i-1}$ fed thereto. The right data $R_{i-1}$ input to the i-th stage and the output from the XOR operation part 21 are exchanged in position, and they are provided as left data $L_i$ and right data $R_i$ to the next round processing stage $RND_{i+1}$.
This is expressed as follows:
$$R_i = L_{i-1} \oplus F(R_{i-1}, k_i) \qquad (7)$$
$$L_i = R_{i-1}, \quad i = 1, 2, \ldots, 12 \qquad (8)$$
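The round structure of Eqs. (7) and (8) can be sketched as follows. This is an illustration only: `toy_f` is a hypothetical placeholder and is not the E2 round function F, the 64-bit width is assumed from the 128-bit block size, and the decryption routine is simply the algebraic inverse of Eqs. (7) and (8), included to show that the structure is invertible whatever function is used.

```python
MASK64 = (1 << 64) - 1


def toy_f(r, k):
    """Placeholder round function (hypothetical; NOT the E2 function F)."""
    return ((r ^ k) * 0x9E3779B97F4A7C15 + 1) & MASK64


def feistel_encrypt(left, right, subkeys, f):
    """Apply L_i = R_{i-1}, R_i = L_{i-1} XOR f(R_{i-1}, k_i) for each subkey in order."""
    for k in subkeys:
        left, right = right, left ^ f(right, k)
    return left, right


def feistel_decrypt(left, right, subkeys, f):
    """Undo the rounds in reverse: R_{i-1} = L_i, L_{i-1} = R_i XOR f(L_i, k_i)."""
    for k in reversed(subkeys):
        left, right = right ^ f(left, k), left
    return left, right


subkeys = list(range(1, 13))  # twelve dummy subkeys, one per round stage
l0, r0 = 0x0123456789ABCDEF, 0xFEDCBA9876543210
l12, r12 = feistel_encrypt(l0, r0, subkeys, toy_f)
assert feistel_decrypt(l12, r12, subkeys, toy_f) == (l0, r0)
```

Note that the function f itself never has to be inverted, which is why the Feistel network places no invertibility requirement on the round function.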
Each round function part 22 comprises, as depicted in FIG. 2, eight XOR operation parts 22X1, eight S-boxes (S function) 22S1, a linear permutation part (a P function part) 22P, eight XOR operation parts 22X2, and eight S-boxes 22S2. 64-bit right data R is input to the i-th round processing stage $RND_i$. In the round function part 22, setting the input $R_{i-1}$ and the subkey $k_i$ as
$$R_{i-1} = (r_1, r_2, r_3, r_4, r_5, r_6, r_7, r_8)$$
$$k_i = (K^{(1)}, K^{(2)}) = (K_1^{(1)}, K_2^{(1)}, \ldots, K_8^{(1)}, K_1^{(2)}, K_2^{(2)}, \ldots, K_8^{(2)})$$
the outputs from the S-boxes 22S1 are given by the following equation:
$$(u_1, u_2, \ldots, u_8) = (s(r_1 \oplus K_1^{(1)}), s(r_2 \oplus K_2^{(1)}), \ldots, s(r_8 \oplus K_8^{(1)})) \qquad (9)$$
The output from the linear permutation part 22P can be expressed as follows:
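$$\begin{aligned}
u_1' &= u_2 \oplus u_3 \oplus u_4 \oplus u_5 \oplus u_6 \oplus u_7 \\
u_2' &= u_1 \oplus u_3 \oplus u_4 \oplus u_6 \oplus u_7 \oplus u_8 \\
u_3' &= u_1 \oplus u_2 \oplus u_4 \oplus u_5 \oplus u_7 \oplus u_8 \\
u_4' &= u_1 \oplus u_2 \oplus u_3 \oplus u_5 \oplus u_6 \oplus u_8 \\
u_5' &= u_1 \oplus u_2 \oplus u_4 \oplus u_5 \oplus u_6 \\
u_6' &= u_1 \oplus u_2 \oplus u_3 \oplus u_6 \oplus u_7 \\
u_7' &= u_2 \oplus u_3 \oplus u_4 \oplus u_7 \oplus u_8 \\
u_8' &= u_1 \oplus u_3 \oplus u_4 \oplus u_5 \oplus u_8
\end{aligned} \qquad (10)$$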
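The linear permutation of Eq. (10) amounts to XORing selected input bytes for each output byte. A minimal sketch follows; the names `P_TAPS` and `p_function` are hypothetical, and the tap lists are transcribed directly from the eight lines of Eq. (10).

```python
from functools import reduce
from operator import xor

# For each output u'_1..u'_8, the (1-based) indices of the inputs u_j that are XORed together.
P_TAPS = (
    (2, 3, 4, 5, 6, 7),
    (1, 3, 4, 6, 7, 8),
    (1, 2, 4, 5, 7, 8),
    (1, 2, 3, 5, 6, 8),
    (1, 2, 4, 5, 6),
    (1, 2, 3, 6, 7),
    (2, 3, 4, 7, 8),
    (1, 3, 4, 5, 8),
)


def p_function(u):
    """Apply Eq. (10) to a list of eight byte values u_1..u_8."""
    return [reduce(xor, (u[j - 1] for j in taps)) for taps in P_TAPS]


# Feeding a single non-zero byte shows which outputs depend on u_1.
print(p_function([1, 0, 0, 0, 0, 0, 0, 0]))
```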
The outputs from the S-boxes 22S2 are expressed by the following equation:
$$(v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8) = (s(u_1' \oplus K_1^{(2)}), s(u_2' \oplus K_2^{(2)}), \ldots, s(u_8' \oplus K_8^{(2)})) \qquad (11)$$
These outputs are subjected to byte rotation and then output from the round function part 22.
In the case of FIG. 1, twelve such round processing stages are cascade-connected, and left and right data L12 and R12 output from the 12th round processing stage RND12 are concatenated into 128-bit data, which is fed to a $BP^{-1}$ function part 31 of the final transformation part 30.
The final transformation part 30 obtains, as a ciphertext X=C, X=FT(Z′, k15, k16) from the input thereto Z′=(z1′, z2′, z3′, z4′) and keys k15, k16. More specifically, the $BP^{-1}$ function part 31 performs inverse processing of the BP function part 13 by the following equation to obtain the output Z.
$$(z_i^{\prime(1)}, z_i^{\prime(2)}, z_i^{\prime(3)}, z_i^{\prime(4)}) = z_i', \quad i=1,2,3,4$$
$$z_i = (z_i^{\prime(1)}, z_{i-1}^{\prime(2)}, z_{i-2}^{\prime(3)}, z_{i-3}^{\prime(4)}), \quad i=1,2,3,4 \qquad (12)$$
$$\text{where } z_{i-4}^{\prime(j)} = z_i^{\prime(j)}, \quad j=1,2,3,4 \qquad (13)$$
$$Z = (z_1, z_2, z_3, z_4)$$
The output Z is provided to a division part 32, which performs the division of the following equation using a subkey k15=(K1,K2,K3,K4).
$$y_i = z_i \cdot (K_i \vee 1_{(\mathrm{hex})})^{-1} \bmod 2^{32}, \quad i=1,2,3,4 \qquad (14)$$
The only variable in Eq. (14) is $z_i$. Hence, calculation efficiency can be increased by precalculating and prestoring the inverse element $G_i = (K_i \vee 1_{(\mathrm{hex})})^{-1} \bmod 2^{32}$ in a memory, since the stored value can be used to calculate $y_i = z_i G_i \bmod 2^{32}$ for each input data $z_i$. The calculation result Y=(y1,y2,y3,y4) is exclusive ORed with a subkey k16 in an XOR operation part 33 by the following equation, and the resulting output X is provided as the ciphertext C.
$$C = X = Y \oplus k_{16} \qquad (15)$$
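The relationship between the multiplication of Eq. (3) and the division of Eq. (14) can be sketched as below. The function names are hypothetical, and the modular inverse is obtained here with Python's built-in three-argument `pow` (negative exponents require Python 3.8 or later) rather than with any method described in this document. In the cipher the two operations use different subkeys (k14 and k15); the same key is used for both here only to show that they undo each other.

```python
MOD = 1 << 32


def multiply_subblocks(y, key):
    """Eq. (3): z_i = y_i * (K_i OR 1) mod 2^32. ORing with 1 makes the multiplier odd."""
    return [(yi * (ki | 1)) % MOD for yi, ki in zip(y, key)]


def precompute_inverses(key):
    """G_i = (K_i OR 1)^(-1) mod 2^32, computed once per key and stored."""
    return [pow(ki | 1, -1, MOD) for ki in key]


def divide_subblocks(z, inverses):
    """Eq. (14) using the stored inverses: y_i = z_i * G_i mod 2^32."""
    return [(zi * gi) % MOD for zi, gi in zip(z, inverses)]


key = [0x01234567, 0x89ABCDEF, 0x00000000, 0xFFFFFFFE]
g = precompute_inverses(key)          # done once, outside the per-block path
y = [1, 2, 3, 0xDEADBEEF]
assert divide_subblocks(multiply_subblocks(y, key), g) == y
```

Because every multiplier is odd, it is a unit modulo $2^{32}$, so the inverse always exists and the per-block work reduces to one word multiplication per subblock.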
FIG. 3 depicts the input/output relationship of the byte permutation using the BP function expressed by Eqs. (5) and (6). As shown, the four pieces of 4-byte data z1, z2, z3 and z4 are rearranged on a bytewise basis to obtain the four pieces of 4-byte data z1′, z2′, z3′ and z4′. Conventionally, this byte permutation is implemented by performing the operation expressed by the following equation:
$$\begin{aligned}
z_1' &= (z_1 \wedge \mathrm{ff000000}) \vee (z_2 \wedge \mathrm{00ff0000}) \vee (z_3 \wedge \mathrm{0000ff00}) \vee (z_4 \wedge \mathrm{000000ff}) \\
z_2' &= (z_2 \wedge \mathrm{ff000000}) \vee (z_3 \wedge \mathrm{00ff0000}) \vee (z_4 \wedge \mathrm{0000ff00}) \vee (z_1 \wedge \mathrm{000000ff}) \\
z_3' &= (z_3 \wedge \mathrm{ff000000}) \vee (z_4 \wedge \mathrm{00ff0000}) \vee (z_1 \wedge \mathrm{0000ff00}) \vee (z_2 \wedge \mathrm{000000ff}) \\
z_4' &= (z_4 \wedge \mathrm{ff000000}) \vee (z_1 \wedge \mathrm{00ff0000}) \vee (z_2 \wedge \mathrm{0000ff00}) \vee (z_3 \wedge \mathrm{000000ff})
\end{aligned} \qquad (16)$$
where the symbol ∧ represents the bitwise AND, the symbol ∨ the bitwise OR, and "f" and "0" are hexadecimal digits. This operation is performed as depicted in FIG. 4. For the sake of brevity, the entire data $Z = \{z_i^{(j)}\}$ (where i=1,2,3,4; j=1,2,3,4) is represented by a sequence of data a0, a1, . . . , a15. For example, 4-byte data z1 of a register RG1 and 4-byte mask data MD1 of a mask register MRG1 are ANDed to obtain z1∧ff000000, which is stored in a register RG1′. Then, the AND of data z2 and mask data MD2, z2∧00ff0000, is calculated and ORed with the data read out of the register RG1′, and the OR thus obtained is written back to the register RG1′. By performing the same processing for mask data MD3 and MD4 as well, the data z1′ is obtained in the register RG1′. The same calculation processing is also carried out for the data z2′, z3′ and z4′ according to Eq. (16). Thus the byte permutation results are obtained in registers RG1′ to RG4′.
The following problems have been pointed out in the implementation of this calculation scheme. The processing by the BP function is byte-by-byte permutation, but performing it on the one-word registers built into recent CPUs involves masking and shift operations and hence consumes much processing time. And even if the permutation is made after the ORs are once copied to a memory, the time for memory access inevitably increases, which increases the processing time. These problems constitute an obstacle to the realization of high-speed performance of the common key cryptosystem.
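The conventional mask-and-OR procedure of Eq. (16) can be sketched as follows. The names `MASKS` and `bp_masked` are hypothetical; the mask values correspond to the mask data MD1 to MD4, and the accumulator plays the role of the registers RG1′ to RG4′.

```python
# Mask data MD1..MD4, one byte position each within a 32-bit word.
MASKS = (0xFF000000, 0x00FF0000, 0x0000FF00, 0x000000FF)


def bp_masked(z):
    """Conventional byte permutation: AND each source word with a mask, OR into the result."""
    out = []
    for i in range(4):
        acc = 0  # stands in for register RG(i+1)'
        for j, mask in enumerate(MASKS):
            acc |= z[(i + j) % 4] & mask
        out.append(acc)
    return out


z = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
print([f"{w:08x}" for w in bp_masked(z)])
```

Producing the four output words this way takes sixteen AND operations plus the accompanying ORs, which illustrates the masking overhead that the text identifies as a problem.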
In the division part 32 in FIG. 1 a precalculated inverse element can be used. In general, the calculation of an inverse element modulo N can be executed with the extended Euclidean algorithm set forth, for instance, in Okamoto and Ohta, coeditors, "Cipher/Zero Knowledge Proof/Number Theory," Kyouritsu Shuppan, 1995, pp. 120-121. In the case of Eq. (14), however, since the modulus has the special form $2^m$, the inverse element can efficiently be calculated by the use of the Hensel Lifting method (a natural method of raising the root of a polynomial from mod $b^m$ to mod $b^{m+1}$). In the calculation of the inverse element with software, when m is about one word length, the method proposed by Zassenhaus, which is a quadratic version of the Hensel Lifting (H. Zassenhaus, "On Hensel Factorization, I," Journal of Number Theory, vol. 1, pp. 291-311, 1969), is effective because word multiplication is relatively fast on recent CPUs.
Letting the input be represented by x, the output by y, and auxiliary or temporary variables by a and b, and letting [x] represent the Gauss symbol (the maximum integer that does not exceed x), the Zassenhaus method provides the following algorithm for calculating an inverse $y = x^{-1} \bmod 2^m$, assuming that bit positions are numbered from 0 at the least significant bit and that the bit lengths of x, y, a and b are m, where $m = 2^n$ ($n \ge 1$):
Step 1: Input x.
Step 2: Initialize y:=1 and b:=[x/2].
Step 3: Do the following for i=0, 1, . . . , n−1.
1. Set a as the low-order $2^i$ bits of $y \times (2^{2^i} - (\text{low-order } 2^i \text{ bits of } b))$.
2. Pad the $2^i$-th to $(2^{i+1}-1)$-th bits of y with the low-order $2^i$ bits of a.
3. Store the $2^i$-th to $(2^n - 2^i - 1)$-th bits of xa+b in b.
Step 4: Output y.
This bit processing is such as shown in FIG. 5, in which the contents of the register having stored therein the output y for state changes of i are represented by binary numbers.
"1" indicates bits whose value is always 1, "." calculated bits, and "?" unknown bits. In the result with i=2 calculated using i=1, the fourth to seventh bits are determined. Arranging data in the fourth to seventh bits in this way will hereinafter be referred to as "padding."
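Steps 1 to 4 above can be sketched in Python as follows. This is a reading of the steps under stated assumptions rather than a definitive implementation: the function name `inverse_mod_2m` is hypothetical, x is assumed to be odd (otherwise no inverse exists), and Python's unbounded integers are used with explicit masks in place of m-bit registers.

```python
def inverse_mod_2m(x, n):
    """Return y with x*y = 1 (mod 2^m), m = 2^n, following Steps 1 to 4; x must be odd."""
    if x & 1 == 0:
        raise ValueError("x must be odd to be invertible modulo a power of two")
    m = 1 << n
    y = 1            # correct inverse modulo 2^1
    b = x >> 1       # [x/2], i.e. (x*y - 1) / 2
    for i in range(n):
        k = 1 << i                       # number of bits of y already determined
        low_mask = (1 << k) - 1
        # Step 3-1: a = low-order 2^i bits of y * (2^(2^i) - low-order 2^i bits of b)
        a = (y * ((1 << k) - (b & low_mask))) & low_mask
        # Step 3-2: pad bits 2^i .. 2^(i+1)-1 of y with a
        y |= a << k
        # Step 3-3: keep bits 2^i .. 2^n - 2^i - 1 of x*a + b
        b = ((x * a + b) >> k) & ((1 << (m - 2 * k)) - 1)
    return y


# Each pass doubles the number of valid low-order bits: 1, 2, 4, ..., 2^n.
y = inverse_mod_2m(0x12345679, 5)        # m = 32
assert (0x12345679 * y) % (1 << 32) == 1
```

The masks and shifts in the loop body correspond to the extraction and padding operations of Step 3, which is where the bitwise overhead of this method arises in software.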
The configuration of the RSA cipher, which is a typical public key cryptosystem, is described, for example, in the above-mentioned literature "Cipher/Zero Knowledge Proof/Number Theory," p. 220. The RSA cipher requires a power calculation over Z/NZ, that is, what is called a modular exponentiation. For fast execution of this modular exponentiation, it is effective to use the Montgomery modular arithmetic algorithm introduced in the above-mentioned literature on pages 179-181. The execution of the Montgomery modular arithmetic algorithm involves an inverse calculation in mod $2^m$, where m is a natural number.
The above-mentioned Zassenhaus scheme involves bitwise processing such as extraction and padding of the low-order $2^i$ bits; in the case of software implementation, the number of masking operations increases, and hence efficiency is not so high.
The round function part 22 of the round processing stage depicted in FIG. 2 is formed by a combination of the substitution by the S function part 22S1 and the permutation by the P function part 22P.
The substitution-permutation is a concept of a considerably broad meaning. To meet a demand for software implementation in recent years, there has widely been used the substitution-permutation in the following form:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = P \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \qquad (17)$$
In this instance, operations are all performed over the ring R. The permutation is given by
$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix} \qquad (18)$$
and the substitution is set to $s_j: R \to R$ (j=1, 2, . . . , n). That is, multiplication by the matrix P is regarded as the permutation.
The substitution-permutation expressed by Eq. (17) is also used in the cipher SHARK that is defined in V. Rijmen, et al., "The Cipher SHARK," Fast Software Encryption-Third International Workshop, Lecture Notes in Computer Science 1039, pp. 99-111, Springer-Verlag 1996 (hereinafter referred to simply as Literature S). In Literature S there is also described a method in which the following modified equation is used
$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix} \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} = \begin{bmatrix} p_{11} s_1(x_1) \\ p_{21} s_1(x_1) \\ \vdots \\ p_{m1} s_1(x_1) \end{bmatrix} + \begin{bmatrix} p_{12} s_2(x_2) \\ p_{22} s_2(x_2) \\ \vdots \\ p_{m2} s_2(x_2) \end{bmatrix} + \cdots + \begin{bmatrix} p_{1n} s_n(x_n) \\ p_{2n} s_n(x_n) \\ \vdots \\ p_{mn} s_n(x_n) \end{bmatrix} \qquad (19)$$
and the output value of the function $SP_j$ expressed by the following Eq. (20) is precalculated for every $x_j$ and prestored, for example, in a memory, so that Eq. (17) can be calculated efficiently.
$$SP_j: R \to R^m; \quad SP_j(x_j) = \begin{bmatrix} p_{1j} s_j(x_j) \\ p_{2j} s_j(x_j) \\ \vdots \\ p_{mj} s_j(x_j) \end{bmatrix} \quad (j = 1, 2, \ldots, n) \qquad (20)$$
In the cipher utilizing the substitution-permutation scheme, there is a case where no permutation is performed at the end of processing but only substitution is used. That is, the following processing is also necessary for cipher implementation.

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \tag{21}$$
When the size of the element in R is smaller than the word length that is the operation unit in the computer used, it is necessary in straightforward implementation that the calculation of individual values of sj(xj) be followed by shifting them to their correct vector positions. In this instance, the necessity for the data position adjustment process can be avoided by modifying Eq. (21) to

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} s_1(x_1) \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ s_2(x_2) \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \cdots + \begin{bmatrix} 0 \\ \vdots \\ \vdots \\ 0 \\ s_n(x_n) \end{bmatrix} \tag{22}$$
as is the case with Eq. (17) and by precalculating a table in which the positions of vector elements have been adjusted so that 0s would be provided except at the j-th position.
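The table precalculation behind Eq. (22) can be sketched as follows. This is only a minimal illustration under assumptions that do not come from the text: 8-bit elements, n = 4 elements packed into one 32-bit word with element 1 in the most significant byte, and a made-up S-box `sbox` shared by all positions.

```python
# Sketch: position-adjusted substitution tables for Eq. (22).
# Assumptions (not from the text): 8-bit elements, n = 4 elements per 32-bit
# word, element 1 in the most significant byte, and a hypothetical S-box.
N = 4
BITS = 8


def sbox(x):
    """Hypothetical 8-bit S-box used for every position j."""
    return (x * 7 + 3) % 256


# TABLES[j][x] holds s_j(x) already placed at position j, with 0s elsewhere.
TABLES = [
    [sbox(x) << (BITS * (N - 1 - j)) for x in range(256)]
    for j in range(N)
]


def substitute_with_tables(xs):
    """Eq. (22): combine n table entries; no shifting at run time."""
    word = 0
    for j, x in enumerate(xs):
        word |= TABLES[j][x]  # entries occupy disjoint bytes, so OR acts as +
    return word


def substitute_with_shifts(xs):
    """Straightforward Eq. (21): look up each value, then shift it into place."""
    word = 0
    for j, x in enumerate(xs):
        word |= sbox(x) << (BITS * (N - 1 - j))
    return word
```

Both functions return the same word; the table version trades n tables of 256 entries for the removal of the run-time shifts.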
The calculation for the substitution and permutation described in Literature S is disadvantageous in that it involves a large number of memory references and requires a large memory capacity.
As described previously, letting the input data be (u1, u2, . . . , u8) and the output data be (u1′, u2′, . . . , u8′), the P function part 22P in the cipher E2 shown in FIG. 2, for instance, performs an operation using the product expressed by the following equation.

$$\begin{bmatrix} u_1' \\ u_2' \\ u_3' \\ u_4' \\ u_5' \\ u_6' \\ u_7' \\ u_8' \end{bmatrix} = P \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \\ u_8 \end{bmatrix}, \quad P = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix} \tag{23}$$

where ui, uj′ ∈ R.
This operation is expressed merely as the product of a matrix and is the same as Eq. (10). In an environment where a permutation operation using masked data or a permutation operation using a bit shift or cyclic shift is possible by processing every 32 bits, the required number of processing steps is small, and hence fast processing can be achieved. However, since such operation processing cannot be performed in a hardware environment formed by an 8-bit accumulator type CPU, a small number of registers and a small amount of memory capacity as in the case of a low-end smart card, the operation of Eq. (10) needs to be executed successively, and hence high-speed processing is difficult to perform.
The following description will be given on the assumption that in an implementation environment formed by one 8-bit accumulator, a small number of registers and a small amount of memory, the input data (u1, u2, . . . , u8) is stored in the memory and is read out therefrom, and the permutation output data u1xe2x80x2, u2xe2x80x2, . . . , u8xe2x80x2 is calculated and stored in the memory; the computational complexity in this instance is evaluated. The evaluation is made in terms of the number of times an addition/subtraction is performed, the number of times the memory is written and the number of times the memory is read. And let it be assumed that the permutation operation is performed by calculating Eq. (23) using the above-mentioned matrix P.
Conventional scheme 1: If the permutation operation of Eq. (23) is carried out as defined, then Eq. (10) needs to be calculated successively. The computational complexity in this case is as follows:
Number of additions/subtractions: 36
Number of memory reads: 44
Number of memory writes: 8
With this scheme, the number of memory reads is equal to the total number of elements of the matrix P which have a "1" component. Accordingly, the computational complexity increases with an increase in the number of elements whose components are "1".
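The figures quoted for scheme 1 can be reproduced by counting the "1" components of the matrix P in Eq. (23). The sketch below assumes a simple cost model that is consistent with those figures but is not spelled out in the text: one memory read per "1" component, one addition fewer than the number of "1" components in each row, and one memory write per output element. The function name `scheme1_cost` is hypothetical.

```python
# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]


def scheme1_cost(matrix):
    """Return (additions, reads, writes) under the assumed cost model."""
    ones_per_row = [sum(row) for row in matrix]
    reads = sum(ones_per_row)                        # one read per "1"
    additions = sum(k - 1 for k in ones_per_row if k > 0)
    writes = len(matrix)                             # one write per output
    return additions, reads, writes
```

For the matrix P above, `scheme1_cost(P)` gives 36 additions, 44 reads and 8 writes, matching the list.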
Conventional scheme 2: In Literature E2 there is described, as the permutation scheme using the matrix P, a scheme that uses GF(2^8) as the ring R and calculates the following equations.
The computational complexity of this scheme is as follows:
Number of additions/subtractions: 16
Number of memory reads: 32
Number of memory writes: 16
This scheme is more effective than scheme 1 in the case where the number of registers used is large and addition instructions are orthogonal (the addition instructions can be executed for all the registers). However, this scheme has the disadvantages listed below.
(a) Since the scheme essentially relies on the fact that the characteristic of R is 2, it cannot be used when the characteristic is not 2.
(b) Since a large number of registers cannot be used, this scheme is not always efficient in some implementation environments.
(c) The computational complexity depends largely on the component configuration of the matrix P.
Conventional scheme 3: When the number of "1" components is large as in the case of the matrix P, the following calculation scheme can be used.
σ=u1+u2+u3+u4+u5+u6+u7+u8
u1′=σ−u1−u8
u2′=σ−u2−u5
u3′=σ−u3−u6
u4′=σ−u4−u7
u5′=σ−u3−u7−u8
u6′=σ−u4−u5−u8
u7′=σ−u1−u5−u6
u8′=σ−u2−u6−u7
The computational complexity of this scheme is as follows:
Number of additions/subtractions: 27
Number of memory reads: 36
Number of memory writes: 9
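Scheme 3 can be checked against the direct matrix product with a short sketch. The ring is assumed here to be the integers modulo 256, chosen only for illustration (the text leaves R general), and the cost model (σ written once and re-read for every output) is an assumption that happens to reproduce the figures above. All function names are hypothetical.

```python
MOD = 256  # assumption: R is taken as Z/256 for illustration only

# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]


def permute_direct(u):
    """Scheme 1: evaluate the matrix product row by row."""
    return [sum(p * x for p, x in zip(row, u)) % MOD for row in P]


def permute_scheme3(u):
    """Scheme 3: start from the total sigma and subtract the "0" positions."""
    sigma = sum(u) % MOD
    out = []
    for row in P:
        acc = sigma
        for p, x in zip(row, u):
            if p == 0:
                acc = (acc - x) % MOD
        out.append(acc)
    return out


def scheme3_cost(matrix):
    """Return (additions/subtractions, reads, writes) under the assumed model."""
    n = len(matrix[0])
    zeros = sum(row.count(0) for row in matrix)
    additions = (n - 1) + zeros          # build sigma, then one subtraction per "0"
    reads = n + len(matrix) + zeros      # inputs for sigma, sigma per row, "0" inputs
    writes = 1 + len(matrix)             # sigma plus one write per output
    return additions, reads, writes
```

With u = (1, 2, . . . , 8), both functions return the same vector, and `scheme3_cost(P)` gives 27, 36 and 9.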
This scheme is more efficient than scheme 1 when the elements of the matrix P having the "1" component account for more than 60% of the total number of elements.
In any of the above conventional schemes, the efficiency of the permutation by the matrix P depends on how the {0, 1} components of the matrix P are distributed; in particular, the computational complexity is determined by the rate at which "1" components are present. That is, either scheme 1 or scheme 3 may be the more efficient one, depending on the configuration of the matrix P; hence, these schemes are not versatile. Which of these schemes becomes more efficient depends on whether the number of elements of the matrix P having the "1" component is more than 60% of the total number of elements.
To implement secure permutation that can be used in cryptography, it is desirable that the rates of "0" and "1" components in the matrix P be well balanced. For example, in the case of the above matrix P used for the permutation in the cipher E2, the "1" components account for about 2/3 of the total number of matrix elements. Since this value is close to the point that determines which scheme becomes more efficient, schemes 1 to 3 require nearly the same number of memory reads. This means that these conventional schemes do substantially nothing to increase the processing speed, because reading from or writing to the memory is several times slower than addition and subtraction; therefore, none of these schemes achieves the intended purpose of a faster permutation operation.
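The break-even point of roughly 60% can be illustrated with a short calculation. It assumes the read-count model that reproduces the figures given for schemes 1 and 3 (the symbols N_1, r_1 and r_3 are introduced here only for illustration): for an n × n matrix with N_1 components equal to "1", scheme 1 reads one input per "1" component, while scheme 3 reads the n inputs to form σ, then σ once per output, then one input per "0" component.

$$r_1 = N_1, \qquad r_3 = n + n + (n^2 - N_1) = n^2 + 2n - N_1$$

$$r_3 < r_1 \iff N_1 > \frac{n^2 + 2n}{2}$$

For n = 8 this gives N_1 > 40, that is, more than 40/64 = 62.5% of the elements. The matrix P of the cipher E2 has N_1 = 44 (about 69%), so r_1 = 44 and r_3 = 36, which is only slightly beyond the threshold.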
It is an object of the present invention to provide a data permutation method and apparatus which permit fast processing for the permutation of data with the BP function in the cipher E2 and the division of data to left and right pieces in the Feistel network through the use of one-word registers.
Another object of the present invention is to provide an inverse calculating method and apparatus which permit reduction of the number of masking operations involved.
Another object of the present invention is to provide a substitution-permutation method and apparatus which permit reduction of the number of memory references and the required storage capacity.
Still another object of the present invention is to provide an operation method which permits reduction of computational complexity in the permutation using the matrix P.
According to a first aspect of the present invention, there is provided a data permutation method by which, letting one byte be k-bit, k being an integer equal to or greater than 1, input data of 16 bytes set, in units of four bytes, in 4k-bit long first to fourth registers is permutated, the method comprising the steps of:
(a) ANDing 4-byte first mask data with the data of said first register and 4-byte second mask data with the data of said third register, and ORing the two ANDs as first output data;
(b) ANDing 4-byte third mask data with the data of said second register and 4-byte fourth mask data with the data of said fourth register, and ORing the two ANDs as second output data;
(c) ANDing said second mask data with the data of said first register and said first mask data with the data of said third register, and ORing the two ANDs as third output data;
(d) ANDing said fourth mask data with the data of said second register and said third mask data with the data of said fourth register, and ORing the two ANDs as fourth output data; and
(e) outputting said first to fourth data as permutated data;
wherein all bits of predetermined two of four bytes of said first mask data are "1s," all bits of the remaining two bytes are "0s," said second mask data is complementary to said first mask data, said third mask data is a 1-byte-rotated version of said first mask data, and said fourth mask data is complementary to said third mask data.
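Steps (a) to (e) can be sketched as follows for k = 8, that is, 32-bit registers. The concrete first mask 0xFFFF0000 and the left rotation direction are assumptions chosen for illustration; the text only requires that two of the four mask bytes be all "1s" and that the third mask be a 1-byte-rotated version of the first. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotate_left_bytes(word, count=1):
    """Rotate a 32-bit word left by `count` bytes."""
    shift = 8 * (count % 4)
    return ((word << shift) | (word >> (32 - shift))) & MASK32


# Assumed example masks (the text does not fix which two bytes are set).
M1 = 0xFFFF0000                 # first mask: two bytes all 1s, two bytes all 0s
M2 = ~M1 & MASK32               # second mask: complement of the first
M3 = rotate_left_bytes(M1)      # third mask: first mask rotated by one byte
M4 = ~M3 & MASK32               # fourth mask: complement of the third


def permute_with_masks(r1, r2, r3, r4):
    """Steps (a)-(e): four outputs, each built from two ANDs and one OR."""
    out1 = (M1 & r1) | (M2 & r3)   # step (a)
    out2 = (M3 & r2) | (M4 & r4)   # step (b)
    out3 = (M2 & r1) | (M1 & r3)   # step (c)
    out4 = (M4 & r2) | (M3 & r4)   # step (d)
    return out1, out2, out3, out4  # step (e)
```

Because each pair of masks is complementary, every input byte appears exactly once in the output regardless of which two bytes the first mask selects.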
Alternatively, according to a first aspect of the present invention, there is provided a data permutation method by which, letting one byte be k-bit, k being an integer equal to or greater than 1, input data of 16 bytes set, in units of four bytes, in 4k-bit long first to fourth registers is permutated, the method comprising the steps of:
(a) rotating the data of said first and third registers one byte in one direction, and concatenating them to form first concatenated data;
(b) concatenating the data of said second and fourth registers to form second concatenated data;
(c) shifting said first and second concatenated data two bytes in one direction, and extracting, as first and second output data, two pieces of 4-byte data from said shifted first and second concatenated data at one end thereof in the shift direction;
(d) concatenating said rotated data of said third and first registers to form third concatenated data;
(e) concatenating the data of said fourth and second registers to form fourth concatenated data;
(f) shifting said third and fourth concatenated data two bytes in one direction, and extracting, as third and fourth output data, two pieces of 4-byte data from said shifted third and fourth concatenated data at one end thereof in the shift direction; and
(g) outputting said first to fourth output data as permutated data.
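Steps (a) to (g) can be sketched in the same way for k = 8. The text leaves the rotation direction, the shift direction and the end from which the 4 bytes are extracted open ("in one direction"); the sketch assumes left rotation, left shift and extraction of the upper 4 bytes, and models the concatenated data as a 64-bit integer. The resulting byte order follows from these assumptions and is not claimed to match that of any particular mask choice in the other method. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def rotate_left_bytes(word, count=1):
    """Rotate a 32-bit word left by `count` bytes."""
    shift = 8 * (count % 4)
    return ((word << shift) | (word >> (32 - shift))) & MASK32


def concat_shift_extract(high, low):
    """Concatenate two 32-bit words, shift left by two bytes, keep the top 4 bytes."""
    concatenated = (high << 32) | low
    shifted = (concatenated << 16) & MASK64
    return shifted >> 32


def permute_with_shifts(r1, r2, r3, r4):
    t1 = rotate_left_bytes(r1)             # step (a): rotate first register
    t3 = rotate_left_bytes(r3)             # step (a): rotate third register
    out1 = concat_shift_extract(t1, t3)    # steps (a), (c)
    out2 = concat_shift_extract(r2, r4)    # steps (b), (c)
    out3 = concat_shift_extract(t3, t1)    # steps (d), (f)
    out4 = concat_shift_extract(r4, r2)    # steps (e), (f)
    return out1, out2, out3, out4          # step (g)
```

Under these assumptions each output takes its upper two bytes from one register and its lower two bytes from the other, so no masking is needed.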
According to a second aspect of the present invention, there is provided an inverse calculating apparatus comprising:
input means for storing an input x in storage means;
storage means for storing integers n and i (where n is equal to or greater than 1) and 2n-bit integers x, y, a and b;
first b-initialization means for calculating [x/2] by using said x stored in said storage means (where [x] is the maximum integer not exceeding said x), and for storing the calculation result as b in said storage means;
a-initialization means for storing, as said a, in said storage means the least significant bit of said b stored in said storage means;
second b-initialization means for calculating [(ax+b)/2] by using said a, x and b stored in said storage means and for updating said b stored in said storage means with the calculation result;
y-updating means for calculating y+a×2^(2i) using said a, y and i stored in said storage means and updating said y stored in said storage means with the calculation result (where p^q represents the q-th power of p);
i-updating means for updating said i stored in said storage means to i+1;
a-updating means for calculating −by by using said b and y stored in said storage means and for updating said a stored in said storage means with the calculation result;
b-updating means for calculating [(b+ax)/(2^(2i))] by using a, b, x and i stored in said storage means and for updating said b stored in said storage means;
y-updating means for calculating y+a×2^(2i) by using said a, y and i stored in said storage means and for updating said y stored in said storage means with the calculation result;
i-updating means for updating said i stored in said storage means to i+1;
control means for reading out said i and n stored in said storage means and actuating said a-updating means, said b-updating means, said y-updating means and said i-updating means one after another until i=n; and
output means for outputting said y stored in said storage means.
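The sequence of means listed above can be sketched as a loop that doubles the number of correct low-order bits of the inverse at each pass. The sketch rests on several assumptions that the text does not state explicitly: the exponent in the y-update and b-update is read as 2 to the power 2^i (so that the word length is 2^n bits), y starts at 1 and i at 0, x is odd, and the value −by is truncated to its low 2^i bits. Under this reading the invariant x·y = 1 + 2^(2^i)·b holds after every pass. The function name is hypothetical.

```python
def inverse_mod_power_of_two(x, n):
    """Return y with x*y == 1 modulo 2**(2**n), for odd x and n >= 1.

    Assumed reading of the listed means: exponents are 2**(2**i),
    y starts at 1, i starts at 0, and -b*y is kept to its low 2**i bits.
    """
    if x % 2 == 0:
        raise ValueError("x must be odd to be invertible modulo a power of two")
    y = 1
    i = 0
    b = x // 2                      # first b-initialization: x = 1 + 2b
    a = b & 1                       # a-initialization: least significant bit of b
    b = (a * x + b) // 2            # second b-initialization
    y = y + (a << (1 << i))         # y-update with 2**(2**0) = 2
    i += 1                          # now x*y = 1 + 4b
    while i < n:                    # control means: repeat until i = n
        bits = 1 << i               # current precision, 2**i bits
        a = (-b * y) % (1 << bits)  # a-update, truncated to `bits` bits
        b = (b + a * x) >> bits     # b-update: exact division by 2**(2**i)
        y = y + (a << bits)         # y-update
        i += 1                      # i-update
    return y
```

For example, with x = 3 and n = 3 the successive values of y are 3, 11 and 171, and 3 × 171 = 513 = 1 + 2 × 256.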
According to a third aspect of the present invention, there is provided a substitution-permutation apparatus which, by the following substitution-permutation over a ring R

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = P \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \quad \text{where} \quad P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & & p_{2n} \\ \vdots & & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix}$$

$$p_{ij} \in R, \quad s_j : R \to R, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n$$

performs a substitution-permutation operation of an input data sequence (xj) to calculate a data sequence (yi), said apparatus comprising:
storage means for storing: precalculated values of vi (whose dimensions are up to m and may differ individually) over said ring R necessary for said substitution-permutation, obtained by swapping rows or columns of said matrix P for some pij or sj of the same values; a precalculated value of a function Sk: R → R^m; precalculated values of n vectors wk ∈ R^m; and an integer k;
input means for storing said input data sequence xj in said storage means;
k-initialization means for setting said integer k to 0;
k-updating means for updating said k in said storage means to k+1;
Sk calculating means for reading out each Sk and input data (xk) from said storage means to obtain the result of calculation of a vector Sk (xk) and for storing said vector as a vector wk in said storage means;
uk-generating means for reading out of said storage means a set of vectors {vi} necessary for forming a k-th column of said matrix P and for generating a vector uk;
uk*Sk calculating means for reading out said wk from said storage means and calculating the product for each element and for updating said wk with the calculation result;
control means for reading out said k stored in said storage means and for actuating said Sk calculating means, said uk*Sk calculating means and said k-updating means one after another until k=n; and
output means for reading out each wk stored in said storage means and for calculating and outputting their sum.
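The data flow of this apparatus (look up Sk(xk), multiply element by element with uk, then sum the wk) can be sketched as follows. The sketch is deliberately simplified and rests on assumptions not found in the text: the ring R is taken as the integers modulo 256, Sk(x) is modeled as the vector whose m components all equal sk(x), uk is taken directly as the k-th column of P rather than being assembled from stored vectors vi, the matrix and substitution functions are made up, and k is counted from 0 rather than from 1.

```python
MOD = 256  # assumption: R is taken as Z/256 for illustration only

# Hypothetical 2 x 3 matrix P (m = 2, n = 3).
P = [
    [1, 2, 0],
    [0, 1, 3],
]


def s(k, x):
    """Hypothetical substitution s_k: R -> R."""
    return (x * x + k) % MOD


def S(k, x):
    """Assumed form of S_k: R -> R^m, with s_k(x) in every component."""
    return [s(k, x)] * len(P)


def column(k):
    """u_k, taken here directly as the k-th column of P."""
    return [row[k] for row in P]


def substitute_permute(xs):
    """Compute y = sum over k of (u_k * S_k(x_k)), element by element."""
    w = []
    for k, x in enumerate(xs):
        w_k = S(k, x)                                        # Sk calculating means
        u_k = column(k)                                      # uk-generating means
        w_k = [(u * v) % MOD for u, v in zip(u_k, w_k)]      # uk*Sk calculating means
        w.append(w_k)
    return [sum(components) % MOD for components in zip(*w)]  # output means


def substitute_permute_direct(xs):
    """Reference: substitute first, then multiply by P."""
    sx = [s(k, x) for k, x in enumerate(xs)]
    return [sum(p * v for p, v in zip(row, sx)) % MOD for row in P]
```

Both functions agree on every input; in an actual implementation the values of Sk would come from precalculated tables rather than from a function call.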
Alternatively, according to a third aspect of the present invention, a substitution-permutation apparatus which, by the following substitution-permutation over a ring R

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = P \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \quad \text{where} \quad P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & & p_{2n} \\ \vdots & & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix}$$

$$p_{ij} \in R, \quad s_j : R \to R, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n$$

performs a substitution-permutation operation of an input data sequence (xj) to produce a data sequence (yi), comprisies:
storage means for storing a precalculated value of the following equation with rows of a matrix P rearranged
\[
SP_{lj}(x_j) = \begin{bmatrix}
p_{t(q_l)\,j}\, s_j(x_j) \\
p_{t(q_l+1)\,j}\, s_j(x_j) \\
\vdots \\
p_{t(r_l)\,j}\, s_j(x_j)
\end{bmatrix}
\]
(where b(j) is a natural number equal to or greater than 1 but equal to or smaller than m, l=1, 2, . . . , b(j), t: {1, 2, . . . , m}→{1, 2, . . . , m} is a permutation, and ql and rl are natural numbers equal to or greater than 1 but equal to or smaller than n, ql≤rl) together with precalculated values of n vectors wk ∈ R^m and an integer k;
input means for storing said input data sequence (xi) in said storage means;
k-initialization means for setting said integer k to 0;
k-updating means for updating said k stored in said storage means to k+1;
SPk calculating means for reading out input data xk and SPlj(xk) from said storage means, for calculating said SPlj(xk) for each l (where l=1, 2, . . . , b(j)) and concatenating the calculated results in correspondence to a k-th column of said rearranged matrix P to obtain an m-dimensional vector, and for updating said wk stored in said storage means with said m-dimensional vector as wk;
control means for reading out said k stored in said storage means and for actuating said SPk calculating means and said k-updating means one after the other until k=n; and
output means for reading out each wk stored in said storage means and for calculating and outputting their sum.
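The table-lookup idea behind this third aspect can be illustrated with a minimal Python sketch. It is not the claimed apparatus: it assumes the ring R is the integers modulo 256, takes the permutation t as the identity, and uses a single piece per column (b(j)=1), so each stored entry is a whole column of P scaled by sj(x). The names `MOD`, `build_tables`, `substitute_permute` and `direct` are hypothetical and chosen only for this illustration.

```python
MOD = 256  # assumption: R = Z/256Z; the text only requires some ring R


def build_tables(P, sboxes):
    """Precompute, for every column j and every x in R, the m-dimensional
    vector (p_1j * s_j(x), ..., p_mj * s_j(x)).
    Simplification: t is the identity and b(j) = 1 (one piece per column)."""
    m, n = len(P), len(P[0])
    return [
        [[(P[i][j] * sboxes[j][x]) % MOD for i in range(m)] for x in range(MOD)]
        for j in range(n)
    ]


def substitute_permute(tables, xs):
    """Look up one vector w_k per input x_k and output the sum of all w_k."""
    m = len(tables[0][0])
    y = [0] * m
    for k, x in enumerate(xs):  # k runs over the n input positions
        w_k = tables[k][x]      # precalculated column vector for x_k
        y = [(a + b) % MOD for a, b in zip(y, w_k)]
    return y


def direct(P, sboxes, xs):
    """Reference evaluation of y = P * (s_1(x_1), ..., s_n(x_n)) without tables."""
    s = [sboxes[j][x] for j, x in enumerate(xs)]
    return [sum(P[i][j] * s[j] for j in range(len(s))) % MOD
            for i in range(len(P))]
```

With a 3 by 2 matrix and two arbitrary substitution tables, `substitute_permute(build_tables(P, sboxes), xs)` should agree with `direct(P, sboxes, xs)`; the table version trades memory for the multiplications, which are all done once at precalculation time.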
According to a fourth aspect of the present invention, there is provided a permutation method in which an operating apparatus including an accumulator type CPU and registers is used to permute input data u1, u2, . . . , un by the following equation using an m by n matrix P of predetermined {0, 1} elements to obtain permuted data (u1′, u2′, . . . , um′)
\[
\begin{bmatrix} u_1' \\ u_2' \\ \vdots \\ u_n' \end{bmatrix}
= P \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\]
said method comprising the steps of:
(a) setting each piece of said permuted data uj′ by the following equation using already calculated ui′
uj′=ui′+Di
where j≠i, i and j are integers equal to or greater than 1 and equal to or smaller than n, n is an integer equal to or greater than 2 and Di is given by the difference Di=uj′−ui′ between said permuted data uj′ and ui′ defined by said matrix P using said input data u1, u2, . . . , un; and
(b) calculating said uj′ for all of said j.
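Steps (a) and (b) can be sketched in Python as follows. The sketch assumes that addition is bitwise exclusive OR, so that the difference Di reduces to the exclusive OR of those inputs at which the two rows of P differ; this choice, the `order` parameter, and the function names `permute_by_differences` and `permute_directly` are assumptions made for illustration and are not taken from the text.

```python
def permute_directly(P, u):
    """Reference: each output is the XOR of the inputs selected by one row of P."""
    out = []
    for row in P:
        acc = 0
        for bit, value in zip(row, u):
            if bit:
                acc ^= value
        out.append(acc)
    return out


def permute_by_differences(P, u, order=None):
    """Compute each output from the previously computed one plus a difference.
    `order` (hypothetical) lists the row indices in the order they are computed."""
    order = list(range(len(P))) if order is None else list(order)
    out = [0] * len(P)

    # The first output has no predecessor, so it is computed directly.
    prev = order[0]
    acc = 0  # plays the role of the accumulator
    for bit, value in zip(P[prev], u):
        if bit:
            acc ^= value
    out[prev] = acc

    # Every later output reuses the accumulator: u_j' = u_i' + D_i, where
    # D_i is the XOR of the inputs at which row j and row i of P differ.
    for j in order[1:]:
        for k, value in enumerate(u):
            if P[j][k] != P[prev][k]:
                acc ^= value
        out[j] = acc
        prev = j
    return out
```

When consecutive rows of P differ in only a few positions, each difference touches only those few inputs, which is where the saving over evaluating every row from scratch would come from; the order in which rows are visited therefore matters for cost but not for the result.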