The present invention relates to a method and apparatus for efficient implementation of data permutation and division processing in the field of cryptography and a recording medium with a data permutation/division program recorded thereon.
Data encryption is intended to conceal data. Data encryption techniques fall into two categories: common key cryptosystems and public key cryptosystems.
The public key cryptosystem uses different keys for data encryption and for decryption; usually, the encryption key is made public and the decryption key is held by a user in secrecy. It is believed that the decryption key cannot be derived from the encryption key within a practical amount of time even with modern mathematical theories and the computing power of present-day computers.
On the other hand, the common key cryptosystem uses the same key for data encryption and decryption. To implement a fast and secure common key cipher, there is proposed a block encipherment scheme that divides data to be enciphered into blocks of an appropriate length and enciphers them one by one. Many of the block ciphers have a structure called a Feistel network. With this structure, an input of 2n bits is divided into right and left pieces of n-bit data, a function f is applied to the right n-bit data, its output is exclusive ORed with the left n-bit data, the right and left pieces of data are then swapped, and the same operation is repeated. This structure is shown in “Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, p. 347, 1996.”
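The Feistel structure described above can be sketched in Python as follows. This is a minimal illustration only: the round function toy_f, the 32-bit half size and the function names are assumptions introduced here, not part of any cipher named in this description, and toy_f offers no security.

```python
def feistel_encrypt(left, right, round_keys, f):
    """Apply one Feistel round per round key to a pair of n-bit halves."""
    for k in round_keys:
        # f acts on the right half, its output is XORed into the left half,
        # and the two halves are then swapped.
        left, right = right, left ^ f(right, k)
    return left, right


def feistel_decrypt(left, right, round_keys, f):
    """Undo feistel_encrypt by replaying the rounds with the keys reversed."""
    for k in reversed(round_keys):
        left, right = right ^ f(left, k), left
    return left, right


def toy_f(x, k):
    # Hypothetical 32-bit round function, for illustration only (not secure).
    return ((x * 0x9E3779B1) ^ k) & 0xFFFFFFFF
```

Decryption works for any function f, invertible or not, because each round only XORs the output of f into the other half; this is one reason the structure is so widely used.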
The common key cryptosystem is smaller in computational complexity than the public key cryptosystem, and the amount of data that can be encrypted per unit time in the former cryptosystem is tens to hundreds of times larger than in the latter cryptosystem. For this reason, the common key cryptosystem tends to be used when fast encryption processing is necessary.
The common key cryptosystem is required to have security against cryptanalysis as well as the above-mentioned high-speed performance. In recent years there have been proposed several methods of cryptanalysis for common key encryption algorithms. It is necessary, therefore, that a common key encryption algorithm to be newly developed be always secure against such cryptanalysis methods. These cryptanalysis methods are described, for example, in “Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, pp.285-293, 1996.”
There have also been studied schemes that would not allow easy application of the cryptanalysis methods, and it can be expected that such preventive schemes will increase the security of the common key encryption algorithm. According to one such preventive scheme, a value derived from an encryption key is exclusive ORed with input and output data so as to protect the input and output data of the basic encryption algorithm from an attacker. This scheme is described in “Bruce Schneier, Applied Cryptography, 2nd edition, John-Wiley and Sons, pp.366-367, 1996.” Many of the common key encryption algorithms proposed in recent years are designed using this scheme.
With the above scheme, the input data exclusive ORed with the value derived from the encryption key is used as input data of the basic encryption algorithm. In the case of using the afore-mentioned Feistel network, the input data needs to be divided into right and left data. Some recently developed common key encryption algorithms are intended to provide increased security by first permuting the input data and then dividing the permuted data into right and left, rather than simply dividing the input data into right and left. An example of such algorithms is the E2 cipher (Masayuki KANDA, et al., “A New 128-bit Block Cipher E2,” Technical Report of IEICE, ISEC98-12; hereinafter referred to simply as literature E2). In the E2 algorithm, permutation processing called a BP function is defined, and the permuted data is then divided into right and left for input into the Feistel network.
FIG. 1 depicts a basic configuration of an E2 cryptographic device, in which no key scheduling part is shown for brevity. The E2 cryptographic device is made up of an initial transformation part 10, twelve round processing stages RND1 to RND12, and a final transformation part 30. The size of each key is, for instance, 128 bits. The initial transformation part 10 comprises: an XOR operation part 11 that exclusive ORs an input plaintext M of, for example, 128 bits with a subkey k13; a multiplication part 12 that calculates the product of the output from the XOR operation part 11 and a subkey k14; and a byte permutation part (hereinafter referred to as a BP function part) 13 that performs byte permutation of the multiplied output from the multiplication part 12. To increase the operation efficiency when the word size of the CPU of the computer used is, for example, 32 bits, the operation is carried out on each of the four 32-bit subblocks into which the 128-bit data is divided.
The initial transformation part (hereinafter referred to as an IT function part) 10 performs the following operation for an input X=M using the subkeys k13 and k14.
$$A = \mathrm{IT}(X, k_{13}, k_{14}) \tag{1}$$
More specifically, letting
$$X = (x_1, x_2, x_3, x_4), \quad Y = (y_1, y_2, y_3, y_4), \quad Z = (z_1, z_2, z_3, z_4)$$
the following operation is performed by the XOR operation part 11 and the multiplication part 12.
$$Z = (X \oplus k_{13}) \otimes k_{14} = Y \otimes k_{14} \tag{2}$$
In the above, if $k_{14} = (K_1, K_2, K_3, K_4)$, the multiplication $Y \otimes k_{14}$ by the multiplication part 12 is performed as follows:
$$z_i = y_i \, (K_i \vee 1_{(\mathrm{hex})}) \bmod 2^{32}, \quad i = 1, 2, 3, 4 \tag{3}$$
The operation symbol $a \vee b$ represents the OR of a and b for every corresponding bit. Setting
$$(z_i^{(1)}, z_i^{(2)}, z_i^{(3)}, z_i^{(4)}) = z_i, \quad i = 1, 2, 3, 4 \tag{4}$$
$$Z' = (z_1', z_2', z_3', z_4')$$
the operation processing of the BP function part 13 is expressed by the following equation:
$$z_i' = (z_i^{(1)}, z_{i+1}^{(2)}, z_{i+2}^{(3)}, z_{i+3}^{(4)}), \quad i = 1, 2, 3, 4 \tag{5}$$
where
$$z_{i+4}^{(j)} = z_i^{(j)}, \quad j = 1, 2, 3, 4 \tag{6}$$
and where i represents the subblock number for each 32 bits and j the data number of each byte in the subblock. In FIG. 3 there are shown the permutations expressed by Eqs. (5) and (6). The four bytes of each piece of data z1, z2, z3 and z4 are distributed to four different output data groups.
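The initial transformation of Eqs. (1) to (6) can be sketched in Python as below. The sketch makes two assumptions that are not stated at this point in the text: byte (1) of each subblock is taken to be the most significant byte of the 32-bit word, and the product in Eq. (2) is read as the word-wise multiplication of Eq. (3). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_bytes4(word):
    """Split a 32-bit word into bytes (z^(1), ..., z^(4)), most significant first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def from_bytes4(b):
    """Reassemble four bytes, most significant first, into a 32-bit word."""
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]


def bp(z):
    """Eqs. (5)-(6): output word i takes byte j from input word i+j-1 (cyclic)."""
    rows = [to_bytes4(w) for w in z]
    return [from_bytes4([rows[(i + j) % 4][j] for j in range(4)])
            for i in range(4)]


def initial_transform(x, k13, k14):
    """IT(X, k13, k14): XOR with k13, multiply by (K_i OR 1) mod 2^32, then BP."""
    y = [xi ^ ki for xi, ki in zip(x, k13)]
    z = [(yi * (ki | 1)) & MASK32 for yi, ki in zip(y, k14)]
    return bp(z)
```

Forcing the low bit of each K_i to 1 makes every multiplier odd, so the multiplication modulo 2^32 is invertible and can be undone on decryption.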
The output from the byte permutation part (that is, the BP function part) 13 is divided into right data R0 and left data L0, which are provided to the round processing stage RND1. The i-th round processing stage RNDi performs substitution-permutation processing of right data $R_{i-1}$ in a round function part 22 by using a subkey $k_i$, and provides the substitution result to an XOR operation part 21, wherein it is exclusive ORed with left data $L_{i-1}$ fed thereto. The right data $R_{i-1}$ input to the i-th stage and the output from the XOR operation part 21 are exchanged in position, and they are provided as left data $L_i$ and right data $R_i$ to the next round processing stage RNDi+1. This is expressed as follows:
$$R_i = L_{i-1} \oplus F(R_{i-1}, k_i) \tag{7}$$
$$L_i = R_{i-1}, \quad i = 1, 2, \ldots, 12 \tag{8}$$
Each round function part 22 comprises, as depicted in FIG. 2, eight XOR operation parts 22X1, eight S-boxes (S function) 22S1, a linear permutation part (a P function part) 22P, eight XOR operation parts 22X2, and eight S-boxes 22S2. 64-bit right data $R_{i-1}$ is input to the i-th round processing stage RNDi. In the round function part 22, setting the input and the subkey as
$$R_{i-1} = (r_1, r_2, r_3, r_4, r_5, r_6, r_7, r_8)$$
$$k_i = (K^{(1)}, K^{(2)}) = (K_1^{(1)}, K_2^{(1)}, \ldots, K_8^{(1)}, K_1^{(2)}, K_2^{(2)}, \ldots, K_8^{(2)})$$
the outputs from the S-boxes 22S1 are given by the following equation:
$$(u_1, u_2, \ldots, u_8) = (s(r_1 \oplus K_1^{(1)}), s(r_2 \oplus K_2^{(1)}), \ldots, s(r_8 \oplus K_8^{(1)})) \tag{9}$$
The output from the linear permutation part 22P can be expressed as follows:
$$\begin{aligned}
u_1' &= u_2 \oplus u_3 \oplus u_4 \oplus u_5 \oplus u_6 \oplus u_7 \\
u_2' &= u_1 \oplus u_3 \oplus u_4 \oplus u_6 \oplus u_7 \oplus u_8 \\
u_3' &= u_1 \oplus u_2 \oplus u_4 \oplus u_5 \oplus u_7 \oplus u_8 \\
u_4' &= u_1 \oplus u_2 \oplus u_3 \oplus u_5 \oplus u_6 \oplus u_8 \\
u_5' &= u_1 \oplus u_2 \oplus u_4 \oplus u_5 \oplus u_6 \\
u_6' &= u_1 \oplus u_2 \oplus u_3 \oplus u_6 \oplus u_7 \\
u_7' &= u_2 \oplus u_3 \oplus u_4 \oplus u_7 \oplus u_8 \\
u_8' &= u_1 \oplus u_3 \oplus u_4 \oplus u_5 \oplus u_8
\end{aligned} \tag{10}$$
The outputs from the S-boxes 22S2 are expressed by the following equation:
$$(v_1, v_2, \ldots, v_8) = (s(u_1' \oplus K_1^{(2)}), s(u_2' \oplus K_2^{(2)}), \ldots, s(u_8' \oplus K_8^{(2)})) \tag{11}$$
These outputs are subjected to byte rotation and then output from the round function part 22.
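The linear permutation of Eq. (10) can be sketched in Python as below. The sketch assumes that every term in the last four lines of Eq. (10) is joined by exclusive OR, and the names P_TAPS and p_function are illustrative only. The S-boxes and the byte rotation are omitted because their contents are not given in this description.

```python
from functools import reduce
from operator import xor

# 1-based input indices that are XORed into each output byte u'_1 .. u'_8,
# transcribed from Eq. (10).
P_TAPS = [
    (2, 3, 4, 5, 6, 7),
    (1, 3, 4, 6, 7, 8),
    (1, 2, 4, 5, 7, 8),
    (1, 2, 3, 5, 6, 8),
    (1, 2, 4, 5, 6),
    (1, 2, 3, 6, 7),
    (2, 3, 4, 7, 8),
    (1, 3, 4, 5, 8),
]


def p_function(u):
    """Map eight bytes (u_1..u_8) to (u'_1..u'_8) by XORing the tapped inputs."""
    return [reduce(xor, (u[t - 1] for t in taps)) for taps in P_TAPS]
```

Because the mapping uses only XOR, it is linear over bytes: applying it to the XOR of two inputs gives the XOR of the two outputs.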
In the case of FIG. 1, twelve such round processing stages are cascade-connected, and left and right data L12 and R12 output from the 12-th round processing part RND12 are concatenated into 128-bit data, which is fed to a BP−1 function part 31 of the final transformation part 30.
The final transformation part 30 obtains, as a ciphertext X=C, X=FT(Z′, k15, k16) from the input thereto Z′=(z1′, z2′, z3′, z4′) and keys k15, k16. More specifically, the BP−1 function part 31 performs inverse processing of the BP function part 13 by the following equations to obtain the output Z.
$$(z'^{(1)}_{i}, z'^{(2)}_{i}, z'^{(3)}_{i}, z'^{(4)}_{i}) = z_i', \quad i = 1, 2, 3, 4$$
$$z_i = (z'^{(1)}_{i}, z'^{(2)}_{i-1}, z'^{(3)}_{i-2}, z'^{(4)}_{i-3}), \quad i = 1, 2, 3, 4 \tag{12}$$
where
$$z'^{(j)}_{i-4} = z'^{(j)}_{i}, \quad j = 1, 2, 3, 4 \tag{13}$$
$$Z = (z_1, z_2, z_3, z_4)$$
The output Z is provided to a division part 32, which performs the division of the following equation using a subkey $k_{15} = (K_1, K_2, K_3, K_4)$.
$$y_i = z_i \, (K_i \vee 1_{(\mathrm{hex})})^{-1} \bmod 2^{32}, \quad i = 1, 2, 3, 4 \tag{14}$$
The only variable in Eq. (14) is $z_i$. Hence, calculation efficiency can be increased by precalculating the inverse element $G_i = (K_i \vee 1_{(\mathrm{hex})})^{-1} \bmod 2^{32}$ and prestoring it in a memory, since the stored value can then be used to calculate $y_i = z_i G_i \bmod 2^{32}$ for each input data $z_i$. The calculation result $Y = (y_1, y_2, y_3, y_4)$ is exclusive ORed with a subkey k16 in an XOR operation part 33 by the following equation, and the resulting output X is provided as the ciphertext C.
$$C = X = Y \oplus k_{16} \tag{15}$$
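The precalculation of the inverse elements for Eq. (14) can be sketched in Python as below. The sketch relies on the built-in pow with a negative exponent and a modulus, which computes a modular inverse in Python 3.8 and later; the function names are hypothetical, and the built-in stands in for whichever inverse algorithm an implementation actually uses.

```python
MASK32 = 0xFFFFFFFF


def precompute_inverses(k15):
    """G_i = (K_i | 1)^(-1) mod 2^32; K_i | 1 is odd, so the inverse exists."""
    return [pow(ki | 1, -1, 1 << 32) for ki in k15]


def divide(z, inverses):
    """Eq. (14) with stored inverses: y_i = z_i * G_i mod 2^32."""
    return [(zi * gi) & MASK32 for zi, gi in zip(z, inverses)]
```

Once the inverses are stored, each division costs one word multiplication per subblock, which is the efficiency gain described above.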
FIG. 3 depicts the input/output relationship of the byte permutation using the BP function expressed by Eqs. (5) and (6). As shown, the four pieces of 4-byte data z1, z2, z3 and z4 are rearranged on a bytewise basis to obtain the four pieces of 4-byte data z1′, z2′, z3′ and z4′. Conventionally, this byte permutation is implemented by performing the operation expressed by the following equation:
$$\begin{aligned}
z_1' &= (z_1 \wedge \mathrm{ff000000}) \vee (z_2 \wedge \mathrm{00ff0000}) \vee (z_3 \wedge \mathrm{0000ff00}) \vee (z_4 \wedge \mathrm{000000ff}) \\
z_2' &= (z_2 \wedge \mathrm{ff000000}) \vee (z_3 \wedge \mathrm{00ff0000}) \vee (z_4 \wedge \mathrm{0000ff00}) \vee (z_1 \wedge \mathrm{000000ff}) \\
z_3' &= (z_3 \wedge \mathrm{ff000000}) \vee (z_4 \wedge \mathrm{00ff0000}) \vee (z_1 \wedge \mathrm{0000ff00}) \vee (z_2 \wedge \mathrm{000000ff}) \\
z_4' &= (z_4 \wedge \mathrm{ff000000}) \vee (z_1 \wedge \mathrm{00ff0000}) \vee (z_2 \wedge \mathrm{0000ff00}) \vee (z_3 \wedge \mathrm{000000ff})
\end{aligned} \tag{16}$$
where the symbol $\wedge$ represents the AND for each bit and the symbol $\vee$ the OR for each bit, and “f” and “0” are hexadecimal values. This operation is performed as depicted in FIG. 4. For the sake of brevity, the entire data $Z = z_i^{(j)}$ (where i=1, 2, 3, 4; j=1, 2, 3, 4) is represented by a sequence of data a0, a1, . . . a15. For example, 4-byte data z1 of a register RG1 and 4-byte mask data MD1 of a mask register MRG1 are ANDed to obtain $z_1 \wedge \mathrm{ff000000}$, which is stored in a register RG1′. Then, the AND of data z2 and mask data MD2, $z_2 \wedge \mathrm{00ff0000}$, is calculated and is ORed with the data read out of the register RG1′, and the OR thus obtained is overwritten on the register RG1′. By performing the same processing for mask data MD3 and MD4 as well, the data z1′ is provided in the register RG1′. The same calculation processing as described above is also carried out for the data z2′, z3′ and z4′ by Eq. (16). Thus the byte permutation results are obtained in registers RG1′ to RG4′. The following problems have been pointed out in the implementation of this calculation scheme. That is, the processing by the BP function is byte-by-byte permutation processing, but on the one-word registers built into recent CPUs it involves masking and shift operations, and hence it consumes much processing time.
Even if the permutation is instead performed after the ORs are first copied to a memory, the time for memory access inevitably increases, resulting in increased processing time. These problems constitute an obstacle to the realization of high-speed performance of the common key cryptosystem.
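The conventional mask-based byte permutation of Eq. (16) can be sketched in Python as below. The sketch follows Eq. (5) for the source of every byte, so the least significant byte of z2′ is taken from z1; the names MASKS and bp_masked are hypothetical.

```python
# Byte masks of Eq. (16), most significant byte first.
MASKS = (0xFF000000, 0x00FF0000, 0x0000FF00, 0x000000FF)


def bp_masked(z):
    """Byte permutation via Eq. (16): AND each word with a mask, then OR."""
    out = []
    for i in range(4):
        acc = 0
        for j, mask in enumerate(MASKS):
            # Byte position j of output word i comes from input word i+j (cyclic).
            acc |= z[(i + j) % 4] & mask
        out.append(acc)
    return out
```

Each output word costs four AND operations plus the ORs that merge them, which illustrates the masking overhead referred to above.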
In the division part 32 in FIG. 1 a precalculated inverse element can be used. In general, it is possible to utilize, for the execution of an inverse element calculation modulo N, the extended Euclidean algorithm set forth, for instance, in Okamoto and Ohta, coeditors, “Cipher/Zero Knowledge Proof/Number Theory,” Kyouritsu Shuppan, 1995, pp. 120-121. In the case of Eq. (14), however, since the modulus has the special form $2^m$, the inverse element can efficiently be calculated by the use of the Hensel lifting method (a natural method of raising the root of a polynomial from mod $b^m$ to mod $b^{m+1}$). In the calculation of the inverse element with software, when m is about one word length, the method proposed by Zassenhaus, which is a quadratic version of Hensel lifting (H. Zassenhaus, “On Hensel Factorization, I,” Journal of Number Theory, vol. 1, pp. 291-311, 1969), is effective because word multiplication is relatively fast on recent CPUs.
Letting the input be represented by x, the output by y and auxiliary or temporary variables by a and b, and letting [α] represent a Gauss symbol (the maximum integer which does not exceed α), the Zassenhaus method provides an algorithm for calculating an inverse $y = x^{-1} \bmod 2^m$ as given below, assuming that bit positions are numbered from 0 at the least significant bit and that the bit lengths of x, y, a and b are m, where $m = 2^n$ ($n \geq 1$):
Step 1: Input x.
Step 2: Initialize y:=1 and b:=[x/2]
Step 3: Do the following for i=0, 1, . . . , n−1:
        1. Set a as the low-order $2^i$ bits of $y \times (2^{2^i} - (\text{low-order } 2^i \text{ bits of } b))$.
        2. Pad the $2^i$-th to $(2^{i+1}-1)$-th bits of y with the low-order $2^i$ bits of a.
        3. Store the $2^i$-th to $(2^n - 2^i - 1)$-th bits of xa+b in b.
Step 4: Output y.
This bit processing is illustrated in FIG. 5, in which the contents of the register storing the output y are represented by binary numbers for each value of i.
“1” indicates bits whose value is always 1, “.” calculated bits, and “?” unknown bits. In the result with i=2 calculated using i=1, the fourth to seventh bits are determined. Arranging data in the fourth to seventh bits in this way will hereinafter be referred to as “padding.”
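Steps 1 to 4 above can be sketched in Python as follows. The sketch reads the subtraction in Step 3.1 as 2^(2^i) minus the low-order 2^i bits of b, and the function name inverse_mod_pow2 is hypothetical. Python integers are unbounded, so the m-bit register widths are enforced explicitly by masking.

```python
def inverse_mod_pow2(x, n):
    """Return y with x*y == 1 (mod 2^m), m = 2^n, for odd x (Steps 1 to 4)."""
    m = 1 << n
    if x & 1 == 0:
        raise ValueError("x must be odd to be invertible modulo 2^m")
    x &= (1 << m) - 1
    y = 1                       # Step 2: bit 0 of the inverse is always 1
    b = x >> 1                  # Step 2: b = [x/2], so x*y = 1 + 2*b
    for i in range(n):          # Step 3
        w = 1 << i              # the low-order 2^i bits of y are already known
        low = (1 << w) - 1
        # 3.1: a = low-order 2^i bits of y * (2^(2^i) - low-order bits of b)
        a = (y * ((1 << w) - (b & low))) & low
        # 3.2: pad bits 2^i .. 2^(i+1)-1 of y with a
        y |= a << w
        # 3.3: keep bits 2^i .. 2^n - 2^i - 1 of x*a + b as the new b
        keep = m - 2 * w
        b = ((x * a + b) >> w) & ((1 << keep) - 1)
    return y
```

Each pass doubles the number of correct low-order bits of y, so a 32-bit inverse (n=5) takes five iterations; for example, inverse_mod_pow2(3, 5) yields 0xAAAAAAAB, since 3 × 0xAAAAAAAB = 0x200000001.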
The configuration of the RSA cipher, which is a typical public key cryptosystem, is described, for example, in the above-mentioned literature “Cipher/Zero Knowledge Proof/Number Theory,” p. 220. The RSA cipher requires a power calculation over Z/NZ, that is, what is called a modular exponentiation. For fast execution of this modular exponentiation, it is effective to use the Montgomery modular arithmetic algorithm introduced in the above-mentioned literature on pages 179-181. The execution of the Montgomery modular arithmetic algorithm involves an inverse calculation modulo $2^m$, where m is a natural number.
The above-mentioned Zassenhaus scheme involves bitwise processing such as extraction and padding of the low-order $2^i$ bits; in the case of software implementation, the number of masking operations increases, and hence efficiency is not so high.
The round function part 22 of the round processing stage depicted in FIG. 2 is formed by a combination of the substitution by the S function part 22S1 and the permutation by the P function part 22P.
The substitution-permutation is a concept of a considerably broad meaning. To meet a demand for software implementation in recent years, there has widely been used the substitution-permutation in the following form:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = P \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \tag{17}$$
In this instance, operations are all performed over the ring R. The permutation is given by
$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix} \tag{18}$$
and the substitution is set to $s_j : R \to R$ ($j = 1, 2, \ldots, n$). That is, multiplication by the matrix P is regarded as the permutation.
The substitution-permutation expressed by Eq. (17) is also used in the cipher SHARK that is defined in V. Rijmen, et al., “The Cipher SHARK,” Fast Software Encryption, Third International Workshop, Lecture Notes in Computer Science 1039, pp. 99-111, Springer-Verlag, 1996 (hereinafter referred to simply as Literature S). In Literature S there is also described a method in which the following modified equation is used
$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix} \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} = \begin{bmatrix} p_{11} s_1(x_1) \\ p_{21} s_1(x_1) \\ \vdots \\ p_{m1} s_1(x_1) \end{bmatrix} + \begin{bmatrix} p_{12} s_2(x_2) \\ p_{22} s_2(x_2) \\ \vdots \\ p_{m2} s_2(x_2) \end{bmatrix} + \cdots + \begin{bmatrix} p_{1n} s_n(x_n) \\ p_{2n} s_n(x_n) \\ \vdots \\ p_{mn} s_n(x_n) \end{bmatrix} \tag{19}$$
and the output value of the function $SP_j$ expressed by the following equation (20) is precalculated for every $x_j$ and prestored, for example, in a memory to thereby efficiently calculate Eq. (17).
$$SP_j : R \to R^m; \quad SP_j(x_j) = \begin{bmatrix} p_{1j} s_j(x_j) \\ p_{2j} s_j(x_j) \\ \vdots \\ p_{mj} s_j(x_j) \end{bmatrix} \quad (j = 1, 2, \ldots, n) \tag{20}$$
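The table method of Eqs. (19) and (20) can be sketched in Python as below. The ring R is assumed here to be the integers modulo 256 purely for illustration (the text leaves R general), and the matrix, the S-boxes and the function names are hypothetical.

```python
MOD = 256  # assumed ring R = Z/256Z, chosen only for illustration


def build_sp_tables(P, sboxes):
    """Eq. (20): SP_j(x) is column j of P scaled by s_j(x), for every x in R."""
    m, n = len(P), len(P[0])
    return [
        [tuple(P[i][j] * sboxes[j][x] % MOD for i in range(m))
         for x in range(MOD)]
        for j in range(n)
    ]


def apply_with_tables(tables, x):
    """Eq. (19): add up the precomputed column vectors SP_j(x_j)."""
    y = [0] * len(tables[0][0])
    for j, xj in enumerate(x):
        y = [(yi + ci) % MOD for yi, ci in zip(y, tables[j][xj])]
    return y


def apply_directly(P, sboxes, x):
    """Eq. (17): substitute each x_j, then multiply the result by P."""
    s = [sboxes[j][xj] for j, xj in enumerate(x)]
    return [sum(pij * sj for pij, sj in zip(row, s)) % MOD for row in P]
```

With the tables in place, evaluating Eq. (17) needs n table lookups and n−1 vector additions instead of n substitutions followed by a full matrix product. In a word-oriented implementation each stored column would typically be packed into machine words so that one addition handles several elements at once, which is a design choice outside this sketch.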
In a cipher utilizing the substitution-permutation scheme, there is a case where no permutation is performed at the end of processing but only substitution is used. That is, the following processing is also necessary for cipher implementation.
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} s_1(x_1) \\ s_2(x_2) \\ \vdots \\ s_n(x_n) \end{bmatrix} \tag{21}$$
When the size of
the element in R is smaller than the word length that is the operation unit in the computer used, it is necessary in straightforward implementation that the calculation of individual values of sj(xj) be followed by shifting them to their correct vector positions. In this instance, the necessity for the data position adjustment process can be avoided by Modifying Eq. (21) to                               [                                                                                                                                        y                        1                                                                                                                                                y                        2                                                                                                                        ⋮                                                                                                  ⋮                                                                                                                                            y                  n                                                              ]                =                              [                                                                                                                                                                                    s                            1                                                    ⁡                                                      (                                                          x                              1                                                        )                                                                                                                                                              0                                                                                                            ⋮                                    
                                                                        ⋮                                                                                                                                          0                                                      ]                    +                      [                                                                                                                              0                                                                                                                                                                  s                            2                                                    ⁡                                                      (                                                          x                              2                                                        )                                                                                                                                                              0                                                                                                            ⋮                                                                                                                                          0                                                      ]                    +          ⋯          +                      [                                                                                                                              0                                                                                                            ⋮                                                                                                            ⋮                                                                                                            0                                                                                                                                     
                                               s                      n                                        ⁡                                          (                                              x                        n                                            )                                                                                            ]                                              (        22        )            as is the case with Eq. (17) and by precalculating a table in which the positions of vector elements have been adjusted so that 0s would be provided except at the j-th position.
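The table precalculation behind Eq. (22) can be sketched as follows. This is only an illustration under stated assumptions: n = 4 elements of 8 bits packed into one 32-bit word, element 1 in the most significant byte, and a made-up bijective S-box (`toy_sbox`) that does not come from any real cipher. The names `SHIFTED`, `substitute_naive` and `substitute_tables` are likewise hypothetical.

```python
N = 4      # assumed number of vector elements per word
BITS = 8   # assumed element size in bits


def toy_sbox(j):
    """Hypothetical bijective S-box s_j (odd multiplier modulo 256)."""
    return [(x * (2 * j + 3) + j) % 256 for x in range(256)]


SBOXES = [toy_sbox(j) for j in range(N)]

# Precomputed tables: entry x of table j holds s_j(x) already placed at the
# j-th element position of the word, with zeros at every other position.
SHIFTED = [[s << (BITS * (N - 1 - j)) for s in SBOXES[j]] for j in range(N)]


def substitute_naive(xs):
    """Eq. (21) done directly: look up s_j(x_j), then shift into place."""
    y = 0
    for j, x in enumerate(xs):
        y |= SBOXES[j][x] << (BITS * (N - 1 - j))
    return y


def substitute_tables(xs):
    """Eq. (22): sum of pre-positioned table entries, no shifting at run time."""
    y = 0
    for j, x in enumerate(xs):
        y += SHIFTED[j][x]
    return y
```

Both functions return the same packed word; the second trades a larger table for the removal of the per-element shift.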
The calculation for the substitution and permutation described in Literature S is disadvantageous in that it involves a large number of memory references and requires a large memory capacity.
As described previously, letting the input data be (u1, u2, . . . , u8) and the output data be (u1′, u2′, . . . , u8′), the P function part 22P in the cipher E2 shown in FIG. 2, for instance, performs an operation using the product expressed by the following equation.

\[
\begin{bmatrix} u_1' \\ u_2' \\ u_3' \\ u_4' \\ u_5' \\ u_6' \\ u_7' \\ u_8' \end{bmatrix}
= P
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \\ u_8 \end{bmatrix},
\qquad
P =
\begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 1 & 0 & 0 & 1
\end{bmatrix}
\qquad (23)
\]

where u_i, u_j′ ∈ R.

This operation is simply a matrix product and is the same as Eq. (10). In an environment where a permutation operation using masked data or a permutation operation using a bit shift or cyclic shift is possible by processing every 32 bits, the required number of processing steps is small, and hence fast processing can be achieved. However, such operation processing cannot be performed in a hardware environment formed by an 8-bit accumulator type CPU, a small number of registers and a small amount of memory, as in the case of a low-end smart card. In that environment the operation of Eq. (10) needs to be executed successively, and hence high-speed processing is difficult to perform.
The following description will be given on the assumption that in an implementation environment formed by one 8-bit accumulator, a small number of registers and a small amount of memory, the input data (u1,u2, . . . , u8) is stored in the memory and is read out therefrom, and the permutation output data u1′,u2′, . . . , u8′ is calculated and stored in the memory; the computational complexity in this instance is evaluated. The evaluation is made in terms of the number of times an addition/subtraction is performed, the number of times the memory is written and the number of times the memory is read. And let it be assumed that the permutation operation is performed by calculating Eq. (23) using the above-mentioned matrix P.
Conventional scheme 1: If the permutation operation of Eq. (23) is carried out as defined, then Eq. (10) needs to be calculated successively. The computational complexity in this case is as follows:
        Number of additions/subtractions: 36
        Number of memory reads: 44
        Number of memory writes: 8
With this scheme, the number of memory reads is equal to the total number of elements of the matrix P which have a “1” component. Accordingly, the computational complexity increases with an increase in the number of elements whose components are “1s”.
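The operation counts of scheme 1 can be reproduced with a short sketch. It assumes R = GF(2^8), so that addition is a bytewise XOR, and models each input fetch as one memory read and each stored output as one memory write. The function name `scheme1` and the counter dictionary are illustrative, not part of the source.

```python
# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]


def scheme1(u):
    """Row-by-row evaluation of u' = P u, tallying the operations used."""
    counts = {"add": 0, "read": 0, "write": 0}
    out = []
    for row in P:
        acc = None
        for p, x in zip(row, u):
            if not p:
                continue              # a "0" component costs nothing
            counts["read"] += 1       # fetch u_j from memory
            if acc is None:
                acc = x               # first term: load only
            else:
                acc ^= x              # addition in GF(2^8) (assumed ring)
                counts["add"] += 1
        out.append(acc)
        counts["write"] += 1          # store u_i'
    return out, counts
```

Under these modelling assumptions, `scheme1(list(range(1, 9)))[1]` gives 36 additions, 44 reads and 8 writes, matching the figures listed above.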
Conventional scheme 2: In Literature E2 there is described, as the permutation scheme using the matrix P, a scheme that uses GF(2^8) as the ring R and calculates the following equations.

\[
\begin{array}{llll}
a_5 = u_5 + u_1, & b_1 = u_1 + a_7, & u_5' = a_5 + b_4, & u_1' = b_1 + u_5' \\
a_6 = u_6 + u_2, & b_2 = u_2 + a_8, & u_6' = a_6 + b_1, & u_2' = b_2 + u_6' \\
a_7 = u_7 + u_3, & b_3 = u_3 + a_5, & u_7' = a_7 + b_2, & u_3' = b_3 + u_7' \\
a_8 = u_8 + u_4, & b_4 = u_4 + a_6, & u_8' = a_8 + b_3, & u_4' = b_4 + u_8'
\end{array}
\]

The computational complexity of this scheme is as follows:
        Number of additions/subtractions: 16
        Number of memory reads: 32
        Number of memory writes: 16
This scheme is more effective than scheme 1 in the case where the number of registers used is large and addition instructions are orthogonal (the addition instructions can be executed for all the registers). However, this scheme has the disadvantages listed below.
        (a) Since the scheme essentially relies on the characteristic of R being 2, it cannot be used when the characteristic is not 2.
        (b) Since a large number of registers cannot always be used, this scheme is not efficient in some implementation environments.
        (c) The computational complexity depends largely on the component configuration of the matrix P.
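As a check on the equations of scheme 2, the sketch below evaluates them with XOR as the addition of GF(2^8) and compares the result with the direct product by the matrix P of Eq. (23). The function names `matvec_xor` and `scheme2` are illustrative. The a-values are computed first, then the b-values, then u5′ to u8′ and finally u1′ to u4′, which is one evaluation order that satisfies the dependencies among the equations.

```python
from functools import reduce
from operator import xor

# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]


def matvec_xor(u):
    """Reference: u' = P u with XOR as addition."""
    return [reduce(xor, (x for p, x in zip(row, u) if p), 0) for row in P]


def scheme2(u):
    """The sixteen XORs of conventional scheme 2."""
    u1, u2, u3, u4, u5, u6, u7, u8 = u
    a5, a6, a7, a8 = u5 ^ u1, u6 ^ u2, u7 ^ u3, u8 ^ u4
    b1, b2, b3, b4 = u1 ^ a7, u2 ^ a8, u3 ^ a5, u4 ^ a6
    o5, o6, o7, o8 = a5 ^ b4, a6 ^ b1, a7 ^ b2, a8 ^ b3
    o1, o2, o3, o4 = b1 ^ o5, b2 ^ o6, b3 ^ o7, b4 ^ o8
    return [o1, o2, o3, o4, o5, o6, o7, o8]
```

The cancellation of repeated terms (for example u1 + u1 = 0 in u1′) is what ties the scheme to characteristic 2, as noted in disadvantage (a).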
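Conventional scheme 3: When the number of “1” components is large as in the case of the matrix P, the following calculation scheme can be used.

\[
\begin{aligned}
\sigma &= u_1 + u_2 + u_3 + u_4 + u_5 + u_6 + u_7 + u_8 \\
u_1' &= \sigma - u_1 - u_8 \\
u_2' &= \sigma - u_2 - u_5 \\
u_3' &= \sigma - u_3 - u_6 \\
u_4' &= \sigma - u_4 - u_7 \\
u_5' &= \sigma - u_3 - u_7 - u_8 \\
u_6' &= \sigma - u_4 - u_5 - u_8 \\
u_7' &= \sigma - u_1 - u_5 - u_6 \\
u_8' &= \sigma - u_2 - u_6 - u_7
\end{aligned}
\]

The computational complexity of this scheme is as follows:
        Number of additions/subtractions: 27
        Number of memory reads: 36
        Number of memory writes: 9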
This scheme is more efficient than scheme 1 when the number of elements of the matrix P having the “1” component accounts for more than 60% of the total number of elements.
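Scheme 3 can be sketched in the same way. To illustrate that it does not depend on characteristic 2, the example assumes R is the ring of integers modulo 256, which is a choice made here and not one taken from the source. The subtracted terms are derived from the “0” components of each row of P rather than hard-coded, and the names `direct` and `scheme3` are illustrative.

```python
MOD = 256  # assumed ring Z/256Z, chosen only for illustration

# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]


def direct(u):
    """Reference: u' = P u over Z/256Z."""
    return [sum(x for p, x in zip(row, u) if p) % MOD for row in P]


def scheme3(u):
    """Compute sigma once, then subtract the terms at the "0" components."""
    counts = {"add": 0, "read": 0, "write": 0}
    sigma = 0
    for k, x in enumerate(u):
        counts["read"] += 1
        if k == 0:
            sigma = x
        else:
            sigma = (sigma + x) % MOD
            counts["add"] += 1
    counts["write"] += 1              # store sigma
    out = []
    for row in P:
        acc = sigma
        counts["read"] += 1           # reload sigma
        for p, x in zip(row, u):
            if p == 0:
                acc = (acc - x) % MOD
                counts["read"] += 1
                counts["add"] += 1    # subtractions counted with additions
        out.append(acc)
        counts["write"] += 1          # store u_i'
    return out, counts
```

With this cost model the tally is 27 additions/subtractions, 36 reads and 9 writes, in agreement with the figures listed for scheme 3.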
In any of the above conventional schemes, the efficiency of the permutation by the matrix P depends on how the {0, 1} components of the matrix P are distributed; in particular, the computational complexity is determined by the proportion of elements having the “1” component. That is, either scheme 1 or scheme 3 may be the more efficient one depending on the configuration of the matrix P; hence, these schemes are not versatile. Which of them is more efficient depends on whether the number of elements of the matrix P having the “1” component is more than 60% of the total number of elements.
To implement secure permutation that can be used in cryptography, it is desirable that the proportions of “0” and “1” components in the matrix P be well balanced. For example, in the case of the above matrix P used for the permutation in the cipher E2, the elements having the “1” component account for about ⅔ of the total number of matrix elements. Since this value is close to the point that determines which scheme is more efficient, schemes 1 to 3 require nearly the same number of memory reads. This means that these conventional schemes produce substantially no increase in processing speed, because reading from or writing to the memory is several times slower than addition and subtraction; therefore, implementing any of these schemes will not serve the intended purpose of a faster permutation operation.
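The proximity of the matrix P to the break-even point can be illustrated numerically. The sketch below rests on a simplified cost model that is an assumption of this example: only memory reads are compared, scheme 1 reads one operand per “1” component, and scheme 3 reads the n inputs once for σ, then σ itself once per output plus one operand per “0” component. The helper names are hypothetical.

```python
from fractions import Fraction

# Matrix P of Eq. (23).
P = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]

n = len(P)
ones = sum(map(sum, P))
density = Fraction(ones, n * n)     # share of "1" components in P


def reads_scheme1(ones):
    """Scheme 1: one read per "1" component."""
    return ones


def reads_scheme3(ones, n):
    """Scheme 3: n reads for sigma, n reloads of sigma, one per "0" component."""
    return n + n + (n * n - ones)


# Density at which both read counts coincide: ones = (n*n + 2*n) / 2.
break_even = Fraction(n * n + 2 * n, 2) / (n * n)
```

For n = 8 this model places the break-even density at 5/8, which is consistent with the threshold of roughly 60% stated above, while the density of P is 11/16, only slightly higher.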