KECCAK is a new secure hashing function that maintains state in an array of bits arranged with a 5×5 edge surface and a depth of 2^l, where l = 0 to 6. KECCAK is under consideration by the National Institute of Standards and Technology (NIST) for selection as the SHA-3 standard. In hardware implementations, the performance of KECCAK exceeds that of the other secure hash algorithms competing for the standard, owing to the simple logic functions required and the parallelism that can be utilized. In current software implementations, the performance of KECCAK is constrained by the large number of logic operations, each of which requires an individual integer instruction.
A KECCAK state can be viewed as a three-dimensional array of elements (bits) with a 5×5 element edge termed a “slice” and a depth (z direction) of w bits, where the depth is a power of 2, i.e., w = 2^l for l = 0 to 6, as shown in FIG. 1A. KECCAK uses a “sponge” construction in which r bits of input are XORed into the “first” r bits of the state, followed by the KECCAK-f state update function. KECCAK-1600 (l=6, w=64) is the target function, providing the highest capacity for message authentication. Mapping the lanes of the state, i.e., the one-dimensional sub-arrays in the direction of the z axis, onto 64-bit processor words results in a simple and efficient software implementation of the step mappings. For l=6, KECCAK-1600 (5×5×64) is the state update function, consisting of nr rounds of five steps/permutations, θ, ρ, π, χ, and ι, as shown in FIG. 1B.
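As a minimal sketch of the state geometry and the absorbing step described above, the following Python models the state as 25 lanes held as integers. The function names, the flat lane list (with the “first” r bits taken to be the leading lanes), and the little-endian packing of bytes into lanes are assumptions made for illustration; they are not given in the text above.

```python
def state_bits(l):
    """Return the state width in bits: a 5x5 slice times a depth of w = 2**l."""
    if not 0 <= l <= 6:
        raise ValueError("l must be in the range 0..6")
    return 5 * 5 * (1 << l)


def absorb_block(lanes, block, lane_bits=64):
    """XOR an r-bit input block into the leading r bits of the state.

    lanes: list of 25 non-negative integers, one per lane (assumed layout).
    block: bytes whose length is a whole number of lanes (r = 8 * len(block)).
    Returns a new list; the KECCAK-f state update would be applied to it next.
    """
    lane_bytes = lane_bits // 8
    if len(block) % lane_bytes != 0:
        raise ValueError("block must cover a whole number of lanes")
    count = len(block) // lane_bytes
    if count > len(lanes):
        raise ValueError("block is wider than the state")
    out = list(lanes)
    for i in range(count):
        chunk = block[i * lane_bytes:(i + 1) * lane_bytes]
        out[i] ^= int.from_bytes(chunk, "little")  # assumed byte order
    return out
```

For l=6 the sketch gives state_bits(6) = 1600, matching the 5×5×64 state; absorbing the same block twice without an intervening state update restores the original lanes, since XOR is its own inverse.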
The θ function is performed as illustrated in FIG. 1C, and the ρ function rotates the lanes/registers by specified offsets, requiring 24 register rotates (one offset is zero), as shown in FIG. 1D. The π function is a transposition of the lanes. This transposition changes the usage and grouping of every 64-bit section, as shown in FIG. 1E. The transpose has a period of 24 rounds before lanes return to their original position. During the χ step, each row is transformed by neighboring elements of that row, with the x indices taken modulo 5:
A[x,y] = a[x,y] ⊕ ((NOT a[x+1,y]) AND a[x+2,y])
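The χ formula above can be sketched on whole lanes as follows. This is an illustrative sketch only: the names MASK64, chi_row, and chi are hypothetical, and the state is assumed to be a list of five rows, each a list of five 64-bit lanes stored as Python integers.

```python
MASK64 = (1 << 64) - 1  # keeps Python's unbounded integers within one 64-bit lane


def chi_row(row):
    """Apply chi to one row of five lanes.

    Each output lane is a[x] XOR ((NOT a[x+1]) AND a[x+2]), indices modulo 5.
    All outputs are computed from the original row, not from updated values.
    """
    return [
        row[x] ^ ((~row[(x + 1) % 5] & MASK64) & row[(x + 2) % 5])
        for x in range(5)
    ]


def chi(state):
    """Apply chi to every row of a state given as five rows of five lanes."""
    return [chi_row(row) for row in state]
```

Each call to chi_row uses 5 NOTs, 5 ANDs, and 5 XORs on lanes, which is where the per-row count of 15 operations comes from.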
Each row of lanes is calculated together, with 5 NOTs, 5 ANDs, and 5 XORs needed per row, for a total of 15×5=75 operations over the five rows. The ι add-round-constant step is applied to a single register/lane, requiring 1 XOR instruction, as shown in FIG. 1F. The operations per round are 55 for θ, 24 for ρ, zero for π, 75 for χ, and 1 for ι, for a total of 155 operations per round. The number of rounds nr is 12+2l=24 for l=6, i.e., 64-bit registers. For 24 rounds, KECCAK requires 155×24=3720 operations. On a processor with four execution units, if each operation requires an instruction, a minimum of 3720/4=930 cycles is required.
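The arithmetic above can be checked with a short sketch. The per-step counts are taken directly from the text; the names OPS_PER_ROUND, num_rounds, total_ops, and min_cycles are hypothetical, and the cycle bound assumes one instruction per operation with every execution unit kept busy.

```python
# Per-round operation counts as stated in the text.
OPS_PER_ROUND = {"theta": 55, "rho": 24, "pi": 0, "chi": 75, "iota": 1}


def num_rounds(l):
    """Number of rounds nr = 12 + 2*l."""
    return 12 + 2 * l


def total_ops(l):
    """Total operations for a full state update."""
    return sum(OPS_PER_ROUND.values()) * num_rounds(l)


def min_cycles(l, execution_units):
    """Lower bound on cycles if each operation is one instruction (ceiling division)."""
    return -(-total_ops(l) // execution_units)
```

With l=6 and four execution units, the sketch reproduces the figures in the text: 155 operations per round, 24 rounds, 3720 operations, and 930 cycles.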
The above operations cannot be parallelized efficiently with vector instructions, such as streaming single instruction, multiple data (SIMD) extensions (SSE) or advanced vector extensions (AVX) instructions from Intel® Corporation of Santa Clara, Calif. The obstacle is the π function, which scrambles the location of the lanes, and of the corresponding words of the cube, each round.
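The scrambling can be made concrete by tracking where a lane lands after each application of π. The sketch below assumes the lane mapping (x, y) → (y, 2x+3y mod 5) from the public KECCAK specification; that mapping is not stated in the text above, and the function names are hypothetical.

```python
def pi_position(x, y):
    """Position a lane at (x, y) moves to under pi (mapping assumed from the KECCAK spec)."""
    return (y, (2 * x + 3 * y) % 5)


def pi_period(start=(1, 0)):
    """Count applications of pi until a lane returns to its starting position."""
    pos = pi_position(*start)
    rounds = 1
    while pos != start:
        pos = pi_position(*pos)
        rounds += 1
    return rounds
```

Under this assumed mapping the lane at (0, 0) never moves, while every other lane visits all 24 remaining positions before returning, consistent with the 24-round period noted earlier. A vector register that packs several lanes together therefore holds a different grouping of lanes after each round.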