An encryption engine that implements the advanced encryption standard (AES), published by the National Institute of Standards and Technology (NIST), enciphers and deciphers blocks of data, typically 128 bits (the block size), using a variable length key of up to 256 bits. Deciphering is accomplished using the same key that was used for encrypting, but with the schedule of addressing the key bits altered so that the deciphering is the reverse of the encryption process.
There are a number of different algorithms for implementing AES; one of the more prominent ones is the Rijndael algorithm. Typically, that algorithm receives four four-byte (thirty-two bit) words, upon which it performs a subbyte transformation that includes taking a multiplicative inverse in a Galois field GF⁻¹(2⁸) and applying an affine (over GF(2)) transformation. Next, a shift rows transformation is effected, followed by a mix columns transformation, after which a round key is added.
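As an illustration of the subbyte transformation described above, the following is a minimal Python sketch that computes a single S-box value by taking the multiplicative inverse in GF(2⁸) and then applying the affine transformation over GF(2). The function names (`gf_mul`, `gf_inverse`, `sub_byte`) are hypothetical, and the reduction polynomial 0x11B and affine constant 0x63 are those of the published AES specification rather than values stated in this text.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B            # reduce back into eight bits
        b >>= 1
    return product


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); zero is mapped to zero."""
    if a == 0:
        return 0
    # For nonzero a, a^255 == 1, so a^254 is the inverse.
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sub_byte(a: int) -> int:
    """Subbyte transformation: field inverse followed by the affine map."""
    inv = gf_inverse(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


if __name__ == "__main__":
    print(hex(sub_byte(0x53)))
```

Computing every value this way is slow, which is the motivation for precomputing the 256 results once and storing them in a table.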
This series of steps is repeated a number of times. The number of iterations depends on the key length and block size in accordance with the Rijndael algorithm, where key length is the number of thirty-two bit words in the key and block size is the number of thirty-two bit words to be enciphered at a time. For example, for a key length of four (128 bits) and a block size of four the number of iterations is ten; for a key length of six (192 bits) and a block size of four the number of iterations is twelve; and for a key length of eight (256 bits) and a block size of four the number of iterations is fourteen. Thus, for example, with a key length of four and a block size of four calling for ten iterations or rounds, ten round keys of four thirty-two bit words each, one for each iteration or round, need to be generated from an input master key of four thirty-two bit words. These are generated as forty different subkeys through one or two steps depending upon the key length and number of rounds. The first word in the generation of a round key undergoes (a) a word rotation, followed by the subword transformation (a combination of inverse Galois field and affine transformation), after which Rcon[i] (an iteration dependent value) is added over the GF(2⁸) field; and (b) a thirty-two bit word permutation exclusive OR-ed with the result of (a). For example, with ten rounds and a key length of four, every fourth subkey generation cycle undergoes both steps (a) and (b). The other key generation cycles undergo only (c) a thirty-two bit word permutation exclusive OR-ed with the previous subkey. Thus cycles 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40 employ both steps (a) and (b); the remaining cycles use only step (c). Typically, this requires 90 or more clock cycles for each word, or 360 clock cycles for each block consisting of four words, and 3600 clock cycles to complete the ten rounds of a Rijndael algorithm for AES.
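The round count and round key generation described above can be sketched in Python as follows. This is an illustrative sketch that follows the key expansion of the published AES specification (FIPS 197), not necessarily the exact sequence of steps recited here; the names `rounds_for`, `expand_key`, `rot_word` and `sub_word` are hypothetical, and the S-box is built by direct computation only to keep the example self-contained.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox() -> list:
    """Build the 256-entry S-box: field inverse followed by the affine map."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for value in range(256):
        inv = 0
        if value:
            inv = 1
            for _ in range(254):          # value^254 is the inverse
                inv = gf_mul(inv, value)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def rounds_for(key_words: int, block_words: int = 4) -> int:
    """Number of rounds: 10, 12 or 14 for key lengths of 4, 6 or 8 words."""
    return max(key_words, block_words) + 6


def rot_word(word: int) -> int:
    """Rotate a 32-bit word left by one byte."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def sub_word(word: int) -> int:
    """Apply the S-box to each of the four bytes of a word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key(key: bytes, block_words: int = 4) -> list:
    """Expand a master key into block_words * (rounds + 1) 32-bit words."""
    key_words = len(key) // 4
    rounds = rounds_for(key_words, block_words)
    words = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(key_words)]
    rcon = 0x01
    for i in range(key_words, block_words * (rounds + 1)):
        temp = words[i - 1]
        if i % key_words == 0:
            # Rotation, subword, then the iteration dependent Rcon value.
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)
        elif key_words > 6 and i % key_words == 4:
            temp = sub_word(temp)         # extra step for 256-bit keys only
        words.append(words[i - key_words] ^ temp)
    return words


if __name__ == "__main__":
    schedule = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    print(len(schedule), format(schedule[4], "08x"))
```

For a 128-bit key the schedule holds forty-four words: the four words of the master key followed by the forty generated subkeys that make up the ten round keys.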
Thus, for a 10 megabit per second data stream operating on blocks of four thirty-two bit words (one hundred and twenty-eight bits), the requirement is 281 Mega Instructions Per Second (MIPS).
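The 281 MIPS figure can be reproduced with a short calculation. The sketch below assumes that the data rate is 10 megabits per second and that one instruction completes per clock cycle; neither assumption is stated explicitly in the text, and the variable names are hypothetical.

```python
BIT_RATE = 10_000_000        # assumed: 10 megabits per second
BLOCK_BITS = 128             # four thirty-two bit words
CYCLES_PER_BLOCK = 3600      # clock cycles to encipher one block, from the text

blocks_per_second = BIT_RATE / BLOCK_BITS
cycles_per_second = blocks_per_second * CYCLES_PER_BLOCK
required_mips = cycles_per_second / 1_000_000   # assumed: one instruction per cycle

print(f"{blocks_per_second:.0f} blocks/s, about {required_mips:.2f} MIPS")
```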
One approach to this problem employs a programmable data encryption engine for performing the cipher function of an advanced encryption standard (AES) algorithm, including a first parallel look-up table responsive to a first data block for implementing an AES selection function and executing the multiplicative inverse in GF⁻¹(2⁸) and applying an affine over GF(2) transformation to obtain the subbyte transformation. A second parallel look-up table transforms the subbyte transformation to obtain a shift row transformation. A Galois field multiplier transforms the shift row transformation to obtain a mix column transformation and adds a round key, resulting in an advanced encryption standard cipher function of the first data block, as more fully disclosed in U.S. Patent Application entitled PROGRAMMABLE DATA ENCRYPTION ENGINE FOR ADVANCED ENCRYPTION STANDARD ALGORITHM, Ser. No. 10/255,971, filed Sep. 26, 2002, (AD-298J) incorporated herein in its entirety by this reference.
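The stages that follow the S-box look-up can be illustrated in software. The Python sketch below is not the hardware of the referenced application; it assumes a sixteen-byte state stored column by column, models the shift row stage as an index look-up table, and models the mix column stage with a GF(2⁸) multiplier followed by addition of the round key. All names (`SHIFT_ROWS_INDEX`, `shift_rows`, `mix_column`, `finish_round`) are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


# Output byte i (row i % 4, column i // 4) is taken from the input byte in the
# same row, shifted left by the row number.
SHIFT_ROWS_INDEX = [(i % 4) + 4 * ((i // 4 + i % 4) % 4) for i in range(16)]


def shift_rows(state: list) -> list:
    """Shift row transformation expressed as a table look-up of indices."""
    return [state[src] for src in SHIFT_ROWS_INDEX]


def mix_column(column: list) -> list:
    """Multiply one four-byte column by the fixed MixColumns matrix."""
    a0, a1, a2, a3 = column
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def finish_round(state: list, round_key: list) -> list:
    """Shift rows, mix each column, then add (XOR) the round key."""
    shifted = shift_rows(state)
    mixed = []
    for c in range(4):
        mixed.extend(mix_column(shifted[4 * c:4 * c + 4]))
    return [s ^ k for s, k in zip(mixed, round_key)]


if __name__ == "__main__":
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because the shift row stage only rearranges bytes, it reduces to a fixed table of sixteen indices, whereas the mix column stage needs true Galois field arithmetic.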
This approach is appealing because the conventional technique for calculating the AES selection function (S-box) values, which requires executing the multiplicative inverse in GF⁻¹(2ᵐ), e.g. GF⁻¹(2⁸), and applying an affine over GF(2) transformation to obtain the subbyte transformation, is complicated and requires even more processing time. Calculating the values ahead of time and storing them in a look-up table is therefore an advantage. One shortcoming of this approach, however, is that each look-up operation is a serial operation that requires a number of memory cycles to complete, which in a deeply pipelined machine places a limit on system performance speed.