The U.S. Department of Commerce, National Institute of Standards and Technology (NIST) has adopted a subset of the Rijndael symmetric key block cipher algorithm for its Advanced Encryption Standard (AES), as specified in “Federal Information Processing Standards Publication 197” (FIPS 197), of Nov. 26, 2001. The AES algorithm uses cryptographic keys of 128, 192 and 256 bits to encrypt and decrypt data in blocks of 128 bits. The Rijndael algorithm is also capable of handling 192 and 256 bit blocks and supports extensions to certain intermediate or potentially larger key lengths and block sizes, with operations defined between any of its key lengths and block sizes.
The algorithm iterates a number of nearly identical rounds depending on key length and block size. AES128 uses 10 rounds, AES192 uses 12 rounds and AES256 uses 14 rounds to complete an encryption or decryption operation. More generally, for a key length of Nk 32-bit words and a block size of Nb 32-bit words, the number of rounds, Nr, for the Rijndael algorithm is presently specified as: Nr=max(Nk, Nb)+6.
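By way of illustration only, the round-count relation above can be sketched as follows. The function name `rijndael_rounds` and its parameters are hypothetical names chosen for this sketch and do not appear in FIPS 197.

```python
def rijndael_rounds(key_bits: int, block_bits: int = 128) -> int:
    """Return Nr = max(Nk, Nb) + 6 for the given key and block sizes."""
    if key_bits % 32 or block_bits % 32:
        raise ValueError("key and block sizes must be multiples of 32 bits")
    nk = key_bits // 32    # key length in 32-bit words
    nb = block_bits // 32  # block size in 32-bit words
    return max(nk, nb) + 6


# AES fixes the block size at 128 bits (Nb = 4).
for bits in (128, 192, 256):
    print(f"AES{bits}: {rijndael_rounds(bits)} rounds")
```

For the three AES key lengths this reproduces the 10, 12 and 14 rounds stated above.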
The invention described herein is applicable to any of the Rijndael key lengths and block sizes, including the 128-bit block size specified for AES, and is also applicable to any mode of operation. The remainder of the patent specification will refer to the preferred AES embodiment, with the understanding that extension to any of the other Rijndael block sizes is also implied.
NIST Special Publication 800-38A, “Recommendation for Block Cipher Modes of Operation: Methods and Techniques”, by Morris Dworkin (December 2001) specifies five confidentiality modes of operation (ECB, CBC, CFB, OFB and CTR) approved by NIST for use in conjunction with any underlying symmetric key block cipher algorithm, such as AES. Other possible modes of operation are also under consideration for NIST approval. The invention described herein is applicable to any of the modes of operation.
In AES, three main steps occur during each round: (a) the text block is modified, (b) the round key is generated, and (c) the modified text block and the round key are added together using an XOR operation to provide the starting text block for the next round. With two exceptions, the text block is modified the same way in each round (S-box substitution, row shifting, column mixing). The first exception is a pre key mix operation (round 0), in which each plaintext message block is bitwise XORed with an initial round key filled with the first Nb words of the cipher key itself (Nb=4 for AES). This pre key mix operation provides the starting text for round 1. The second exception occurs in the final round, in which the column mixing operation is omitted. The details of the S-box substitution, row shifting and column mixing operations for the rounds are described in the aforementioned FIPS 197 document.
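The pre key mix operation (round 0) described above may be sketched as follows. This is a minimal illustration, not a complete cipher; the function name `pre_key_mix` is hypothetical, and the example plaintext and key are the test values given in Appendix C.1 of FIPS 197.

```python
def pre_key_mix(plaintext_block: bytes, cipher_key: bytes, nb: int = 4) -> bytes:
    """XOR one plaintext block with the first Nb words of the cipher key."""
    block_len = 4 * nb  # Nb 32-bit words = 4*Nb bytes
    if len(plaintext_block) != block_len:
        raise ValueError("plaintext block must be exactly Nb words long")
    if len(cipher_key) < block_len:
        raise ValueError("cipher key must hold at least Nb words")
    initial_round_key = cipher_key[:block_len]
    return bytes(p ^ k for p, k in zip(plaintext_block, initial_round_key))


plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
print(pre_key_mix(plaintext, key).hex())  # 00102030405060708090a0b0c0d0e0f0
```

The result is the starting text for round 1. Because the plaintext is combined directly with cipher key material at this point, round 0 is a natural focus for implementation countermeasures.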
The set of round keys (key schedule) is generated from the initial cipher key using a key expansion routine. In AES, the length of the round keys is always the same as the block size (128 bits=4 words) regardless of the length (128, 192 or 256 bits) of the original cipher key. The words of the cipher key are used in the early rounds while they last; then each successive round key word is a function of the preceding round key words. The calculation of the round keys by the key expansion routine is slightly different for each cipher key length, in that, while the same basic steps (S-box substitution, byte rotation, and XOR with a round constant) are used in each case, they occur with different frequencies for the different key lengths.
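As a non-authoritative sketch of the key expansion routine just described, the following follows the general structure given in FIPS 197 for the 128-bit block size. The names `expand_key`, `_gf_mul` and `_build_sbox` are hypothetical, and the S-box is computed here rather than tabulated purely to keep the example self-contained.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _build_sbox() -> list:
    """Build the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = _gf_mul(inv, x)
        y = inv
        for shift in range(1, 5):  # XOR with left rotations by 1..4 bits
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = _build_sbox()


def expand_key(key: bytes) -> list:
    """Return the Nr+1 round keys (16 bytes each) for a 16/24/32-byte key."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits")
    nk = len(key) // 4
    nr = nk + 6  # max(Nk, Nb) + 6 with Nb = 4
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]         # byte rotation
            temp = [SBOX[b] for b in temp]     # S-box substitution
            temp[0] ^= rcon                    # XOR with round constant
            rcon = _gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]     # extra substitution, 256-bit keys
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(nr + 1)]


round_keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(round_keys), round_keys[0].hex())
```

Each word depends on the word immediately before it and the word Nk positions back, which is why the schedule is naturally generated in the forward direction. The `i % nk` tests show how the same basic steps occur with different frequencies for the different key lengths.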
In a straightforward approach to decryption, the individual cipher transformations are inverted and implemented in reverse order from encryption. The key schedule for decryption has the same form as that for encryption, but the round keys are applied in reverse order. Thus, the first round key for decryption is the same as the last round key from the encryption, the second decryption round key is the same as the next-to-last round key from the encryption, etc.
One common approach to key scheduling is to pre-calculate all of the round keys needed for a communication session, and then to save them as a key table in memory to be retrieved as needed for each round. This approach has a large initial latency period while the set of round keys is computed, but has faster subsequent execution of the cryptographic rounds. Moreover, decryption rounds in this case are as fast as the encryption rounds. However, this approach assumes that there is sufficient memory capacity available to store the entire key schedule, and that the initial latency period is tolerable.
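The memory cost of the pre-calculated key table can be estimated with simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes the AES block size of 128 bits, so that each of the Nr+1 round keys occupies 16 bytes.

```python
ROUND_KEY_BYTES = 16  # AES round keys always match the 128-bit block size


def key_table_bytes(key_bits: int) -> int:
    """Bytes needed to store the full key schedule (Nr + 1 round keys)."""
    nr = key_bits // 32 + 6
    return (nr + 1) * ROUND_KEY_BYTES


def decryption_key_index(decrypt_round: int, nr: int) -> int:
    """Table index of the round key used in a given decryption round."""
    return nr - decrypt_round  # decryption reads the table back to front


for bits in (128, 192, 256):
    print(f"AES{bits}: {key_table_bytes(bits)} bytes")
```

Under these assumptions the table occupies 176, 208 and 240 bytes for AES128, AES192 and AES256 respectively. With the table in memory, the reverse ordering for decryption reduces to an index calculation, which is why the decryption rounds are as fast as the encryption rounds.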
Another approach used in some hardware systems involves “on-the-fly” key scheduling, in which round keys are generated as needed on a round-by-round basis. Because this approach does not pre-process the entire key schedule, the initial latency period is avoided, at least in the forward cipher direction (encryption), and memory requirements for the round keys are substantially reduced. This is especially useful for devices that only need to do encryption and which have memory and processing limitations. However, in the reverse cipher direction (decryption), the round keys are needed in reverse order. That is, the first round key for decryption is the same as the last round key from the encryption. Moreover, each round key is a function of the preceding round keys. Existing “on-the-fly” key expansion methods have large latencies in the decryption direction, especially in the early decryption rounds, since for each round the “on-the-fly” key generator must recompute all the preceding round keys until the round key for the current decryption round is reached. An improved key generation routine is therefore needed for the reverse direction, one that eliminates this latency if possible.
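The decryption latency described above can be illustrated with a simplified cost model. The model is an assumption made for this sketch only: it counts one "step" for each round key derived from its predecessor, and it assumes a generator that restarts from the cipher key for every decryption round and retains no intermediate round keys. The function names are hypothetical.

```python
def forward_key_steps(nr: int) -> int:
    """Steps to generate round keys 1..Nr once, in encryption order."""
    return nr


def naive_reverse_key_steps(nr: int) -> int:
    """Steps when each decryption round recomputes its key from the cipher key.

    Decryption round i needs encryption round key Nr - i, which lies
    Nr - i steps away from the cipher key (round key 0).
    """
    return sum(nr - i for i in range(nr + 1))


for name, nr in (("AES128", 10), ("AES192", 12), ("AES256", 14)):
    print(name, forward_key_steps(nr), naive_reverse_key_steps(nr))
```

Under this model, AES128 decryption spends 55 steps on key generation against 10 for encryption, and the first decryption round alone costs as many steps as the entire forward schedule.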
When ciphers, like AES, are employed in real-world applications, they must first be implemented in hardware or software. An attacker may choose to exploit some weakness of the implementation, rather than trying to find a mathematical weakness in the cipher itself. This may be done through external monitoring of a cryptographic system during its operation to obtain leaked information about the internal operations that could be useful in determining the cipher key. Examples of implementation attacks on cryptographic systems include timing and power analysis attacks, which exploit any key-dependent variations in the execution time or power consumption pattern. Known countermeasures to various implementation attacks generally include: tamper resistant chip packaging, physical shielding to block signal emissions, filtering of inputs and outputs, computational techniques to equalize or randomize timing of operations, making the instruction sequence independent of the cipher key or varying it from one execution to the next, and adding hardware noise to the power consumption pattern. For example, U.S. Pat. No. 6,327,661 to Kocher et al. describes countermeasures which incorporate unpredictable (random or pseudo-random) information into the cryptographic processing. Note, however, that not all of these possible defenses are applicable in every situation. For example, processing and memory constraints of smart cards with built-in cryptographic engines limit which of the many available countermeasures can be used. Additional implementation countermeasures are desired for smart cards and other processor or memory limited applications, particularly during the most vulnerable period when the plaintext is first processed.
Encryption and decryption are necessarily time consuming operations. The many transpositions and substitutions of data bits, bytes and words needed to transform plaintext blocks into ciphertext, and vice versa, require time to process. As block sizes and the number of rounds increase, the problem would tend to get worse, were it not for the corresponding increase in processing power of the hardware. Any time savings available in a given implementation would be advantageous, provided security is not compromised.
An object of the present invention is to provide an on-the-fly key scheduling method and associated hardware or software that can efficiently generate AES/Rijndael round keys in the reverse (decryption) direction.
Another object of the present invention is to provide a hardware implementation of the AES/Rijndael cipher that provides a countermeasure to power analysis attacks during the early stages of encryption, especially during the pre-key-mix stage (round 0) of the cipher.
Yet another object of the invention is to provide an AES/Rijndael implementation that reduces the number of total clock cycles required to process the cipher.