The Advanced Encryption Standard (AES) Algorithm
Electronic communications media allow for transmission and reception of vast amounts of information in the form of digital data. While many of these communications are rather mundane, there are several instances where sensitive or confidential information is communicated through electronic media. In such instances, it is desirable to encrypt the information such that only authorized persons have access to the sensitive or confidential information. The Department of Commerce recently accepted the Rijndael algorithm, with very minor modifications, as the official Advanced Encryption Standard (AES).
The Rijndael algorithm, which was developed by Joan Daemen and Vincent Rijmen, is described in detail in their proposal “The Rijndael Block Cipher—AES Proposal: Rijndael” (hereinafter referred to as the “Rijndael proposal” or the “AES proposal”) to the National Institute of Standards and Technology (NIST). While not all of the details are provided herein, the following provides the relevant portions of the AES algorithm.
The AES algorithm is an iterative algorithm, meaning that the cipher as a whole involves multiple encryption iterations (or rounds) of certain encryption operations. Each of the rounds produces an encrypted state that is further encrypted in subsequent rounds. In the AES algorithm, the number of rounds is defined by a combination of a block size (i.e., the size of the data block to be encrypted) and the key size (i.e., the size of the encryption key). Each of the rounds, with the exception of the last round, includes four steps, and a “state” is produced at the end of each round. This is shown in FIG. 1.
FIG. 1 is a diagram showing a general overview of the steps involved in the AES algorithm. Prior to actually encrypting any data, the system has knowledge of the block size Nb, the key size Nk, and the number of rounds Nr (which is a function of the block size and the key size such that Nr=f(Nb,Nk)). Since these aspects are described in detail in the AES proposal, and since they are only peripherally related to the invention, they will not be further discussed herein. As shown in FIG. 1, the process may be seen as having Nr−1 normal rounds 115 and one final round 150. The steps in each normal round 115 are a “ByteSub” 120 step, a “ShiftRow” 125 step, a “MixColumn” 130 step, and an “AddRoundKey” 135 step, which are described in greater detail with reference to FIGS. 2 through 5. The MixColumn 130 step is removed in the final round 150, thereby making it a three-step round having the ByteSub 120 step, ShiftRow 125 step, and AddRoundKey 135 step.
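By way of illustration only, the round structure described above may be sketched as follows. The sketch assumes the relationship Nr = max(Nb, Nk) + 6 given in the Rijndael proposal, with Nb and Nk expressed in 32-bit words; the function names num_rounds and round_schedule are hypothetical and do not appear in the figures. The initial round-key addition that the Rijndael proposal applies before the first round is not part of the overview above and is omitted here.

```python
# Illustrative sketch only; function names are hypothetical.

def num_rounds(nb: int, nk: int) -> int:
    """Return Nr for block size nb and key size nk (both in 32-bit words).

    Assumes the Rijndael proposal's rule Nr = max(Nb, Nk) + 6.
    """
    if nb not in (4, 6, 8) or nk not in (4, 6, 8):
        raise ValueError("Nb and Nk must each be 4, 6, or 8 words")
    return max(nb, nk) + 6


def round_schedule(nb: int, nk: int) -> list:
    """List the steps of each round: Nr-1 normal rounds, then one final round."""
    normal_round = ["ByteSub", "ShiftRow", "MixColumn", "AddRoundKey"]
    final_round = ["ByteSub", "ShiftRow", "AddRoundKey"]  # MixColumn removed
    nr = num_rounds(nb, nk)
    return [list(normal_round) for _ in range(nr - 1)] + [final_round]
```

For a 128-bit block and a 128-bit key (Nb = Nk = 4), this yields nine four-step rounds followed by one three-step final round.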
FIG. 2 shows the ByteSub 120 step in greater detail. The ByteSub 120 step of the AES algorithm is a non-linear byte substitution, which operates independently on each byte of an input state matrix 210. The substitution table 220 (hereinafter referred to as “S-box” 220) is invertible and is constructed by the composition of two transformations, which are described in detail in the AES proposal. The S-box 220 transformation produces a byte-substituted matrix 230 that is the same size as the input state matrix 210, but with byte values that are defined by the S-box 220. Further details regarding this aspect of the AES algorithm may be found in the AES proposal. However, it is worthwhile to note that, using hardware configurations in the prior art, the S-box 220 and the raw data (e.g., the input state matrix 210) are moved from a main memory location to a cipher memory location before performing the ByteSub 120 step (or any other data manipulation). The significance of this will become clearer with reference to an example hardware configuration of the prior art, as shown in FIG. 6.
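As a minimal sketch of the ByteSub step, the following assumes the two transformations given in the Rijndael proposal: the multiplicative inverse in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (with 0 mapped to 0), followed by an affine transformation with the constant 0x63. The state is assumed to be a list of rows of byte values; the helper names are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 == 1
        result = gf_mul(result, a)
    return result


def affine(b: int) -> int:
    """Affine transformation applied after the inversion."""
    def rotl(x: int, n: int) -> int:
        return ((x << n) | (x >> (8 - n))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


# The substitution table: one output byte for each possible input byte.
SBOX = [affine(gf_inv(x)) for x in range(256)]


def byte_sub(state):
    """Substitute every byte of the state independently through the S-box."""
    return [[SBOX[b] for b in row] for row in state]
```

Because the table is a permutation of the 256 byte values, the substitution is invertible, and the output matrix has the same dimensions as the input.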
FIG. 3 shows the ShiftRow 125 step in greater detail. In the ShiftRow 125 step, the rows 232, 234, 236, 238 of the byte-substituted matrix 230 are cyclically shifted over different offsets to produce a row-shifted matrix 330 having shifted rows 332, 334, 336, 338. These offsets are described in greater detail in the AES proposal, and, thus, only a truncated discussion is presented here. However, it is worthwhile to mention again that, in prior art systems, the entire data set is moved from the main memory to the cipher memory prior to any data manipulation.
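The cyclic shift may be sketched as follows, assuming a four-column state (Nb = 4) stored as a list of four rows, with the left-shift offsets 0, 1, 2, and 3 that the Rijndael proposal gives for that block size. The function name shift_row and the offsets parameter are hypothetical.

```python
# Illustrative sketch only; assumes Nb = 4 and offsets (0, 1, 2, 3).

def shift_row(state, offsets=(0, 1, 2, 3)):
    """Cyclically shift each row of the state left by its own offset."""
    shifted = []
    for row, offset in zip(state, offsets):
        k = offset % len(row)
        shifted.append(row[k:] + row[:k])
    return shifted
```

For example, the row [4, 5, 6, 7] with offset 1 becomes [5, 6, 7, 4]; the first row, with offset 0, is unchanged.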
FIG. 4 shows the MixColumn 130 step in greater detail. In the MixColumn 130 step, every column 342, 344, 346, 348 of the row-shifted matrix 330 is transformed by multiplying it with a specific multiplication polynomial, c(x) 420. This multiplication operation produces a mixed-column matrix 430 having columns 442, 444, 446, 448 that are each a function of their respective row-shifted matrix 330 columns 342, 344, 346, 348. Since this operation is only peripherally related to this invention, and since details related to this operation may also be found in the AES proposal, the MixColumn 130 step will not be further discussed.
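A minimal sketch of the column multiplication follows. It assumes the polynomial given in the Rijndael proposal, c(x) = 03·x^3 + 01·x^2 + 01·x + 02, with the product taken modulo x^4 + 1 and coefficients in GF(2^8), and a state stored as a list of four rows. The helper names are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical.

def xtime(a: int) -> int:
    """Multiply a byte by 02 in GF(2^8) modulo 0x11B."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mix_single_column(col):
    """Multiply one four-byte column by c(x) modulo x^4 + 1."""
    a0, a1, a2, a3 = col

    def mul3(x: int) -> int:
        return xtime(x) ^ x  # 03 * x = (02 * x) XOR x

    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_column(state):
    """Apply the column multiplication to every column of a row-major state."""
    mixed_cols = [mix_single_column(list(col)) for col in zip(*state)]
    return [list(row) for row in zip(*mixed_cols)]
```

Each output column depends only on the corresponding input column, consistent with the description above.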
FIG. 5 shows the AddRoundKey 135 step in greater detail. In the AddRoundKey 135 step, a bit-wise XOR (logical exclusive OR) operation 510 is performed between the mixed-column matrix 430 and a round key 520 (which has been derived from a cipher key). The derivations of the cipher key and the round key are only peripherally related to this invention, and are described in detail in the AES proposal. Thus, the AddRoundKey 135 step will not be further discussed.
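The XOR operation may be sketched as follows, assuming the state and the round key are matrices of byte values with identical dimensions. The function name add_round_key is hypothetical, and the derivation of the round key is outside the scope of the sketch.

```python
# Illustrative sketch only; the round key is assumed to be already derived.

def add_round_key(state, round_key):
    """XOR each byte of the state with the corresponding round-key byte."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]
```

Since XOR is its own inverse, applying the same round key a second time restores the original matrix.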
Since the current hardware configurations move the raw data and byte-substitution table from the main memory to the cipher memory prior to performing any data manipulation, inefficiencies may arise due to memory access and byte manipulation. Additionally, since the operation involves shifting bytes across entire words (i.e., byte-shifting across four bytes) as shown in FIG. 3, straightforward implementation of the cipher subsystem is not efficient. These inefficiencies are shown in detail with reference to FIGS. 6A, 6B, and 13, which show a hardware configuration and a method of the prior art.
Prior Art Hardware Configuration for Executing the AES Algorithm
FIG. 6A is a hardware configuration of the prior art that may be used in the execution of the AES algorithm. As shown in FIG. 6A, data 602 and a key 618 are input to a control unit 601 by way of 32-bit interfaces. The data 602 is stored in a 128-bit input register 604, which, in subsequent rounds of the AES algorithm, is used to store intermediate state cipher text. The data 602 stored in the 128-bit input register 604 is input to the cipher subsystem 600 via 32-bit data buses 613 in a sequential manner. The encryption (or decryption) begins automatically after reception of the fourth 32-bit data word (i.e., encryption or decryption is “triggered” upon complete loading of the 128 bits). Once encryption or decryption is complete, the cipher text is written to a 128-bit output register 608 via 32-bit data buses 614 in a sequential manner. The cipher is managed using a control register containing an encryption/decryption flag (not shown), a run flag (not shown), and a reset bit (not shown). Data and control registers are accessible using /CS_DATA 622 and /CS_CTRL 624 signals. Read and write are realized at the rising edge of /READ 632 and /WRITE 628 signals, respectively. Key memory is organized in 256 32-bit words, and pre-calculated sub-keys can be entered to the cipher via a separate 32-bit local interface that can be connected, for example, to the local memory. New sub-keys are written to the internal memory at the rising edge of a KEY_STRB 638 signal when /WR_KEY 634 is low. As seen from FIG. 6A, the cipher subsystem 600 may be inefficient because all of the data is moved from main memory to the cipher subsystem 600 memory prior to execution of the AES algorithm steps. This is shown in greater detail in FIG. 6B.
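The load-then-trigger behavior of the 128-bit input register may be modeled in software as follows. This is a behavioral sketch only, not a description of the actual circuit: the class name InputRegister, the on_full callback, and the choice to treat the first word written as the most significant are all assumptions made for illustration.

```python
# Behavioral sketch only; names and word ordering are assumptions.

class InputRegister:
    """Model of a 128-bit register loaded as four sequential 32-bit words."""

    WORDS_PER_BLOCK = 4

    def __init__(self, on_full):
        self._words = []
        self._on_full = on_full  # called with the 128-bit block once loaded

    def write_word(self, word: int) -> None:
        """Store one 32-bit word; trigger processing on the fourth word."""
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError("word must fit in 32 bits")
        self._words.append(word)
        if len(self._words) == self.WORDS_PER_BLOCK:
            block = 0
            for w in self._words:
                block = (block << 32) | w  # assumed: first word is most significant
            self._words = []
            self._on_full(block)
```

In this model no processing occurs until all four words have arrived, which mirrors the point made above that the cipher waits for the full 128 bits before beginning any manipulation.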
FIG. 6B is a block diagram illustrating a prior art cipher subsystem 600. The cipher subsystem 600 comprises a memory access unit 620 having a host direct memory access (DMA) unit 611 and a cipher DMA 621. The host DMA 611 of the memory access unit 620 is configured to access a main data memory 851 to retrieve plain text in a sequential manner using a 32-bit data bus 613. In the first round of the AES algorithm, the retrieved data cascades through a first multiplexer (MUX) 630, which sends the data to a first cipher data memory bank 650 (hereinafter referred to as memory bank 0 650) or a second cipher data memory bank 660 (hereinafter referred to as memory bank 1 660). The data in bank 0 650 and bank 1 660 is then sent to a second MUX 670 using 32-bit data buses 653, 663. The second MUX 670 selectively transmits the data from one of the cipher data memory banks 650, 660 to a register file 680, which then sends the selected data to an arithmetic logic unit (ALU) 690. The ALU 690 performs the ShiftRow, ByteSub, MixColumn, and AddRoundKey steps in the cipher subsystem 600 of the prior art, thereby producing the cipher text (or state matrices). This cipher text is relayed back to a third MUX 640 and a fourth MUX 676 for the second iteration (or round) of the AES algorithm. The data is then cascaded through the third MUX 640, fourth MUX 676, bank 0 650, bank 1 660, second MUX 670, register file 680, and ALU 690, which produces the next state matrix for the third iteration. The system repeats this procedure until the initially input data has been enciphered using the AES algorithm.
The inefficiency associated with this system arises from the fact that the host DMA 611 and the cipher DMA 621 are nothing more than data pass-through devices that retrieve the data for processing. In this respect, data is retrieved from main data memory 851 and completely loaded into memory before any calculation is performed on the data. This, in turn, results in a potential bottleneck at memory access and byte manipulation. This is shown using the flow chart of FIG. 13.
FIG. 13 is a flow chart showing method steps associated with the execution of the AES algorithm as it is performed in the prior art. In step 1320, the cipher subsystem 600 (FIG. 6B) loads the state data (e.g., plain text) into the cipher subsystem memory, and, in step 1330, the cipher subsystem 600 (FIG. 6B) manipulates the loaded state data after it has been completely loaded into cipher subsystem memory. Thus, since the cipher subsystem 600 (FIG. 6B) waits for complete loading of the data into the cipher subsystem memory, a potential bottleneck is created at memory access and byte manipulation.
Accordingly, a heretofore unaddressed need exists in the industry to improve cipher performance.